[HN Gopher] Korvus: Single-Query RAG with Postgres
___________________________________________________________________
Korvus: Single-Query RAG with Postgres
Author : levkk
Score : 112 points
Date : 2024-07-11 16:35 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| levkk wrote:
| Hey fellow open-source enthusiasts,
|
| We built Korvus, an open-source RAG (Retrieval-Augmented
| Generation) pipeline that consolidates the entire RAG workflow -
| from embedding generation to text generation - into a single SQL
| query, significantly reducing architectural complexity and
| latency.
|
| Here are some of the highlights:
|
| - Full RAG pipeline (embedding generation, vector search,
| reranking, and text generation) in one SQL query
|
| - SDKs for Python, JavaScript, and Rust (more languages planned)
|
| - Built on PostgreSQL, leveraging pgvector and pgml
|
| - Open-source, with support for open models
|
| - Designed for high performance and scalability
|
| Korvus utilizes Postgres' advanced features to perform complex
| RAG operations natively within the database. We're also the
| developers of PostgresML, so we're big advocates of in-database
| machine learning. This approach eliminates the need for external
| services and API calls, potentially reducing latency by orders of
| magnitude compared to traditional microservice architectures.
| It's how our founding team built and scaled the ML platform at
| Instacart.
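|
| Here's roughly what the single call looks like from our Python
| SDK (a simplified sketch; the model name and parameters below are
| just examples, not exact defaults):
|
|     results = await collection.rag(
|         {
|             "CONTEXT": {
|                 "vector_search": {
|                     "query": {"fields": {"text": {"query": query}}},
|                     "document": {"keys": ["id"]},
|                     "limit": 5,
|                 },
|                 "aggregate": {"join": "\n"},
|             },
|             "chat": {
|                 "model": "meta-llama/Meta-Llama-3-8B-Instruct",
|                 "messages": [
|                     {"role": "system", "content": "You are a helpful assistant."},
|                     {
|                         "role": "user",
|                         "content": "Given the context:\n{CONTEXT}\nAnswer the question: " + query,
|                     },
|                 ],
|                 "max_tokens": 100,
|             },
|         },
|         pipeline,
|     )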
|
| We're eager to get feedback from the community and welcome
| contributions. Check out our GitHub repo for more details, and
| feel free to hit us up in our Discord!
| kaspermarstal wrote:
| Very cool! I assume you use Postgres' native full-text search
| capabilities? Any plans for BM25 or similar? That would make
| Korvus the end-game for open-source RAG, IMO.
| mdaniel wrote:
| I find it misleading to use an f-string containing encoded
| `{CONTEXT}` <https://github.com/postgresml/korvus/blob/bce269a2
| 0a1dbea933...>, and after digging into TFM
| <https://postgresml.org/docs/open-
| source/korvus/guides/rag#si...> it seems it is not, in fact, an
| f-string artifact but rather the literal characters
| "{"+"CONTEXT"+"}" and are the same in all the language
| bindings?
|
| IMHO it would be much clearer if you just used the normal %s
| for the "outer" string and left the _implicit_ f-string syntax
| as it is, e.g.
|
|     {
|         "role": "user",  # this is not an f-string, is rather replaced by TODO FIXME
|         "content": "Given the context\n:{CONTEXT}\nAnswer the question: %s" % query,
|     },
|
| The way the example (in both the readme and the docs) is
| written, it seems to imply I can put my own fields as siblings
| to the chat key and they, too, will be resolved:
|
|     results = await collection.rag({
|         "EXAMPLE": {"uh-huh": True},
|         "CONTEXT": {
|             "vector_search": {
|                 "query": {
|                     "fields": {"text": {"query": query}},
|                 },
|                 "document": {"keys": ["id"]},
|                 "limit": 1,
|             },
|             "aggregate": {"join": "\n"},
|         },
|         "chat": {
|             "messages": [
|                 {"content": "Given Context:\n{CONTEXT}\nAn Example:\n{EXAMPLE}"}
|             ],
|         },
|     })
|
| One could not fault the user for thinking such a thing since
| the *API* docs say "see the *GUIDE*" :-(
| https://postgresml.org/docs/open-source/korvus/api/collectio...
| smarvin2 wrote:
| This section of the docs may be confusing. What you described
| will actually almost work. See:
| https://postgresml.org/docs/open-
| source/korvus/guides/rag#ra...
| hahahacorn wrote:
| Very cool. I see more languages planned in your comment. Are you
| looking for community help developing SDKs in other languages?
| After spending an entire Saturday running a RAG pipeline for a
| POC for a "fun" side project, I definitely would've loved to have
| been able to use this instead.
|
| I spent too long reading Python docs because I haven't touched
| the language since 2019. Happy to help develop a Ruby SDK!
| pqdbr wrote:
| +1 for a Ruby SDK!
| smarvin2 wrote:
| We would love help developing a Ruby SDK! We programmatically
| generate our Python, JavaScript, and C bindings from our Rust
| library. Check out the rust-bridge folder for more info on
| how we do that.
| naveen_k wrote:
| This looks exciting! Will definitely be testing it out in the
| coming days.
|
| I see you offer re-ranking using local models; will there be
| built-in support for making re-ranking calls to external
| services such as Cohere in the future?
| smarvin2 wrote:
| Great question! Making calls to external services is not
| something we plan to support. The point of Korvus is to write
| SQL queries that take advantage of the pgml and pgvector
| extensions. Making calls to external services is something that
| could be done by users after retrieval.
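|
| For example, something like this (a rough sketch; the result key
| and rerank_with_cohere are placeholders for whatever external
| service you want to call):
|
|     hits = await collection.vector_search(
|         {
|             "query": {"fields": {"text": {"query": query}}},
|             "document": {"keys": ["id", "text"]},
|             "limit": 20,
|         },
|         pipeline,
|     )
|     # Hand the retrieved chunks to any external re-ranker you like;
|     # rerank_with_cohere is a stand-in for that external API call.
|     reranked = rerank_with_cohere(query, [h["chunk"] for h in hits])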
| lecha wrote:
| Interesting! Is there a way to deploy this on AWS RDS?
| klysm wrote:
| I'd imagine it just comes down to whether or not the extensions
| are allowed
| smarvin2 wrote:
| Unfortunately the pgml extension does not work on AWS RDS so
| there is not.
| jiocrag wrote:
| Is there any way to deploy this to an existing Postgres database,
| or does it need to use the Docker instance?
| smarvin2 wrote:
| You can totally use an existing postgres database. Just make
| sure to install the pgvector and pgml postgres extensions and
| it will work!
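|
| Something like this, assuming the KORVUS_DATABASE_URL environment
| variable is how you point the SDK at your own database (check the
| README for the exact name):
|
|     # The extensions need to be installed on the server first:
|     #   CREATE EXTENSION IF NOT EXISTS pgml;
|     #   CREATE EXTENSION IF NOT EXISTS vector;
|     import os
|     os.environ["KORVUS_DATABASE_URL"] = "postgres://user:pass@your-host:5432/your_db"
|
|     from korvus import Collection
|     collection = Collection("docs")  # now backed by your existing database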
| stavros wrote:
| This looks great, thanks! After being disappointed by how flaky
| gpt-4-turbo's RAG is, I want to set up my own, so this came at
| the right time.
|
| One question: can I use an external model (i.e. get the raw RAG
| snippets, or prompt text)? Or does it have to be the one
| specified in Korvus?
| smarvin2 wrote:
| You can use Korvus for search and feed the results to an
| external model.
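|
| For example (a sketch; call_your_llm is a placeholder for
| whatever external model or API you use, and the result keys are
| illustrative):
|
|     hits = await collection.vector_search(
|         {"query": {"fields": {"text": {"query": question}}}, "limit": 5},
|         pipeline,
|     )
|     snippets = "\n".join(h["chunk"] for h in hits)
|     prompt = "Context:\n" + snippets + "\n\nQuestion: " + question
|     answer = call_your_llm(prompt)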
| unixhero wrote:
| Is RAG the new DAG? _outoftheloop_
| helsinki wrote:
| Zero mention of what RAG is in the README. No clue here.
| jzig wrote:
| From the README:
|
| > Korvus is an all-in-one, open-source RAG (Retrieval-
| Augmented Generation) pipeline...
| brylie wrote:
| Retrieval Augmented Generation uses text that is stored in a
| database to augment user prompts that are sent to a
| generative AI, like a large language model. The retrieval
| results are based on their similarities to the user input.
| The goal is to improve the output of the generative AI by
| providing more information in the input (user prompt +
| retrieval results). For example, we can provide the LLM
| details from an internal knowledge base so it can generate
| responses that are specific to an organization rather than
| based on general information. It may also reduce errors and
| improve the relevancy of the model output, depending on the
| context.
| simonw wrote:
| Does this work by running an LLM such as Llama directly on the
| database server? If so, does that mean that your database and the
| LLM are competing for the same CPU and memory resources?
|
| Can it run the LLM on a GPU?
| smarvin2 wrote:
| It does work by running the LLM on the database server, but you
| can configure the LLM to run on the GPU.
| nkmnz wrote:
| This sounds very promising, but let me ask an honest question: to
| me, it seems like databases are the hardest part to scale in your
| average IT infrastructure. How much work does it add to the
| database if you let it do all the ML-related work as well? How
| much work is saved by reducing the number of necessary queries?
| CuriouslyC wrote:
| This is a read workload that can be easily horizontally scaled.
| The reduction in dev and infrastructure complexity is well
| worth the slight increase in DB provisioning.
| thawkins wrote:
| What LLM system does it use to run models? Does it support
| Ollama?
___________________________________________________________________
(page generated 2024-07-11 23:00 UTC)