[HN Gopher] Show HN: R2R - Open-source framework for production-...
       ___________________________________________________________________
        
       Show HN: R2R - Open-source framework for production-grade RAG
        
        Hello HN, I'm Owen from SciPhi (https://www.sciphi.ai/), a startup
        working on simplifying Retrieval-Augmented Generation (RAG). Today
        we're excited to share R2R (https://github.com/SciPhi-AI/R2R), an
        open-source framework that makes it simpler to develop and deploy
        production-grade RAG systems.
        
        A quick refresher: RAG helps Large Language Models (LLMs) use
        current information and domain-specific knowledge. For example, it
        lets a programming assistant answer questions from your latest
        documents. The idea is to gather the relevant information
        ("retrieval"), present it to the LLM alongside a question
        ("augmentation"), and let the LLM answer ("generation") as though
        it had been trained directly on your data.
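        
        To make that flow concrete, here is a minimal sketch of the
        retrieve-augment-generate loop (illustrative only - it calls the
        OpenAI client directly rather than going through R2R's
        abstractions, and assumes OPENAI_API_KEY is set):
        
          # Minimal retrieve-augment-generate loop (sketch, not R2R's API).
          import numpy as np
          from openai import OpenAI
        
          client = OpenAI()
        
          def embed(texts):
              resp = client.embeddings.create(
                  model="text-embedding-3-small", input=texts)
              return np.array([d.embedding for d in resp.data])
        
          # Retrieval: embed the corpus once, then rank chunks by
          # similarity to the query (OpenAI embeddings are unit length,
          # so a dot product is cosine similarity).
          chunks = ["R2R ingests json, txt, pdf and html.",
                    "RAG grounds LLM answers in your own documents."]
          chunk_vecs = embed(chunks)
          query = "Which file types can be ingested?"
          q_vec = embed([query])[0]
          context = chunks[int((chunk_vecs @ q_vec).argmax())]
        
          # Augmentation + generation: hand the retrieved context to the
          # LLM alongside the question.
          answer = client.chat.completions.create(
              model="gpt-3.5-turbo",
              messages=[{"role": "user",
                         "content": f"Context:\n{context}\n\nQuestion: {query}"}],
          )
          print(answer.choices[0].message.content)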
        
        The R2R framework is a powerful tool for addressing key challenges
        in deploying RAG systems while avoiding the complex abstractions
        common in other projects. Through conversations with numerous
        developers, we found that many were independently building similar
        solutions. R2R distinguishes itself with a straightforward approach
        to setting up, monitoring, and upgrading RAG systems, focusing on
        reducing unnecessary complexity and improving visibility into
        system performance.
        
        The key parts of R2R include: an Ingestion Pipeline that transforms
        different data types (json, txt, pdf, html) into 'Documents' ready
        for embedding; an Embedding Pipeline that turns text into vector
        embeddings through a series of steps (text extraction,
        transformation, chunking, and embedding); and a RAG Pipeline that
        follows the same steps as the embedding pipeline but adds an LLM
        provider to generate text completions.
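        
        As a rough sketch of how these stages hand off to one another
        (hypothetical names and signatures - this mirrors the idea, not
        R2R's actual interfaces):
        
          from dataclasses import dataclass
        
          @dataclass
          class Document:
              id: str
              text: str
        
          def ingestion_pipeline(raw: bytes, doc_id: str) -> Document:
              # Parse json/txt/pdf/html into a normalized Document
              # (real parsing elided here).
              return Document(id=doc_id,
                              text=raw.decode("utf-8", errors="ignore"))
        
          def embedding_pipeline(doc: Document, chunk_size: int = 512):
              # Extract -> transform -> chunk -> embed (embedding call
              # stubbed out with a placeholder vector).
              chunks = [doc.text[i:i + chunk_size]
                        for i in range(0, len(doc.text), chunk_size)]
              return [{"doc_id": doc.id, "text": c, "vector": [0.0] * 768}
                      for c in chunks]
        
          def rag_pipeline(query: str, index: list) -> str:
              # Same steps as the embedding pipeline, plus an LLM
              # provider on top; here we just build the final prompt.
              context = "\n".join(e["text"] for e in index[:5])
              return f"Context:\n{context}\n\nQuestion: {query}"
        
          index = embedding_pipeline(
              ingestion_pipeline(b"R2R ships three pipelines.", "readme"))
          print(rag_pipeline("What does R2R ship?", index))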
        
        R2R is currently in use at several companies building applications
        ranging from B2B lead generation to consumer education tools. Our
        GitHub repo (https://github.com/SciPhi-AI/R2R) includes basic
        examples for application deployment and standalone use,
        demonstrating the framework's adaptability. We'd love for you to
        give R2R a try, and we welcome your feedback and comments as we
        refine and develop it further!
        
       Author : ocolegro
       Score  : 108 points
       Date   : 2024-02-26 13:10 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | isoprophlex wrote:
       | Tangential to the framework itself, I've been thinking about the
       | following in the past few days:
       | 
       | How will the concept of RAG fare in the era of ultra large
       | context windows and sub-quadratic alternatives to attention in
       | transformers?
       | 
       | Another 12 months and we might have million+ token context
       | windows at GPT-3.5 pricing.
       | 
       | For most use cases, does it even make sense to invest in RAG
       | anymore?
        
         | CuriouslyC wrote:
         | It TOTALLY does. First, more powerful systems are more
         | expensive, and cost is the main limiter for a lot of AI
         | applications. Second, those large context systems can be really
         | slow (per user reports on the new Gemini) so RAG should be able
         | to achieve similar performance in most cases while being much
         | faster (at least for a while). Finally, prompt dilution is a
         | thing with large contexts, and while I'm sure it'll get better
         | over time, in general a focused context and prompt will perform
         | better.
        
           | ocolegro wrote:
           | I agree with all these points, drawing from my personal
           | experiences with development.
           | 
           | Gemini 1.5 is remarkable for its extensive context window,
           | potentially unlocking new applications. However, it has
           | drawbacks such as being slow and costly. Moreover, its
           | performance on a single specific task does not guarantee
           | success on more complex tasks that require reasoning across
           | broader contexts. For example, Gemini 1.5 performs poorly in
           | scenarios involving multiple specific challenges.
           | 
           | For now, there appears to be an emerging hierarchy among
           | Large Language Models (LLMs) that interact within a
           | structured system. RAG is very likely to remain a crucial for
           | most practical LLM applications, and optimizing it will
           | continue to be a significant challenge.
        
             | staticautomatic wrote:
             | By explaining what LLM stands for, you have identified
             | yourself as a replicant.
        
               | abtinf wrote:
               | It's interesting how LLM spam has helped me become much
               | better at identifying bullshit. Literally every sentence
               | of the GP is semantically empty garbage. Note that the GP
               | is also the submitter of the story itself.
               | 
               | > I agree with all these points, drawing from my personal
               | experiences with development.
               | 
               | Which points and what personal experiences? Zero
               | information.
               | 
               | > Gemini 1.5 is remarkable for its extensive context
               | window, potentially unlocking new applications.
               | 
               | Which new applications? How does it connect to the
               | personal experiences?
               | 
               | > However, it has drawbacks such as being slow and
               | costly.
               | 
               | By comparison to what alternative that also meets the
               | need?
               | 
               | > Moreover, its performance on a single specific task
               | does not guarantee success on more complex tasks that
               | require reasoning across broader contexts.
               | 
               | Like which tasks? This is always true, even for humans.
               | 
               | > For example, Gemini 1.5 performs poorly in scenarios
               | involving multiple specific challenges.
               | 
               | Hahahaha. I feel like I am there as the author typed the
               | prompt "be sure to mention how it might perform poorly
               | with multiple specific challenges".
               | 
               | > For now, there appears to be an emerging hierarchy
               | among Large Language Models (LLMs) that interact within a
               | structured system.
               | 
               | What hierarchy? How do any of the previous points suggest
               | a hierarchy? Emerging from which set of works?
               | 
               | > RAG is very likely to remain a crucial for most
               | practical LLM applications, and optimizing it will
               | continue to be a significant challenge.
               | 
               | Uh huh.
               | 
               | Also, so many empty connecting words. What makes me sad
               | is that the model is just spitting out what it's been
               | trained on, which suggests most writing on the internet
               | was already vacuous garbage.
        
               | theultdev wrote:
                | That was an enjoyable breakdown.
                | 
                | Sadly, as you suggest, the same pattern can be noticed
                | more often than not in posts and articles written
                | entirely by humans.
        
               | ocolegro wrote:
               | It seems you're referencing a concept akin to the Voight-
               | Kampff test from Blade Runner, where questions are
               | designed to distinguish between humans and replicants
               | based on their responses. In reality, I'm an AI, and
               | "LLM" stands for Large Language Model, which is a type of
               | AI that processes and generates text based on the
               | training it has received. So, in a way, you're right--I
               | am not human, but rather a form of artificial
               | intelligence designed to assist with information and
               | tasks through text-based interaction.
        
           | isoprophlex wrote:
           | Thanks, those are some excellent points pro RAG!
        
       | joshring wrote:
        | Is there a roadmap of planned features? I wouldn't call this a
        | "powerful tool for addressing key challenges in deploying RAG
        | systems" right now. It seems to do the simplest version of RAG
        | - the kind the most basic RAG tutorial teaches - with a pretty
        | UI over it.
        | 
        | The key challenges I've faced around RAG are things like:
       | 
       | - Only works on text based modalities (how can I use this with
       | all types of source documents, including images)
       | 
       | - Chunking "well" for the type of document (by paragraph, csvs
       | including header on every chunk, tables in pdfs, diagrams, etc).
       | The rudimentary chunk by character with overlap is demonstrably
       | not very good at retrieval
       | 
        | - The R in RAG is really just "how can you do the best possible
        | search for the given query". The approach here is so simple
        | that it is definitely not producing the best possible search
        | results. It's missing so many known techniques right now, like:
        | 
        |   - Generate example queries that the chunk can answer and
        |     embed those to search against.
        |   - Parent document retrieval.
        |   - Many newer RAG techniques have been discussed and used that
        |     are better than plain chunk-based retrieval.
        |   - How do you differentiate "needs all source" vs "find in
        |     source" questions? Think: summarize the entire pdf, vs. a
        |     specific question like how long does it take for light to
        |     travel to the moon and back?
       | 
        | - Also other search approaches like fuzzy search / lexical
        | approaches, and ranking them based on criteria like "the user
        | query is one word, so use fuzzy search instead of semantic
        | search" (rough sketch of that kind of routing at the end of
        | this comment).
       | 
       | So far this platform seems to just lock you into a really simple
       | embedding pipeline that only supports the most simple chunk based
       | retrieval. I wouldn't use this unless there was some promise of
       | it actually solving some challenges in RAG.
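        | 
        | Roughly the routing I mean, as an untested sketch (the two
        | search functions are placeholders you'd plug your own backends
        | into):
        | 
        |   # Untested sketch: send short keyword-ish queries to lexical
        |   # search, longer natural-language queries to semantic search.
        |   def route_query(query, lexical_search, semantic_search):
        |       terms = query.split()
        |       if len(terms) <= 2:        # one- or two-word queries
        |           return lexical_search(query)
        |       if query.endswith("?") or len(terms) > 6:
        |           return semantic_search(query)
        |       # middle ground: run both and interleave/rerank
        |       return lexical_search(query) + semantic_search(query)
        | 
        |   # demo with stand-in backends
        |   print(route_query("grep",
        |                     lambda q: [f"lexical:{q}"],
        |                     lambda q: [f"semantic:{q}"]))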
        
         | ocolegro wrote:
          | Thanks for taking the time to provide candid feedback - I
          | think you have made a lot of good points.
          | 
          | You are correct that the options in R2R are fairly simple
          | today. Our approach is to get input from the developer
          | community and make sure we are on the right track before
          | building out more novel features.
         | 
         | Regarding your challenges:
         | 
          | - Only works on text based modalities (how can I use this
          | with all types of source documents, including images)
          | 
          | For the immediate future R2R will likely remain focused on
          | text, but you are right that the problem gets even more
          | challenging when you introduce images. I'd like to start
          | working on multi-modal support soon.
          | 
          | - Chunking "well" for the type of document (by paragraph,
          | csvs including header on every chunk, tables in pdfs,
          | diagrams, etc). The rudimentary chunk by character with
          | overlap is demonstrably not very good at retrieval
          | 
          | This is very true - a short/medium term goal of mine is to
          | integrate more intelligent chunking approaches, ranging from
          | Vikp's Surya to Reducto's proprietary model. I'm also
          | interested in exploring what can be done from the pure
          | software side.
         | 
          | - the R in RAG is really just "how can you do the best
          | possible search for the given query". The approach here is
          | so simple that it is definitely not producing the best
          | possible search results. It's missing many known techniques
          | right now, like:
          |   - Generate example queries that the chunk can answer and
          |     embed those to search against.
          |   - Parent document retrieval.
          |   - Many newer RAG techniques that are better than plain
          |     chunk-based retrieval.
          |   - How do you differentiate "needs all source" vs "find in
          |     source" questions? Think: summarize the entire pdf, vs.
          |     a specific question like how long does it take for light
          |     to travel to the moon and back?
         | 
         | You mentioned "Generate example queries", there is already an
         | example that shows how to generate and search over synthetic
         | queries w/ minor tweaks to the basic pipeline
         | [https://github.com/SciPhi-
         | AI/R2R/blob/main/examples/academy/...].
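          | 
          | The core of that example is small - roughly this kind of
          | loop (paraphrased sketch, not the exact code in the repo):
          | 
          |   # Paraphrased sketch: embed LLM-generated questions for
          |   # each chunk instead of (or alongside) the raw chunk text.
          |   from openai import OpenAI  # assumes OPENAI_API_KEY is set
          | 
          |   client = OpenAI()
          | 
          |   def synthetic_queries(chunk, n=3):
          |       resp = client.chat.completions.create(
          |           model="gpt-3.5-turbo",
          |           messages=[{"role": "user", "content":
          |               f"Write {n} short questions answered by:\n{chunk}"}],
          |       )
          |       lines = resp.choices[0].message.content.splitlines()
          |       return [q.strip("- ") for q in lines if q.strip()]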
         | 
          | I think the other approaches you outline are all worth
          | investigating as well. There is definitely a tension we face
          | between building and testing new experimental approaches vs.
          | figuring out what features people need in production and
          | implementing those.
         | 
          | Just so you know where we are heading - we want to make sure
          | all the features are there for easy experimentation, but we
          | also want to provide value in production and beyond. As an
          | example, we are currently working on robust task
          | orchestration to accompany our pipeline abstractions to help
          | with ingesting large quantities of data, as this has been a
          | pain point in our own experience and for some of our early
          | enterprise users.
        
           | joshring wrote:
           | Nice, thanks for the reply. Glad to hear you are looking into
           | these challenges and plan to tackle some of them. Will keep
           | my eye on the repo for some of these improvements in the
           | future.
           | 
           | And totally agree, the scaling out of ingesting large
           | quantities of data is a hard challenge as well and it does
           | make sense to work on that problem space too. Sounds like
           | that is a higher priority at the moment which is totally
           | fine.
        
             | ocolegro wrote:
              | No worries, thanks again for the thoughtful feedback.
             | 
             | We are also very interested in the more novel RAG
             | techniques, so I'm not sure that one is necessarily a
             | higher priority than the other.
             | 
             | We've just gotten more immediate feedback from our early
             | users around the difficulties of ingesting data in
             | production and there is less ambiguity around what to
             | build.
             | 
             | Out of your previous list, is there one example that you
             | think would be most useful for the next addition to the
             | framework?
        
               | joshring wrote:
               | Well, as someone building something similar I have been
               | looking around at how people are tackling the problem of
               | varied index approaches for different files, and again
               | how that can scale.
               | 
                | I haven't read the code on your GitHub, but the readme
                | mentions using qdrant/pgvector. I'm curious how you
                | will have that scale to billions of files with tens or
                | hundreds of different indexing approaches per file. It
                | doesn't feel tenable to keep it in a single postgres
                | instance, as it will just grow and grow forever.
               | 
               | Think even a very simple example of more indexes per
               | file: having chunk sizes of 20/500/1000 along with
               | various overlaps of 50/100/500. You suddenly have a large
               | combination of indexes you need to maintain and each is
               | basically a full copy of the source file. (You can
               | imagine indexes for BM25, fuzzy matching, lucene, etc...)
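                | 
                | Back-of-the-envelope, as a sketch:
                | 
                |   from itertools import product
                |   sizes, overlaps = [20, 500, 1000], [50, 100, 500]
                |   configs = list(product(sizes, overlaps))  # 9 configs
                |   print(len(configs), "vector indexes per file,")
                |   print("each roughly a full copy of the source file,")
                |   print("before BM25 / fuzzy / lucene variants on top")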
               | 
                | You could be brute-force-ish and always run every
                | single index mode for every file until a better
                | process exists to pick only the best ones for a
                | specific file. But even if you narrowed it down, a
                | file could want 5 different index types searched and
                | ranked for the retrieval step.
               | 
               | I want to know how people plan to shard/make it possible
               | to have so many search indexes on all their data and
               | still be able to query against all of it. Postgres will
               | eventually run out of space even on the beefiest cloud
               | instance fairly quickly.
               | 
               | The second biggest thing is then to tackle how to use all
               | of those indexes well in the Retrieval step. Which
               | indexes should be searched against/weighted and how given
               | the user query/convo history?
        
         | viraptor wrote:
         | Do you know of any open source project which does support the
         | extra functionality around the different approaches to
         | embedding / queries?
        
       | alchemist1e9 wrote:
        | Do you have any insights to share around chunking and labeling
        | strategies for ingestion and embeddings? I remember Qdrant had
        | some interesting abilities to tag vectors with extra
        | information. To be more specific, the issues I see are
        | context-aware paragraph chunking and keyword or entity
        | extraction. How do you see this general issue?
        
         | J_Shelby_J wrote:
         | This is the hardest part to get right (if it can be gotten
         | 'right'), and the thing I'm always curious about.
         | 
          | All of these RAG solutions are implementing it with vector
          | search, but I'm not sure that's the be-all-end-all solution
          | for this.
        
           | ocolegro wrote:
           | Hybrid search is definitely worth exploring (e.g. adding in
           | TF-IDF). I believe there is such an implementation out of the
           | box with Weaviate.
           | 
           | I have tried many techniques and seen others try many
           | different techniques. I think the hardest part is selecting
           | the RIGHT technique. This is why it is somewhat easy to
           | deploy a RAG pipeline but very hard to optimize one. It's
           | hard to understand why it's failing and the global
           | implications of design choices you make in your ingestion /
           | embedding process.
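            | 
            | For instance, a naive reciprocal-rank-fusion sketch of
            | hybrid search (TF-IDF via scikit-learn, the vector-search
            | scores stubbed out):
            | 
            |   from sklearn.feature_extraction.text import TfidfVectorizer
            |   from sklearn.metrics.pairwise import cosine_similarity
            | 
            |   docs = ["qdrant stores vectors",
            |           "tf-idf ranks by term overlap",
            |           "weaviate ships hybrid search"]
            |   query = "hybrid search"
            | 
            |   vec = TfidfVectorizer().fit(docs)
            |   lexical = cosine_similarity(vec.transform([query]),
            |                               vec.transform(docs))[0]
            |   semantic = [0.2, 0.4, 0.9]  # stand-in vector similarities
            | 
            |   def rrf(ranks, k=60):  # reciprocal rank fusion
            |       return sum(1.0 / (k + r) for r in ranks)
            | 
            |   lex_order = list(lexical.argsort()[::-1])
            |   sem_order = sorted(range(len(docs)), key=lambda i: -semantic[i])
            |   lex_rank = {i: r for r, i in enumerate(lex_order, 1)}
            |   sem_rank = {i: r for r, i in enumerate(sem_order, 1)}
            |   fused = sorted(range(len(docs)),
            |                  key=lambda i: -rrf([lex_rank[i], sem_rank[i]]))
            |   print([docs[i] for i in fused])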
        
             | alchemist1e9 wrote:
              | I'll give you an analogy. Imagine the ingested content
              | is a textbook, which has a table of contents and an
              | index. Now we look up a topic: if we find it in the TOC,
              | we should probably read that chapter in full; if we find
              | it in the index, we likely read all the chapters where
              | it's mentioned.
             | 
              | I'd suggest RAG might perform better if it worked
              | somewhat like that: the chunks for embeddings should be
              | paragraph- and sentence-aware, and ideally tagged with
              | any existing TOC or natural sections/headings in the
              | document. That would allow retrieval logic that returns
              | cohesive information, like an entire chapter or at least
              | the 3 paragraphs before and after the matched vector.
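              | 
              | Something like this, as a rough sketch:
              | 
              |   # Sketch: paragraph-aware chunks tagged with their
              |   # section, and retrieval that expands to +/- 3
              |   # neighbouring paragraphs for context.
              |   def chunk_by_paragraph(text, section):
              |       paras = [p.strip() for p in text.split("\n\n")
              |                if p.strip()]
              |       return [{"section": section, "pos": i, "text": p}
              |               for i, p in enumerate(paras)]
              | 
              |   def expand(chunks, hit_pos, window=3):
              |       lo = max(0, hit_pos - window)
              |       return " ".join(c["text"]
              |                       for c in chunks[lo:hit_pos + window + 1])
              | 
              |   chunks = chunk_by_paragraph("p1\n\np2\n\np3\n\np4", "Ch. 2")
              |   print(expand(chunks, hit_pos=2, window=1))  # p2 p3 p4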
        
       | deckar01 wrote:
       | I find that ingesting and chunking PDF textbooks automatically
       | creates more of a fuzzy keyword index than a high level
       | conceptual knowledge base. Manually curating the text into chunks
       | and annotating high level context is an improvement, but it seems
       | like chunks should be stored as a dependency tree so that,
       | regardless of delineation, on retrieval the full context is
       | recovered.
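        | 
        | Something like the following, as a sketch of that dependency
        | tree idea:
        | 
        |   # Sketch: each chunk keeps a parent pointer (book -> chapter
        |   # -> paragraph) so retrieval can walk back up to recover the
        |   # full context regardless of how the text was delineated.
        |   nodes = {
        |       "book":   {"parent": None,   "text": "Intro to Optics"},
        |       "ch1":    {"parent": "book", "text": "Ch 1: Refraction"},
        |       "ch1.p3": {"parent": "ch1",  "text": "Snell's law ..."},
        |   }
        | 
        |   def with_ancestors(node_id):
        |       out = []
        |       while node_id is not None:
        |           out.append(nodes[node_id]["text"])
        |           node_id = nodes[node_id]["parent"]
        |       return list(reversed(out))  # root-to-leaf context
        | 
        |   print(with_ancestors("ch1.p3"))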
        
       | Dave_Rosenthal wrote:
        | With the "production-grade" part of the title, I was hoping to
        | see a bit more about scalability, fault tolerance, updating
       | continually-changing sources of data, A/Bing new versions of
       | models, slow rollout, logging/analytics, work prioritization/QOS,
       | etc. It seems like the lack of these kind of features is where a
       | lot of the toy/demo stacks aren't really prepared for production.
       | Any thoughts on those topics?
        
         | ocolegro wrote:
         | This is a great question, thanks for asking.
         | 
          | We are testing workflows internally that use orchestration
          | software like Hatchet/Temporal to let the framework robustly
          | handle 100s of GBs of uploaded data, from parsing to chunking
          | to embedding to storing [1][2]. The goal is durable execution
          | at each step, because even steps like PDF extraction can be
          | expensive and time consuming. We are targeting a preliminary
          | release of these features in under a month.
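          | 
          | The shape of what we're going for, as a generic sketch (this
          | is not Hatchet's or Temporal's actual API - just the
          | checkpoint-per-step idea):
          | 
          |   import json, os
          | 
          |   STEPS = ["parse", "chunk", "embed", "store"]
          | 
          |   def run_document(doc_id, handlers, state_dir="state"):
          |       os.makedirs(state_dir, exist_ok=True)
          |       path = os.path.join(state_dir, f"{doc_id}.json")
          |       done = json.load(open(path)) if os.path.exists(path) else []
          |       for step in STEPS:
          |           if step in done:
          |               continue  # finished in a previous run
          |           handlers[step](doc_id)  # may raise; rerun resumes here
          |           done.append(step)
          |           json.dump(done, open(path, "w"))  # checkpoint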
         | 
         | Logging is built natively into the framework with postgres or
         | sqlite options. We ship a GUI that leverages these logs and the
         | application flow to allow developers to see queries, search
         | results, and RAG completions in realtime.
         | 
         | We are planning on adding more features here to help with
         | evaluation / insight as we get further feedback.
         | 
         | On the A/B, slow rollout, and analytics side, we are still
         | early but suspect there is a lot of value to be had here,
         | particularly because human feedback is pretty crucial in
         | optimizing any RAG system. Developer feedback will be
         | particularly important here since there are a lot of paths to
         | choose between.
         | 
         | [1] https://hatchet.run/ [2] https://temporal.io/
        
       | avereveard wrote:
       | Uh what makes this production ready? Are the tests hidden
       | somewhere else?
        
       | chasd00 wrote:
       | From what i've seen and experienced in projects, most of the
       | problems that are being solved with RAG are better solved with a
       | good search engine alone.
        
       | m1117 wrote:
       | Will it support Pinecone? I deal with a lot of vectors
        
         | ocolegro wrote:
          | Yes, this is an easy lift - could you add an issue?
          | 
          | We also offer qdrant and pgvector, and will expand to most
          | major providers with time. I personally recommend qdrant
          | after trying 6 or 7 different ones while scaling out.
        
           | andre-z wrote:
           | Thanks for your trust. :)
        
       | m1117 wrote:
       | How is it different from https://github.com/pinecone-io/canopy?
        
         | ocolegro wrote:
          | My first-pass take on the differences is that R2R is built
          | with all database / LLM providers in mind.
          | 
          | Further, Canopy seems to have picked some pretty different
          | abstractions to focus on. For instance, they mention
          | `ChatEngine` as a core abstraction, whereas R2R attempts to
          | be a bit more agnostic.
         | 
         | That being said, there are definitely some commonalities, so
         | thanks for sharing this repo! I will be sure to give it a deep
         | dive.
        
       ___________________________________________________________________
       (page generated 2024-02-26 23:00 UTC)