[HN Gopher] Ask HN: Is RAG the Future of LLMs?
___________________________________________________________________
Ask HN: Is RAG the Future of LLMs?
It seems to be in vogue that RAG is one of the best solutions to
reduce the problem of hallucinations in LLMs. What do you think?
Are there any other alternatives or solutions in sight?
Author : Gooblebrai
Score : 10 points
Date : 2024-04-14 22:19 UTC (41 minutes ago)
| redskyluan wrote:
| What I observe: simple RAG is fading, but complex RAG will
| persist and evolve, involving query rewriting, data cleaning,
| reflection, vector search, graph search, rerankers, and more
| intelligent chunking. Large models should not just be knowledge
| providers, but tool users and process drivers.
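| A minimal sketch of what such a multi-stage pipeline looks like
| (query rewriting -> retrieval -> reranking -> prompt building);
| every function here is an illustrative stand-in, not any
| particular library's API:
|
|     def rewrite_query(q: str) -> str:
|         # e.g. ask an LLM to expand or disambiguate the question
|         return q.strip().lower()
|
|     def vector_search(q: str, k: int = 20) -> list[str]:
|         # e.g. embed q and pull the k nearest chunks from a store
|         return [f"chunk {i} about {q}" for i in range(k)]
|
|     def rerank(q: str, chunks: list[str], top: int = 5) -> list[str]:
|         # e.g. score (query, chunk) pairs with a cross-encoder
|         return chunks[:top]
|
|     def build_prompt(q: str) -> str:
|         q2 = rewrite_query(q)
|         context = "\n".join(rerank(q2, vector_search(q2)))
|         return ("Answer using only this context:\n" + context +
|                 "\n\nQuestion: " + q)
|
|     print(build_prompt("How do rerankers fit into RAG?"))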
| danielmarkbruce wrote:
| It's becoming so complex that it will stop being called RAG.
| It's just an application that uses an LLM as one part of it.
| spencerchubb wrote:
| I believe RAG is a temporary hack until we figure out virtually
| infinite context.
|
| I think LLM context is going to be like cache levels. The first
| level is small but super fast (like working memory). The next
| level is larger but slower, and so on.
|
| RAG is basically a bad version of an attention mechanism: it is
| used to focus the model's attention on relevant documents. The
| problem is that RAG retrieval is not trained to minimize the
| model's loss; documents are just ranked by a similarity score.
|
| Obligatory note that I could be wrong and it's just my armchair
| opinion
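| To make the "just a similarity score" point concrete, a toy
| illustration (the embedding vectors are made up, and note that
| no training loss is ever involved in picking the document):
|
|     import math
|
|     def cosine(a, b):
|         dot = sum(x * y for x, y in zip(a, b))
|         na = math.sqrt(sum(x * x for x in a))
|         nb = math.sqrt(sum(y * y for y in b))
|         return dot / (na * nb)
|
|     query_vec = [0.9, 0.1, 0.3]
|     docs = {
|         "how to rotate API keys": [0.8, 0.2, 0.4],
|         "office snack survey":    [0.1, 0.9, 0.2],
|     }
|     # Retrieval is purely "highest cosine wins" -- a fixed scoring
|     # rule, not something optimized to reduce the LLM's loss.
|     best = max(docs, key=lambda d: cosine(query_vec, docs[d]))
|     print(best)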
| mikecaulley wrote:
| This doesn't consider compute cost; the RAG approach is much
| more efficient than an effectively infinite context length.
| choilive wrote:
| I think that "next level" would essentially be a RAG
| referential information system that gets data via search
| engines or databases. Maybe we will have the "google search"
| equivalent completely intended for LLM clients where all data
| is stored, searched, and returned via vector embeddings, but it
| could tap on exabytes of information.
| sigmoid10 wrote:
| The latest research suggests that the best thing you can do is
| RAG + finetuning on your target domain. Both give roughly equal
| percentage gains, but they are independent (i.e. they accumulate
| if you do both). As context windows constantly grow and very
| recent architectures move more towards linear context complexity,
| we'll probably see current RAG mechanisms lose importance. I can
| totally imagine a future where if you have a research level
| question about physics, you just put a ton of papers and every
| big graduate physics textbook into the current context instead of
| searching text snippets using embeddings etc.
| cjbprime wrote:
| It's hard to imagine what could happen instead. Even with a
| model with infinite context, where we imagine you could supply
| e.g. your entire email archive with each message in order to ask
| questions about one email, inference time still scales with the
| number of input tokens.
|
| So you'd still want to use RAG as a performance optimization,
| even though today it's being used as more of a "there is no other
| way to supply enough of your own data to the LLM" must-have.
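| A back-of-envelope illustration of that performance argument
| (all numbers are made-up assumptions, not measurements):
|
|     archive_emails = 50_000
|     tokens_per_email = 500
|     # stuffing the whole archive into context on every question:
|     full_context_tokens = archive_emails * tokens_per_email
|     # retrieving only the 20 most relevant emails instead:
|     rag_context_tokens = 20 * tokens_per_email
|     print(full_context_tokens)                        # 25,000,000
|     print(rag_context_tokens)                         # 10,000
|     print(full_context_tokens // rag_context_tokens)  # 2500x fewer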
| tslmy wrote:
| To make an LLM relevant to you, your intuition might be to
| fine-tune it with your data, but:
|
| 1. Training an LLM is expensive.
|
| 2. Because training is expensive, it's hard to keep an LLM up
| to date with the latest information.
|
| 3. Observability is lacking. When you ask an LLM a question,
| it's not obvious how it arrived at its answer.
|
| There's a different approach: Retrieval-Augmented Generation
| (RAG). Instead of asking the LLM to generate an answer
| immediately, a framework like LlamaIndex will:
|
| 1. retrieves information from your data sources first,
|
| 2. adds it to your question as context, and
|
| 3. asks the LLM to answer based on the enriched prompt.
|
| RAG overcomes all three weaknesses of the fine-tuning approach:
|
| 1. There's no training involved, so it's cheap.
|
| 2. Data is fetched only when you ask for it, so it's always up
| to date.
|
| 3. The framework can show you the retrieved documents, so it's
| more trustworthy.
|
| (https://lmy.medium.com/why-rag-is-big-aa60282693dc)
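| Roughly what that flow looks like with LlamaIndex (the "data"
| directory and the question are placeholders, the import path
| depends on your llama-index version, and an embedding/LLM
| backend such as an OpenAI API key is assumed to be configured):
|
|     from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
|
|     # 1. load and index your own documents
|     documents = SimpleDirectoryReader("data").load_data()
|     index = VectorStoreIndex.from_documents(documents)
|
|     # 2+3. retrieve relevant chunks, add them to the prompt, and
|     # ask the LLM to answer from that enriched context
|     query_engine = index.as_query_engine()
|     response = query_engine.query("What changed in the Q3 report?")
|     print(response)               # the grounded answer
|     print(response.source_nodes)  # the chunks it retrieved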
___________________________________________________________________
(page generated 2024-04-14 23:00 UTC)