[HN Gopher] DeepRAG: Thinking to retrieval step by step for larg...
       ___________________________________________________________________
        
       DeepRAG: Thinking to retrieval step by step for large language
       models
        
       Author : fofoz
       Score  : 121 points
       Date   : 2025-02-04 14:43 UTC (8 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | jondwillis wrote:
       | The title reads awkwardly to a native English speaker. A search
       | of the PDF for "latency" returns one result, discussing how naive
       | RAG can result in latency. What are the latency impacts and other
       | trade-offs to achieve the claimed "[improved] answer accuracy by
       | 21.99%"? Is there any way that I could replicate these results
       | without having to write my own implementation?
        
       | brunohaid wrote:
       | Noice!
       | 
       | Does anyone have a good recommendation for a local dev setup that
        | does something similar with available tools? I.e. one that
        | incorporates a bunch of PDFs (~10,000 pages of datasheets) and
        | other docs, as well as a curl-style importer?
       | 
        | Trying to wean myself off the next tech molochs, ideally with
        | local functionality similar to OpenAI's Search + Reason. I gave
        | up on LangChain during my first attempt 6 months ago.
        
         | jondwillis wrote:
         | Continue and Cline work with local models (e.g. via Ollama) and
         | have good UX for including different kinds of context. Cursor
         | uses remote models, but provides similar functionality.
        
           | brunohaid wrote:
           | Appreciated! Didn't know Cline already does RAG handling,
           | thought I'd have to wire that up beforehand.
        
           | amrrs wrote:
            | Sorry, just trying to clarify - why would you use Cline
            | (which is a coding assistant) for RAG?
        
         | throwup238 wrote:
         | Honestly you're better off rolling your own (but avoid
         | LangChain like the plague). The actual implementation is simple
         | but the devil is in the details - specifically how you chunk
         | your documents to generate vector embeddings. Every time I've
         | tried to apply general purpose RAG tools to specific types of
          | documents like medical records, internal knowledge bases, case
         | law, datasheets, and legislation, it's been a mess.
         | 
         | Best case scenario you can come up with a chunking strategy
         | specific to your use case that will make it work: stuff like
         | grouping all the paragraphs/tables about a register together or
         | grouping tables of physical properties in a datasheet with the
         | table title or grouping the paragraphs in a PCB layout
         | guideline together into a single unit. You also have to figure
         | out how much overlap to allow between the different types of
         | chunks and how many dimensions you need in the output vectors.
         | You then have to link chunks together so that when your RAG
         | matches the register description, it knows to include the chunk
         | with the actual documentation so that the LLM can actually use
         | the documentation chunk instead of just the description chunk.
         | I've had to train many a classifier to get this part even
         | remotely usable in nontrivial use cases like caselaw.
         | 
         | Worst case scenario you have to finetune your own embedding
         | model because the colloquialisms the general purpose ones are
          | trained on have little overlap with the terms of art and jargon
         | used in the documents (this is especially bad for legal and
         | highly technical texts IME). This generally requires thousands
         | of examples created by an expert in the field.
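
The chunking-and-linking approach described above can be sketched minimally: group paragraphs with some overlap, then record links between neighbouring chunks so that a vector match on a description chunk can also pull in the chunk holding the actual documentation. All names here are illustrative, not from any library; it assumes paragraphs are already extracted as strings.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    linked_ids: list = field(default_factory=list)  # ids of companion chunks

def chunk_with_overlap(paragraphs, group_size=3, overlap=1):
    """Group consecutive paragraphs into chunks, repeating `overlap`
    paragraphs between neighbours so a topic isn't cut mid-chunk."""
    chunks, i = [], 0
    step = group_size - overlap
    while i < len(paragraphs):
        group = paragraphs[i:i + group_size]
        chunks.append(Chunk(chunk_id=f"c{len(chunks)}", text="\n".join(group)))
        if i + group_size >= len(paragraphs):
            break
        i += step
    # Link neighbours: a match on one chunk can then expand to its context.
    for a, b in zip(chunks, chunks[1:]):
        a.linked_ids.append(b.chunk_id)
        b.linked_ids.append(a.chunk_id)
    return chunks

def expand_hit(hit, index):
    """After a vector match, return the matched chunk plus its linked chunks."""
    return [hit] + [index[cid] for cid in hit.linked_ids]
```

In practice the grouping rule would be domain-specific (registers, tables, layout guidelines) rather than a fixed window, but the link-then-expand step stays the same.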
        
           | deoxykev wrote:
           | Don't forget to finetune the reranker too if you end up doing
           | the embedding model. That tends to have outsized effects on
           | performance for out of distribution content.
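
A reranker slots in as a second scoring stage after vector retrieval. A minimal sketch of that stage, where the toy lexical scorer stands in for the fine-tuned cross-encoder the comment recommends:

```python
def overlap_score(query: str, passage: str) -> float:
    """Toy lexical scorer standing in for a fine-tuned cross-encoder:
    fraction of query terms that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query, candidates, score_fn=overlap_score, top_k=3):
    """Second retrieval stage: score each (query, passage) pair and keep
    the best top_k. In practice score_fn would call the fine-tuned reranker."""
    ranked = sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_k]
```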
        
           | byefruit wrote:
           | > This generally requires thousands of examples created by an
           | expert in the field.
           | 
           | Or an AI model pretending to be an expert in the field...
           | (works well in a few niche domains I have used this in)
        
           | crishoj wrote:
           | > but avoid LangChain like the plague
           | 
           | Can you elaborate on this?
           | 
           | I have a proof-of-concept RAG system implemented with
           | LangChain, but would like input before committing to this
           | framework.
        
             | t1amat wrote:
              | LangChain is considered complicated to get started with,
              | despite probably offering the widest range of
              | functionality. If you are already comfortable with
              | LangChain, you are free to ignore that.
        
           | 3abiton wrote:
            | I've been looking up chunking techniques, but resources on
            | this are scarce. What's your recommendation?
        
             | RansomStark wrote:
              | If there's a list of techniques and their optimal use
              | cases, I haven't found it. I started writing one for the
              | day job, but then graphRAG happened, and Gartner is saying
              | all RAG will be graphRAG.
             | 
             | You can't fight Gartner, no matter how wrong they are, so
             | the work stopped, now everything is a badly implemented
             | graph.
             | 
             | That's a long way to say, if there is a comparison, a link
             | would be most appreciated
        
           | cpursley wrote:
            | I've had great luck just base64'ing images and asking Qwen
            | 2.5 VL to parse them to markdown and generate a title,
            | description, and list of keywords (it seems to work well on
            | tables and charts). My plan is to split PDFs into PNGs
            | first, then run those against Qwen async, then put the
            | results into a vector database (haven't gotten around to
            | that quite yet).
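
Building the request for that kind of image-to-markdown call can be sketched as follows, using the OpenAI-style chat format that local servers such as vLLM or Ollama typically accept. The model name and the exact prompt wording are assumptions, not from the comment:

```python
import base64

def build_vlm_request(png_bytes: bytes, model: str = "qwen2.5-vl"):
    """Build an OpenAI-style chat payload asking a vision-language model
    to transcribe a page image to markdown plus title/description/keywords.
    The model name is an assumption for a local vLLM/Ollama setup."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    prompt = ("Parse this page to markdown, then add a title, a one-sentence "
              "description, and a list of keywords.")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # Image is passed inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The returned dict would be POSTed as JSON to the server's chat-completions endpoint; the markdown, title, and keywords come back in the model's reply.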
        
         | kordlessagain wrote:
         | I've been working on something that provides document search
         | for agents to call if they need the documents. Let me know if
         | you are interested. It's Open Source. For this many documents
         | it will need some bucketing with semantic relationships, which
          | I've been noodling on for the last year. It still needs some
          | tweaking for what you are doing, probably, but might get you
          | further along if you are considering rolling your own...
        
           | heywoods wrote:
           | Could I take a look at the repo? Thanks!
        
         | weitendorf wrote:
          | My company (actually our two amazing interns) was working on
          | this over the summer. We abandoned it, but it's 85% of the way
          | to doing what you want:
         | https://github.com/accretional/semantifly
         | 
         | We stopped working on it mostly because we had higher
         | priorities and because I became pretty disillusioned with top-K
         | rag. We had to build out a better workflow system anyway, and
         | with that we could instead just have models write and run
         | specific queries (eg list all .ts files containing the word
         | "DatabaseClient"), and otherwise have their context set by
         | users explicitly.
         | 
         | The problem with RAG is that simplistic implementations
         | distract and slow down models. You probably need an
         | implementation that makes multiple passes to prune the context
         | down to what you need to get good results, but that's
         | complicated enough that you might want to build something else
         | that gives you more bang for your buck.
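
The "specific queries instead of top-K RAG" alternative mentioned above amounts to giving the model a deterministic search tool. A minimal sketch of such a tool (function name and signature are illustrative), matching the example query of listing all .ts files containing a given word:

```python
import os

def grep_files(root: str, suffix: str, needle: str):
    """Deterministic search tool a model can call instead of top-K RAG:
    list files under `root` ending in `suffix` whose text contains `needle`."""
    hits = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if needle in f.read():
                        hits.append(path)
            except OSError:
                continue  # skip unreadable files
    return sorted(hits)
```

For example, `grep_files(repo_root, ".ts", "DatabaseClient")` implements the query from the comment; the model gets exact, reproducible results rather than fuzzy vector matches.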
        
       | mkw5053 wrote:
       | This reminds me of the Agent Workflow Memory (AWM) paper [1],
       | which also tries to find optimal decision paths for LLM-based
       | agents but relies on in-context learning, whereas DeepRAG fine-
       | tunes models to decide when to retrieve external knowledge.
       | 
       | I've been thinking about how modifying AWM to use fine-tuning or
       | an external knowledge system (RAG) might work--capturing the
       | 'good' workflows it discovers rather than relying purely on
       | prompting.
       | 
       | [1] https://arxiv.org/abs/2409.07429 - Agent Workflow Memory
       | (Wang et al., 2024)
        
       ___________________________________________________________________
       (page generated 2025-02-04 23:00 UTC)