[HN Gopher] DeepRAG: Thinking to retrieval step by step for larg...
___________________________________________________________________
DeepRAG: Thinking to retrieval step by step for large language
models
Author : fofoz
Score : 121 points
Date : 2025-02-04 14:43 UTC (8 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| jondwillis wrote:
| The title reads awkwardly to a native English speaker. A search
| of the PDF for "latency" returns one result, discussing how naive
| RAG can result in latency. What are the latency impacts and other
| trade-offs to achieve the claimed "[improved] answer accuracy by
| 21.99%"? Is there any way that I could replicate these results
| without having to write my own implementation?
| brunohaid wrote:
| Noice!
|
| Does anyone have a good recommendation for a local dev setup that
| does something similar with available tools? I.e. one that
| incorporates a bunch of PDFs (~10,000 pages of datasheets) and
| other docs, as well as a curl-style importer?
|
| Trying to wean myself off the next tech molochs, ideally with
| local functionality similar to OpenAI's Search + Reason, and gave
| up on LangChain during my first attempt 6 months ago.
| jondwillis wrote:
| Continue and Cline work with local models (e.g. via Ollama) and
| have good UX for including different kinds of context. Cursor
| uses remote models, but provides similar functionality.
| brunohaid wrote:
| Appreciated! Didn't know Cline already does RAG handling,
| thought I'd have to wire that up beforehand.
| amrrs wrote:
| Sorry, just trying to clarify - why would you use Cline (which
| is a coding assistant) for RAG?
| throwup238 wrote:
| Honestly you're better off rolling your own (but avoid
| LangChain like the plague). The actual implementation is simple
| but the devil is in the details - specifically how you chunk
| your documents to generate vector embeddings. Every time I've
| tried to apply general purpose RAG tools to specific types of
| documents like medical records, internal knowledge bases, case
| law, datasheets, and legislation, it's been a mess.
|
| Best case scenario you can come up with a chunking strategy
| specific to your use case that will make it work: stuff like
| grouping all the paragraphs/tables about a register together or
| grouping tables of physical properties in a datasheet with the
| table title or grouping the paragraphs in a PCB layout
| guideline together into a single unit. You also have to figure
| out how much overlap to allow between the different types of
| chunks and how many dimensions you need in the output vectors.
| You then have to link chunks together so that when your RAG
| matches the register description, it knows to include the chunk
| with the actual documentation so that the LLM can actually use
| the documentation chunk instead of just the description chunk.
| I've had to train many a classifier to get this part even
| remotely usable in nontrivial use cases like caselaw.
|
| Worst case scenario you have to finetune your own embedding
| model, because the colloquialisms the general purpose ones are
| trained on have little overlap with the terms of art and jargon
| used in the documents (this is especially bad for legal and
| highly technical texts IME). This generally requires thousands
| of examples created by an expert in the field.
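|
| The grouping, overlap, and chunk-linking described above can be
| sketched roughly like this (a minimal illustration; the helper
| names and parameters are made up, not from any library):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One retrieval unit: a text span plus links to related chunks."""
    id: int
    text: str
    linked_ids: list = field(default_factory=list)

def chunk_paragraphs(paragraphs, max_chars=500, overlap=1):
    """Group consecutive paragraphs into chunks of at most max_chars,
    carrying the last `overlap` paragraphs across each boundary so
    context that straddles a split is not lost."""
    chunks, buf = [], []
    for para in paragraphs:
        if buf and sum(len(p) for p in buf) + len(para) > max_chars:
            chunks.append(Chunk(id=len(chunks), text="\n".join(buf)))
            buf = buf[-overlap:]  # overlap carried into the next chunk
        buf.append(para)
    if buf:
        chunks.append(Chunk(id=len(chunks), text="\n".join(buf)))
    # Link neighbours so a hit on a short description chunk can pull
    # in the chunk holding the actual documentation.
    for prev, nxt in zip(chunks, chunks[1:]):
        prev.linked_ids.append(nxt.id)
        nxt.linked_ids.append(prev.id)
    return chunks

def expand_hit(chunks, hit_id):
    """Return the matched chunk plus its linked neighbours."""
    ids = [hit_id] + chunks[hit_id].linked_ids
    return [chunks[i] for i in sorted(set(ids))]
```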
| deoxykev wrote:
| Don't forget to finetune the reranker too if you end up
| finetuning the embedding model. That tends to have outsized
| effects on performance for out-of-distribution content.
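|
| The reranking pass sits after first-stage retrieval; a sketch,
| where `score_fn` stands in for whatever cross-encoder you
| finetune (e.g. a sentence-transformers CrossEncoder) - an
| assumption, not a fixed API:

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Re-order first-stage retrieval hits with a finetuned scorer.

    score_fn(query, passage) -> float is a stand-in for a
    cross-encoder's scoring call; finetuning it on in-domain pairs
    is what helps on out-of-distribution content.
    """
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```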
| byefruit wrote:
| > This generally requires thousands of examples created by an
| expert in the field.
|
| Or an AI model pretending to be an expert in the field...
| (works well in a few niche domains I have used this in)
| crishoj wrote:
| > but avoid LangChain like the plague
|
| Can you elaborate on this?
|
| I have a proof-of-concept RAG system implemented with
| LangChain, but would like input before committing to this
| framework.
| t1amat wrote:
| LangChain is considered complicated to get started with
| despite probably offering the widest range of
| functionality. If you are already comfortable with
| LangChain you are free to ignore that.
| 3abiton wrote:
| I am looking up chunking techniques, but the resources are so
| scarce on this. What's your recommendation?
| RansomStark wrote:
| If there's a list of techniques and their optimal use cases
| I haven't found it. I started writing one for the day job,
| but then graphRAG happened, and Gartner is saying all RAG
| will be graphRAG.
|
| You can't fight Gartner, no matter how wrong they are, so
| the work stopped, now everything is a badly implemented
| graph.
|
| That's a long way to say, if there is a comparison, a link
| would be most appreciated
| cpursley wrote:
| I've had great luck just base64'ing images and asking Qwen
| 2.5 VL to both parse them to markdown and generate a title,
| description, and list of keywords (seems to work well on
| tables and charts). My plan is to split PDFs into pngs first
| then run those against Qwen async, then put them into a
| vector database (haven't gotten around to that quite yet).
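|
| The per-page step of that pipeline amounts to building a chat
| payload with a base64 data URL; a sketch in the OpenAI-style
| message format, where the model name and prompt wording are
| illustrative assumptions:

```python
import base64

def vl_page_request(png_bytes, model="qwen2.5-vl"):
    """Build an OpenAI-style chat payload asking a vision model to
    transcribe one PDF page (rendered to PNG) into markdown plus a
    title, description, and keywords."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Transcribe this page to markdown, then give "
                          "a title, a one-line description, and a list "
                          "of keywords.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

Pages can then be sent async and the responses embedded into the
vector database.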
| kordlessagain wrote:
| I've been working on something that provides document search
| for agents to call if they need the documents. Let me know if
| you are interested. It's Open Source. For this many documents
| it will need some bucketing with semantic relationships, which
| I've been noodling on this last year. Still needs some tweaking
| for what you are doing, probably. Might get you further along
| if you are considering rolling your own...
| heywoods wrote:
| Could I take a look at the repo? Thanks!
| weitendorf wrote:
| My company (actually our two amazing interns) was working on
| this over the summer. We abandoned it, but it's 85% of the way
| to doing what you want it to do:
| https://github.com/accretional/semantifly
|
| We stopped working on it mostly because we had higher
| priorities and because I became pretty disillusioned with top-K
| rag. We had to build out a better workflow system anyway, and
| with that we could instead just have models write and run
| specific queries (e.g. list all .ts files containing the word
| "DatabaseClient"), and otherwise have their context set by
| users explicitly.
|
| The problem with RAG is that simplistic implementations
| distract and slow down models. You probably need an
| implementation that makes multiple passes to prune the context
| down to what you need to get good results, but that's
| complicated enough that you might want to build something else
| that gives you more bang for your buck.
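|
| The "model writes a specific query" alternative to top-K RAG can
| be as plain as a grep-style tool the model calls (a minimal
| sketch; the function name is made up):

```python
import os

def find_files(root, suffix, needle):
    """The kind of explicit query a model can emit instead of
    relying on top-K vector retrieval: list files under `root`
    ending in `suffix` whose text contains `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return sorted(hits)
```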
| mkw5053 wrote:
| This reminds me of the Agent Workflow Memory (AWM) paper [1],
| which also tries to find optimal decision paths for LLM-based
| agents but relies on in-context learning, whereas DeepRAG fine-
| tunes models to decide when to retrieve external knowledge.
|
| I've been thinking about how modifying AWM to use fine-tuning or
| an external knowledge system (RAG) might work--capturing the
| 'good' workflows it discovers rather than relying purely on
| prompting.
|
| [1] https://arxiv.org/abs/2409.07429 - Agent Workflow Memory
| (Wang et al., 2024)
___________________________________________________________________
(page generated 2025-02-04 23:00 UTC)