[HN Gopher] Better RAG Results with Reciprocal Rank Fusion and H...
___________________________________________________________________
Better RAG Results with Reciprocal Rank Fusion and Hybrid Search
Author : johnjwang
Score : 116 points
Date : 2024-05-30 15:17 UTC (7 hours ago)
(HTM) web link (www.assembled.com)
(TXT) w3m dump (www.assembled.com)
| edude03 wrote:
| Thanks for sharing, I like the approach and it makes a lot of
| sense for the problem space. Especially using existing products
| vs building/hosting your own.
|
| I was however tripped up by this sentence close to the beginning:
|
| > we encountered a significant challenge with RAG: relying solely
| on vector search (even using both dense and sparse vectors)
| doesn't always deliver satisfactory results for certain queries.
|
| Not to be overly pedantic, but that's a problem with vector
| similarity, not RAG as a concept.
|
| Although the author is clearly aware of that - I have had
| numerous conversations in the past few months alone with people
| essentially saying "RAG doesn't work because I use pg_vector (or
| whatever) and it never finds what I'm looking for", not realizing
| 1) it's not the only way to do RAG, and 2) there is often a fair
| gap between the stored embeddings and the vectorized query, and
| once you understand why that is, you can figure out how to fix it.
|
| https://medium.com/@cdg2718/why-your-rag-doesnt-work-9755726...
| basically says everything I often say to people with RAG/vector
| search problems but again, seems like the assembled team has it
| handled :)
| johnjwang wrote:
| Author here: you're for sure right -- it's not a problem with
| RAG as a theoretical concept. In fact, I think RAG
| implementations should likely be specific to their use cases
| (e.g. our hybrid search approach works well for customer
| support, but I'm not sure if it would work as well in other
| contexts, say for legal bots).
|
| I've seen the whole gamut of RAG implementations as well, and
| the implementation, specifically the prompting and the document
| search, has a lot to do with the end quality.
| verdverm wrote:
| re: legal, I saw a post on this idea where their RAG system
| was designed to return the actual text from the document
| rather than an LLM response or summary. The LLM played a role
| in turning the query into the search params, but the insight
| was that for certain kinds of documents, you want the actual
| source because of the existing, human-written summary or the
| detailed nuances therein.
| gradys wrote:
| Sounds more like Generation Augmented Retrieval in that
| case.
| yding wrote:
| Absolutely makes sense!
| SomewhatLikely wrote:
| Both of your references use RRF=1/(60+Rank)
|
| So I'm not sure why the article uses 1/Rank alone. Did you test
| both and find that the smoothing didn't help? My understanding is
| that it has been pretty important for the best results.
| johnjwang wrote:
| It's a good call out -- we use smoothing parameters that are
| closer to what you see in the academic articles (they're tuned
| slightly, but not much).
|
| We used 1/Rank in the article for simplicity, though I can see
| why this might be confusing to an astute reader.
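The two RRF variants being compared above can be sketched in a few lines of Python. This is an illustrative implementation of the textbook formula, not Assembled's code; the function and document names are made up. Setting k=60 gives the smoothed form from the academic references, while k=0 recovers the plain 1/Rank version used in the article.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    # Each list is ordered best-first; k is the smoothing constant
    # from the original RRF paper (k=0 recovers plain 1/rank).
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a keyword ranking with a vector ranking
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

With k=60 the score gaps between adjacent ranks are small, so a document that appears in both lists reliably beats a document that tops only one; with k=0 the top slot of a single list is worth far more.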
| cpursley wrote:
| Any tips on accomplishing this in Postgres with pg_vector?
| flexzuu wrote:
| Supabase has some good examples on their website, search for
| hybrid search. I needed to tune the function they have there
| but it should show you how to approach it.
| cpursley wrote:
| I'm actually doing something like that already, I'm mostly
| referring to the Reciprocal Rank Fusion (RRF) part of this to
| squeeze more out.
| kiwicopple wrote:
| Here is our doc with RRF:
|
| https://supabase.com/docs/guides/ai/hybrid-search
| evv555 wrote:
| Llamaindex rerank module
| cpursley wrote:
| Thanks! Not using Python, but this is still really useful.
| pamelafox wrote:
| I commented above with a pgvector example that does it-
|
| https://news.ycombinator.com/item?id=40527925
| throwaway115 wrote:
| I've implemented a very similar RAG hybrid solution, and it has
| improved LLM responses enormously. There are other things you can
| do too that have huge improvements, like destructuring your data
| and placing it into a graph structure, with queryable edge
| relationships. I think we're just scratching the surface.
| Mkengine wrote:
| This is really interesting, do you have other recommendations
| for improvements (gladly with sources if you have any)? I have
| to build a RAG solution for my job and right now I am
| collecting information to determine the best way to go ahead.
| esafak wrote:
| 1. Does anyone know a postgres reranking extension, to go beyond
| RRF through ML models or at least custom code?
|
| 2. If anyone is observing significant gains from incorporating
| knowledge graphs into the retrieval step, what kind of a
| knowledge graph are you working with, what is your retrieval
| algorithm, and what technology are you using to store it?
| pamelafox wrote:
| Re 1) pgvector has an example in the repo that uses a model for
| re-ranking: https://github.com/pgvector/pgvector-
| python/blob/master/exam...
|
| I'm not using that in my own experiments since I don't want to
| worry about the performance of running a model in production,
| but it seems worth a try.
| esafak wrote:
| That's outside the database, though. This is closer to what I
| had in mind: https://postgresml.org/blog/how-to-improve-
| search-results-wi...
| retakeming wrote:
| pg_search (full text search Postgres extension) can be used with
| pgvector for hybrid search over Postgres tables. It comes with a
| helpful hybrid search function that uses relative score fusion.
| Whereas rank fusion considers just the order of the results,
| relative score fusion uses the actual scores produced by
| text/vector search.
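The distinction drawn above can be made concrete. Below is a minimal sketch of relative score fusion via min-max normalization, not pg_search's actual function: each retriever's raw scores are scaled to [0, 1] and combined with a weighted sum, so the size of score gaps (not just ordering) affects the result. All names and example scores are illustrative.

```python
def relative_score_fusion(keyword_scores, vector_scores, weight=0.5):
    # Min-max normalize each retriever's raw scores to [0, 1],
    # then combine with a weighted sum. Unlike rank fusion, the
    # size of the score gaps influences the final ordering.
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {d: 1.0 for d in scores}
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    fused = {d: weight * kw.get(d, 0.0) + (1 - weight) * vec.get(d, 0.0)
             for d in set(kw) | set(vec)}
    return sorted(fused, key=fused.get, reverse=True)

ranking = relative_score_fusion(
    {"a": 12.0, "b": 3.0, "c": 1.0},  # e.g. BM25 scores
    {"b": 0.9, "a": 0.8, "d": 0.2},   # e.g. cosine similarities
)
```

Here "a" wins overall because its large BM25 margin survives normalization, even though "b" tops the vector list.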
| pamelafox wrote:
| If you're looking for an example of RRF + Hybrid Search with
| PostgreSQL, I've put together a FastAPI app here that uses RAG
| with those options:
|
| https://github.com/Azure-Samples/rag-postgres-openai-python/
|
| Here's the RRF+Hybrid part: https://github.com/Azure-Samples/rag-
| postgres-openai-python/...
|
| That's largely based off a sample from the pgvector repo, with a
| few tweaks.
|
| Agreed that Hybrid is the way to go; it's what the Azure AI
| Search team also recommends, based on their research:
|
| https://techcommunity.microsoft.com/t5/ai-azure-ai-services-...
| pmc00 wrote:
| For another set of measurements that support RRF + Hybrid >
| vectors, we (Azure AI Search team) did a bunch of evaluations a
| few months ago: https://techcommunity.microsoft.com/t5/ai-azure-
| ai-services-...
|
| We also included supporting data in that write up showing you can
| improve significantly on top of Hybrid/RRF using a reranking
| stage (assuming you have a good reranker model), so we shipped
| one as an optional step as part of our search engine.
| ndricca wrote:
| May I ask if you tried hybrid search directly on Pinecone,
| using BM25 or SPLADE?
| treprinum wrote:
| Hybrid might work for English but where are you going to get
| sparse embeddings like SPLADE or ELSERv2 for most other
| languages? Vector search with ada-002 or text-embedding-3-large
| capped to the first 500-1000 dimensions will give you support
| for 100+ languages for free. If you are using BM25, then you
| need to train BM25 on every single separate knowledge base,
| which is annoying and expensive.
| gregnr wrote:
| In case you just want a single Postgres function that does RRF
| (pgvector+fts): https://supabase.com/docs/guides/ai/hybrid-search
|
| (disclaimer: supabase dev who went down the rabbit hole with
| hybrid search)
| mtbarta3 wrote:
| Great article. Hybrid search works well for a lot of scenarios.
|
| The tradeoffs of using existing systems vs building your own
| resonate with me. What we eventually experienced, however, is
| that periods of bad search performance often correlated to out-
| of-date search indices.
|
| I'd be interested in another article detailing how you monitor
| search. It can be tricky to keep an entire search system moving.
| ko_pivot wrote:
| Meilisearch has a really clean implementation of this. Can easily
| adjust keyword vs vector weighting per query.
| cheesyFish wrote:
| RRF is alright, but I've had better results with relative score,
| or distribution-based scoring.
|
| LlamaIndex has a module for exactly this
|
| https://docs.llamaindex.ai/en/stable/examples/retrievers/rel...
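For context on the comment above: distribution-based fusion normalizes scores against each retriever's score distribution rather than just its min/max. Below is a simplified z-score variant (not LlamaIndex's exact module, which uses its own scaling); names and sample scores are illustrative.

```python
import statistics

def zscore_fusion(score_lists):
    # Standardize each retriever's scores (subtract the mean,
    # divide by the standard deviation) before summing, so a
    # retriever with a wider raw-score spread doesn't dominate.
    fused = {}
    for scores in score_lists:  # each: doc_id -> raw score
        mu = statistics.mean(scores.values())
        sigma = statistics.pstdev(scores.values()) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (s - mu) / sigma
    return sorted(fused, key=fused.get, reverse=True)

ranking = zscore_fusion([
    {"a": 10.0, "b": 5.0, "c": 0.0},  # wide-spread scores
    {"a": 0.9, "b": 0.6, "c": 0.1},   # narrow-spread scores
])
```

Unlike plain RRF, this keeps information about how far apart the candidates were in each list, which is the property the commenter found helpful.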
| thefourthchime wrote:
| I also found pure RAG with vector search to not work. I was
| creating a bot that could find answers to questions about things
| by looking at Slack discussions.
|
| At first, I downloaded entire channels, loaded them into a vector
| DB, and did RAG. The results sucked. Vector searches don't
| understand things very well, and in this world, specific keywords
| and error messages are very searchable.
|
| Instead, I take the user's query, ask an LLM (Claude / Bedrock)
| to find keywords, then search Slack using the API, get results,
| and use an LLM to filter for discussions that are relevant, then
| summarize them all in a response.
|
| This is slow, of course, so it's very multi-threaded. A typical
| response will be within 30 seconds.
| siquick wrote:
| When you're creating your embeddings, you can store keywords
| from the content (extracted by an LLM) in the metadata of each
| chunk, which would positively affect the relevancy of results
| returned from retrieval.
|
| LlamaIndex does this out of the box.
| janalsncm wrote:
| Reciprocal rank scoring is just one way of forcing scores into a
| fixed distribution: in this case, decaying with the reciprocal of
| the rank. But it also assumes equal weight across all components,
| i.e. the top-ranked keyword match has the same relevance as the
| top-ranked semantic match.
|
| There are a couple of ways around this: either learning the
| relative importance based on the query, and/or using a separate
| reranking function (usually a DNN) that also takes user behavior
| into account.
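The simplest version of the first idea above is to give each retriever its own weight inside the RRF sum. A hedged sketch, with made-up names and weights (in practice the weights would be hand-tuned or predicted per query by a small model):

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    # RRF where each retriever contributes according to its own
    # weight instead of equally. The weights could be hand-tuned
    # or predicted per query by a lightweight model.
    scores = defaultdict(float)
    for results, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A keyword-looking query (e.g. an exact error message) might
# upweight the keyword list relative to the semantic list.
ranking = weighted_rrf(
    [["a", "b"], ["b", "c"]],  # keyword results, semantic results
    weights=[0.8, 0.2],
)
```

With equal weights, "a" and "b" would be nearly tied; upweighting the keyword list still lets "b" win here because it appears in both lists, while "c" (semantic-only) drops well behind.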
___________________________________________________________________
(page generated 2024-05-30 23:00 UTC)