[HN Gopher] Better RAG Results with Reciprocal Rank Fusion and H...
       ___________________________________________________________________
        
       Better RAG Results with Reciprocal Rank Fusion and Hybrid Search
        
       Author : johnjwang
       Score  : 116 points
       Date   : 2024-05-30 15:17 UTC (7 hours ago)
        
 (HTM) web link (www.assembled.com)
 (TXT) w3m dump (www.assembled.com)
        
       | edude03 wrote:
       | Thanks for sharing, I like the approach and it makes a lot of
       | sense for the problem space. Especially using existing products
       | vs building/hosting your own.
       | 
       | I was however tripped up by this sentence close to the beginning:
       | 
       | > we encountered a significant challenge with RAG: relying solely
       | on vector search (even using both dense and sparse vectors)
       | doesn't always deliver satisfactory results for certain queries.
       | 
       | Not to be overly pedantic, but that's a problem with vector
       | similarity, not RAG as a concept.
       | 
        | Although the author is clearly aware of that - I have had
        | numerous conversations in the past few months alone with people
        | essentially saying "RAG doesn't work because I use pg_vector (or
        | whatever) and it never finds what I'm looking for", not
        | realizing 1) it's not the only way to do RAG, and 2) there is
        | often a fair difference between the document embeddings and the
        | vectorized query, and once you understand why, you can figure
        | out how to fix it.
       | 
       | https://medium.com/@cdg2718/why-your-rag-doesnt-work-9755726...
       | basically says everything I often say to people with RAG/vector
       | search problems but again, seems like the assembled team has it
       | handled :)
        
         | johnjwang wrote:
         | Author here: you're for sure right -- it's not a problem with
         | RAG the theoretical concept. In fact, I think RAG
         | implementations should likely be specific to their use cases
         | (e.g. our hybrid search approach works well for customer
         | support, but I'm not sure if it would work as well in other
         | contexts, say for legal bots).
         | 
          | I've seen the whole gamut of RAG implementations as well, and
          | the implementation, specifically the prompting and the
          | document search, has a lot to do with the end quality.
        
           | verdverm wrote:
            | re: legal, I saw a post on this idea where their RAG system
            | was designed to return the actual text from the document
            | rather than an LLM response or summary. The LLM played a
            | role in turning the query into the search params, but the
            | insight was that for certain kinds of documents, you want
            | the actual source because of the existing, human-written
            | summary or the detailed nuances therein.
        
             | gradys wrote:
             | Sounds more like Generation Augmented Retrieval in that
             | case.
        
       | yding wrote:
       | Absolutely makes sense!
        
       | SomewhatLikely wrote:
       | Both of your references use RRF=1/(60+Rank)
       | 
       | So I'm not sure why the article uses 1/Rank alone. Did you test
       | both and find that the smoothing didn't help? My understanding is
       | that it has been pretty important for the best results.
        
         | johnjwang wrote:
         | It's a good call out -- we use smoothing parameters that are
         | closer to what you see in the academic articles (they're tuned
         | slightly, but not much).
         | 
          | We used 1/Rank in the article for simplicity, though I can
          | see why this might be confusing to an astute reader.
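For readers comparing the two variants discussed above, here is a minimal sketch of RRF in Python. The k=60 smoothing constant is the value from the cited papers (set k=0 to get the article's plain 1/Rank form); the document ids in the usage example are made up:

```python
# Reciprocal Rank Fusion: each retriever contributes 1/(k + rank) per
# document, and documents are re-ranked by their summed contributions.

def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked result lists into one RRF-ordered list.

    rankings: list of ranked lists of document ids (best first).
    Returns document ids sorted by descending RRF score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a keyword search and a vector search:
keyword = ["a", "b", "c"]
vector = ["b", "d", "a"]
print(rrf_fuse([keyword, vector]))  # → ['b', 'a', 'd', 'c']
```

Note that "b" edges out "a" even though "a" tops the keyword list, because "b" ranks highly in both lists; that agreement bonus is the point of RRF.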
        
       | cpursley wrote:
       | Any tips on accomplishing this in Postgres with pg_vector?
        
         | flexzuu wrote:
         | Supabase has some good examples on their website, search for
         | hybrid search. I needed to tune the function they have there
         | but it should show you how to approach it.
        
           | cpursley wrote:
           | I'm actually doing something like that already, I'm mostly
           | referring to the Reciprocal Rank Fusion (RRF) part of this to
           | squeeze more out.
        
           | kiwicopple wrote:
           | Here is our doc with RRF:
           | 
           | https://supabase.com/docs/guides/ai/hybrid-search
        
         | evv555 wrote:
         | Llamaindex rerank module
        
           | cpursley wrote:
           | Thanks! Not using Python, but this is still really useful.
        
         | pamelafox wrote:
         | I commented above with a pgvector example that does it-
         | 
         | https://news.ycombinator.com/item?id=40527925
        
       | throwaway115 wrote:
       | I've implemented a very similar RAG hybrid solution, and it has
       | improved LLM responses enormously. There are other things you can
       | do too that have huge improvements, like destructuring your data
       | and placing it into a graph structure, with queryable edge
       | relationships. I think we're just scratching the surface.
        
         | Mkengine wrote:
          | This is really interesting, do you have other recommendations
          | for improvements (ideally with sources if you have any)? I
          | have to build a RAG solution for my job and right now I am
          | collecting information to determine the best way to go ahead.
        
       | esafak wrote:
       | 1. Does anyone know a postgres reranking extension, to go beyond
       | RRF through ML models or at least custom code?
       | 
       | 2. If anyone is observing significant gains from incorporating
       | knowledge graphs into the retrieval step, what kind of a
       | knowledge graph are you working with, what is your retrieval
       | algorithm, and what technology are you using to store it?
        
         | pamelafox wrote:
         | Re 1) pgvector has an example in the repo that uses a model for
         | re-ranking: https://github.com/pgvector/pgvector-
         | python/blob/master/exam...
         | 
         | I'm not using that in my own experiments since I don't want to
         | worry about the performance of running a model on production,
         | but seems worth a try.
        
           | esafak wrote:
           | That's outside the database, though. This is closer to what I
           | had in mind: https://postgresml.org/blog/how-to-improve-
           | search-results-wi...
        
       | retakeming wrote:
       | pg_search (full text search Postgres extension) can be used with
       | pgvector for hybrid search over Postgres tables. It comes with a
       | helpful hybrid search function that uses relative score fusion.
       | Whereas rank fusion considers just the order of the results,
        | relative score fusion uses the actual scores produced by
        | text/vector search.
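As a rough illustration of the distinction, relative score fusion might look like the sketch below. Min-max normalization and equal weights are assumptions here, not necessarily what pg_search implements:

```python
# Relative score fusion: normalize each system's raw scores to [0, 1],
# then combine with a weighted sum. Unlike RRF, the margins between
# results carry through, not just their order.

def relative_score_fusion(score_lists, weights=None):
    """score_lists: list of {doc_id: raw_score} dicts, one per system."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    fused = {}
    for scores, w in zip(score_lists, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on ties
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

text = {"a": 10.0, "b": 5.0, "c": 0.0}  # e.g. full-text search scores
vec = {"b": 0.9, "c": 0.8, "a": 0.1}    # e.g. cosine similarities
print(relative_score_fusion([text, vec]))  # → ['b', 'a', 'c']
```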
        
       | pamelafox wrote:
       | If you're looking for an example of RRF + Hybrid Search with
       | PostgreSQL, I've put together a FastAPI app here that uses RAG
       | with those options:
       | 
       | https://github.com/Azure-Samples/rag-postgres-openai-python/
       | 
       | Here's the RRF+Hybrid part: https://github.com/Azure-Samples/rag-
       | postgres-openai-python/...
       | 
       | That's largely based off a sample from the pgvector repo, with a
       | few tweaks.
       | 
       | Agreed that Hybrid is the way to go, it's what the Azure AI
       | Search team also recommends, based off their research:
       | 
       | https://techcommunity.microsoft.com/t5/ai-azure-ai-services-...
        
       | pmc00 wrote:
       | For another set of measurements that support RRF + Hybrid >
       | vectors, we (Azure AI Search team) did a bunch of evaluations a
       | few months ago: https://techcommunity.microsoft.com/t5/ai-azure-
       | ai-services-...
       | 
       | We also included supporting data in that write up showing you can
       | improve significantly on top of Hybrid/RRF using a reranking
       | stage (assuming you have a good reranker model), so we shipped
       | one as an optional step as part of our search engine.
        
       | ndricca wrote:
        | May I ask if you tried hybrid search directly on Pinecone,
        | using BM25 or SPLADE?
        
       | treprinum wrote:
       | Hybrid might work for English but where are you going to get
       | sparse embeddings like SPLADE or ELSERv2 for most other
        | languages? Vector search with ada-002 or text-003-large capped
        | to the first 500-1000 dimensions will give you support for 100+
        | languages for free. If you are using BM25, then you need to
        | train BM25 on every single separate knowledge base, which is
        | annoying and expensive.
        
       | gregnr wrote:
       | In case you just want a single Postgres function that does RRF
       | (pgvector+fts): https://supabase.com/docs/guides/ai/hybrid-search
       | 
       | (disclaimer: supabase dev who went down the rabbit hole with
       | hybrid search)
        
       | mtbarta3 wrote:
       | Great article. Hybrid search works well for a lot of scenarios.
       | 
       | The tradeoffs of using existing systems vs building your own
       | resonate with me. What we eventually experienced, however, is
       | that periods of bad search performance often correlated to out-
       | of-date search indices.
       | 
       | I'd be interested in another article detailing how you monitor
       | search. It can be tricky to keep an entire search system moving.
        
       | ko_pivot wrote:
       | Meilisearch has a really clean implementation of this. Can easily
       | adjust keyword vs vector weighting per query.
        
       | cheesyFish wrote:
       | RRF is alright, but I've had better results with relative score,
       | or distribution-based scoring.
       | 
       | LlamaIndex has a module for exactly this
       | 
       | https://docs.llamaindex.ai/en/stable/examples/retrievers/rel...
        
       | thefourthchime wrote:
       | I also found pure RAG with vector search to not work. I was
       | creating a bot that could find answers to questions about things
       | by looking at Slack discussions.
       | 
       | At first, I downloaded entire channels, loaded them into a vector
       | DB, and did RAG. The results sucked. Vector searches don't
       | understand things very well, and in this world, specific keywords
       | and error messages are very searchable.
       | 
       | Instead, I take the user's query, ask an LLM (Claude / Bedrock)
       | to find keywords, then search Slack using the API, get results,
       | and use an LLM to filter for discussions that are relevant, then
       | summarize them all in a response.
       | 
       | This is slow, of course, so it's very multi-threaded. A typical
       | response will be within 30 seconds.
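The keyword-first pipeline described above could be sketched roughly as follows. Here `llm` and `slack_search` are hypothetical stand-ins for the real Claude/Bedrock and Slack API calls, injected as callables, with the Slack searches fanned out across threads as the comment describes:

```python
# Sketch: query -> LLM keyword extraction -> parallel Slack search ->
# LLM relevance filter -> LLM summary. All external calls are injected.
from concurrent.futures import ThreadPoolExecutor

def answer(query, llm, slack_search):
    # 1) Ask the LLM for search keywords, one per line (assumed format).
    keywords = llm(f"List search keywords for: {query}").splitlines()
    # 2) Search Slack for each keyword in parallel (the slow part).
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(slack_search, keywords))
    threads = [t for batch in batches for t in batch]
    # 3) Keep only threads the LLM judges relevant, then summarize.
    relevant = [t for t in threads
                if llm(f"Relevant to {query!r}? Answer yes/no: {t}") == "yes"]
    return llm("Summarize for the user:\n" + "\n".join(relevant))
```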
        
         | siquick wrote:
          | When you're creating your embeddings you can store keywords
          | from the content (extracted using an LLM) in the metadata of
          | each chunk, which would improve the relevancy of results
          | returned from retrieval.
         | 
         | LlamaIndex does this out of the box.
        
       | janalsncm wrote:
        | Reciprocal rank scoring is just one way of forcing scores into
        | a fixed distribution: in this case, each result's score decays
        | with the reciprocal of its rank. But it also assumes equal
        | weight across all components, i.e. the top-ranked keyword match
        | is treated as being just as relevant as the top-ranked semantic
        | match.
       | 
       | There are a couple ways around this. Either learning the relative
       | importance based on the query, and/or using a separate reranking
       | function (usually a DNN) that also takes user behavior into
       | account.
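The first workaround mentioned above amounts to giving each component its own weight in the RRF sum. A sketch, where the weights are purely illustrative and would in practice be tuned offline or predicted per query:

```python
# Weighted RRF: identical to plain RRF except each retriever's
# contribution is scaled by a per-component weight, so e.g. keyword
# matches can count for more than semantic matches on exact-term queries.

def weighted_rrf(rankings, weights, k=60):
    """rankings: ranked lists of doc ids; weights: one float per list."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Weighting the keyword list 2x flips the tie in its favor:
print(weighted_rrf([["a", "b"], ["b", "a"]], weights=[2.0, 1.0]))
```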
        
       ___________________________________________________________________
       (page generated 2024-05-30 23:00 UTC)