[HN Gopher] Knowledge Graphs in RAG: Hype vs. Ragas Analysis
       ___________________________________________________________________
        
       Knowledge Graphs in RAG: Hype vs. Ragas Analysis
        
       Author : rooftopzen
       Score  : 117 points
       Date   : 2024-07-09 21:00 UTC (1 day ago)
        
 (HTM) web link (aiencoder.substack.com)
 (TXT) w3m dump (aiencoder.substack.com)
        
       | jimmySixDOF wrote:
       | This is a nice sandbox walkthrough of the author's objective,
       | which was to test MSFT's claims in the paper -- but with all due
       | respect, the buzz around graphs is because they add a whole
       | third layer in a combined approach like Reciprocal Rank Fusion
       | (RRF). You do a BM25 search, then a vector-based nearest-
       | neighbors search, and now you can add a KG search; combined with
       | local and global reranking etc., the expectation is that this
       | produces a better final outcome. These findings aside, it still
       | makes sense that adding KG search to a hybrid pipeline is going
       | to be useful.
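       | 
       | A minimal sketch of the RRF combination step (my own toy code,
       | not from the article; k=60 is the usual constant, and the three
       | input id lists are hypothetical retriever outputs):
       | 
       |     # Reciprocal Rank Fusion over ranked doc-id lists.
       |     def rrf(rankings, k=60):
       |         scores = {}
       |         for ranking in rankings:
       |             for rank, doc_id in enumerate(ranking, start=1):
       |                 scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
       |         return sorted(scores, key=scores.get, reverse=True)
       | 
       |     fused = rrf([bm25_ids, vector_ids, kg_ids])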
        
       | itkovian_ wrote:
       | Knowledge graphs were created to solve the problem of making
       | natural, free-flowing text machine-processable. We now have a
       | technology that completely understands natural free flowing text
       | and can extract meaning. Why would going back to structure help
       | when that structure can never be as rich as the text itself? I
       | get it if the KB has new information, but that's not what I'm
       | saying.
        
         | esafak wrote:
         | But RAG without graphs just relies on similarity search, which
         | isn't very smart.
        
         | visarga wrote:
         | > Why would going back to structure help
         | 
         | When your corpus is large, it is useful to split it up and
         | combine hierarchically. In their place I would do both bottom-
         | up and top-down summarization passes, so information can
         | percolate from a leaf to the root and from the root to a
         | different leaf. Global context can illuminate local summaries:
         | think of the twist in a novel, which sheds new light on
         | everything.
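         | 
         | A toy sketch of the two passes (llm_summarize is a
         | hypothetical helper wrapping whatever model you use):
         | 
         |     # Bottom-up: summarize leaf chunks, then merge summaries
         |     # up the tree until one root summary remains.
         |     def bottom_up(chunks, llm_summarize, fanout=4):
         |         level = [llm_summarize(c) for c in chunks]
         |         while len(level) > 1:
         |             level = [llm_summarize(" ".join(level[i:i + fanout]))
         |                      for i in range(0, len(level), fanout)]
         |         return level[0]
         | 
         |     # Top-down: re-summarize each leaf with the root summary
         |     # as global context, so a late twist reaches every leaf.
         |     def top_down(chunks, root_summary, llm_summarize):
         |         return [llm_summarize(f"Global context: {root_summary}\n\n{c}")
         |                 for c in chunks]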
        
           | itkovian_ wrote:
           | That's not what a KB is.
        
         | th0ma5 wrote:
         | > We now have a technology that completely understands natural
         | free flowing text and can extract meaning.
         | 
         | Actually, we don't. I know it certainly feels like LLMs do
         | this, but no one would dare stake their life on their output
         | if they knew how they work. Still useful!
        
       | visarga wrote:
       | The Microsoft GraphRAG paper focuses on global sensemaking
       | through hierarchical summarization, which is a fundamental
       | aspect of their approach. The blog post analysis, however,
       | doesn't address this core feature at all. Another issue is
       | corpus size: the paper targets corpora on the order of 1M
       | tokens, while the reference text used in the blog post is
       | probably much shorter. On a short text, a single LLM call could
       | do the summarization directly.
        
       | piizei wrote:
       | Looks like the test setup confuses knowledge graphs with graph
       | databases. The code just creates a neo4j database from a
       | document, not a knowledge graph (basically using neo4j as a
       | vector database). A knowledge graph would be created by an LLM
       | as a preprocessing step (and queried similarly by an LLM). That
       | is a different approach from the one tested, one that trades
       | preprocessing time and domain knowledge for accuracy. Reference:
       | https://python.langchain.com/v0.1/docs/use_cases/graph/const...
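       | 
       | For reference, the LLM-preprocessing flow from that page looks
       | roughly like this (a sketch assuming the LangChain v0.1-era
       | APIs and local neo4j credentials):
       | 
       |     from langchain_community.graphs import Neo4jGraph
       |     from langchain_core.documents import Document
       |     from langchain_experimental.graph_transformers import LLMGraphTransformer
       |     from langchain_openai import ChatOpenAI
       | 
       |     text = open("document.txt").read()  # your source document
       | 
       |     # The LLM extracts (entity)-[relation]->(entity) triples as
       |     # a preprocessing step -- that is the actual knowledge graph.
       |     llm = ChatOpenAI(model="gpt-4o", temperature=0)
       |     transformer = LLMGraphTransformer(llm=llm)
       |     graph_documents = transformer.convert_to_graph_documents(
       |         [Document(page_content=text)])
       | 
       |     graph = Neo4jGraph(url="bolt://localhost:7687",
       |                        username="neo4j", password="password")
       |     graph.add_graph_documents(graph_documents)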
        
         | rcarmo wrote:
         | Yeah, I think the dataset is flawed. GraphRAG appears to be
         | aimed at navigating the Microsoft 365 document-and-people
         | graph that you get in an organizational setting, not at doing
         | a homogeneous search.
        
       | qeternity wrote:
       | I don't believe the author read the GraphRAG paper as there is
       | nothing in this "deep dive" that implements anything remotely
       | close.
        
       | dmezzetti wrote:
       | There is no one-size-fits-all formula. For simple RAG, a search
       | query (vector, keyword, SQL, etc.) works to build a context.
       | 
       | For more complex questions or research, a knowledge graph can be
       | beneficial. I wrote an article[1] earlier this year that used
       | graph path traversal to build a context.
       | 
       | The goal was to build a short narrative about English history
       | from 500 to 1000 using Wikipedia articles. Vector similarity
       | alone won't bring back good results. That article used a Cypher
       | graph path query that jumped multiple hops through concepts of
       | interest. The articles on that path were then brought in as the
       | context.
       | 
       | [1] https://neuml.hashnode.dev/advanced-rag-with-graph-path-
       | trav...
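       | 
       | Illustratively, the flavor of query involved (hypothetical
       | labels and relations, not the exact code from the article):
       | 
       |     from neo4j import GraphDatabase
       | 
       |     # Multi-hop path between two concepts of interest; the
       |     # articles on the path become the RAG context.
       |     query = """
       |     MATCH p = (a:Article {title: $start})
       |               -[:RELATED*1..3]-(b:Article {title: $end})
       |     RETURN [n IN nodes(p) | n.title] AS articles LIMIT 5
       |     """
       |     with GraphDatabase.driver("bolt://localhost:7687",
       |                               auth=("neo4j", "password")) as driver:
       |         records, _, _ = driver.execute_query(
       |             query, start="Alfred the Great", end="Æthelstan")
       |         context = [r["articles"] for r in records]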
        
       | davedx wrote:
       | This seems highly relevant: https://arxiv.org/abs/2406.01506
       | 
       | > In this paper, we study the two foundational questions in this
       | area. First, how are categorical concepts, such as {'mammal',
       | 'bird', 'reptile', 'fish'}, represented? Second, how are
       | hierarchical relations between concepts encoded? For example, how
       | is the fact that 'dog' is a kind of 'mammal' encoded? We show how
       | to extend the linear representation hypothesis to answer these
       | questions. We find a remarkably simple structure: simple
       | categorical concepts are represented as simplices, hierarchically
       | related concepts are orthogonal in a sense we make precise, and
       | (in consequence) complex concepts are represented as polytopes
       | constructed from direct sums of simplices, reflecting the
       | hierarchical structure.
       | 
       | Basically, LLMs _already partially encode information as
       | semantic graphs internally_.
       | 
       | With this it is less surprising that augmenting them with
       | external knowledge graphs has a lower ROI.
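       | 
       | A toy illustration of the geometry being claimed (made-up
       | vectors, purely to show the shape of the orthogonality):
       | 
       |     import numpy as np
       | 
       |     animal = np.array([1.0, 0.0, 0.0])           # parent concept
       |     mammal = animal + np.array([0.0, 1.0, 0.0])  # child adds an
       |     dog    = mammal + np.array([0.0, 0.0, 1.0])  # orthogonal offset
       | 
       |     # The hierarchical offsets are orthogonal to each other:
       |     print(np.dot(mammal - animal, dog - mammal))  # 0.0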
        
         | CharlieDigital wrote:
         | > Basically, LLM's already partially encode information as
         | semantic graphs internally.
         | 
         | There's an (underutilized?) technique here to take advantage of
         | that internal graph: have the LLM tell you the related concepts
         | first and then perform the RAG using not just the original
         | concept, but the expanded set of related concepts.
         | 
         | So:
         | 
         |     concept - [related concepts] -
         |     [[.. rag-rc1], [.. rag-rc2], [.. rag-rcn]] - summarize
         | 
         | With GPTs prior to 4o, it would have been too slow to do this
         | as a two-step process. With 4o and some of the higher-
         | throughput Llama3-based options (Together.ai, Fireworks.ai,
         | Groq.com), a two-step fan-out RAG approach takes advantage of
         | this internal graph and could probably yield similar gains in
         | RAG without the additional infrastructure (another datastore)
         | or data pre-processing that a graph approach requires.
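         | 
         | A sketch of the fan-out (rag_search is a hypothetical
         | retriever; model names and prompts are mine):
         | 
         |     from openai import OpenAI
         | 
         |     client = OpenAI()
         | 
         |     def related_concepts(concept):
         |         # Step 1: ask the model to surface its internal graph.
         |         resp = client.chat.completions.create(
         |             model="gpt-4o",
         |             messages=[{"role": "user", "content":
         |                 f"List 5 concepts closely related to '{concept}', one per line."}])
         |         return resp.choices[0].message.content.splitlines()
         | 
         |     def fan_out_rag(concept):
         |         # Step 2: retrieve for the original concept plus each
         |         # related one, then summarize the merged context.
         |         queries = [concept] + related_concepts(concept)
         |         chunks = [c for q in queries for c in rag_search(q)]
         |         resp = client.chat.completions.create(
         |             model="gpt-4o",
         |             messages=[{"role": "user", "content":
         |                 "Summarize:\n" + "\n".join(chunks)}])
         |         return resp.choices[0].message.content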
        
           | davedx wrote:
           | Yup. Fascinating stuff really. Kind of like mnemonics for
           | LLMs, if you squint a bit.
        
           | neeleshs wrote:
           | Even with older GPTs, if the summary is decent, it works
           | reasonably well even with no RAG. We are a data management
           | platform and allow users to build data pipelines around a
           | data model. This is basically a DAG. We autogenerate
           | documentation for these pipelines using GPT-4, feeding a
           | summarized version of the data pipeline -- expressed in
           | Graphviz DOT format -- into the prompt. GPT-4 understands
           | this format well, and seemingly understands the graph itself
           | reasonably well!
           | 
           | It performs poorly at expressing the higher-level intent of
           | the pipeline, but the tactical details are accurately
           | documented. We are trying to push prompting itself further
           | before turning to RAG & finetuning.
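           | 
           | For flavor, the kind of prompt payload this looks like (a
           | toy pipeline, not our real one):
           | 
           |     dot = """
           |     digraph pipeline {
           |         raw_events -> sessionize;
           |         sessionize -> enrich_geo;
           |         enrich_geo -> daily_rollup;
           |     }
           |     """
           |     prompt = ("Document this data pipeline, expressed in "
           |               "Graphviz DOT format. Describe each step and "
           |               "how data flows:\n" + dot)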
        
         | mkehrt wrote:
         | As a GenAI skeptic, I think this is a very cool finding. My
         | experience with AI tools is that they are complete bullshit
         | artists. But to a large extent that's just a result of the way
         | they are trained. If this description of how the data is
         | structured is correct, it indicates that these programs do
         | encode a real model of the world. Perhaps alternative ways of
         | training these same models, or fixing the data afterwards,
         | will result in more truthful models.
        
       | mark_l_watson wrote:
       | That is an interesting writeup, but I had trouble understanding
       | what they meant by what is, for me, a new term: "faithfulness."
       | 
       | This is supposedly a measure of reducing hallucinations. Is it
       | just me, or did other people here have difficulty understanding
       | how faithfulness was evaluated?
       | 
       | EDIT: OK, in Ragas, faithfulness is the fraction of claims in
       | the generated answer that are supported by the retrieved
       | context, scored automatically by an LLM judge rather than by
       | overlap metrics like ROUGE and BLEU.
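       | 
       | For reference, this is roughly how the metric is run (a sketch
       | assuming the ragas v0.1-style API, an OPENAI_API_KEY in the
       | environment, and made-up data values):
       | 
       |     from datasets import Dataset
       |     from ragas import evaluate
       |     from ragas.metrics import faithfulness
       | 
       |     dataset = Dataset.from_dict({
       |         "question": ["Who won the debate?"],
       |         "answer":   ["Candidate A won on most scorecards."],
       |         "contexts": [["Most scorecards gave the debate to Candidate A."]],
       |     })
       | 
       |     # Faithfulness = fraction of claims in the answer that are
       |     # supported by the retrieved contexts, judged by an LLM.
       |     print(evaluate(dataset, metrics=[faithfulness]))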
        
       | Tostino wrote:
       | I really need to dig into the more recent advances in knowledge
       | graphs + LLMs. I've been out of the game for ~10 months now, and
       | am just starting to dig back into things and get my training
       | pipeline working (darn bitrot...)
       | 
       | I had previously trained a llama2 13b model
       | (https://huggingface.co/Tostino/Inkbot-13B-8k-0.2) on a whole
       | bunch of knowledge graph tasks (in addition to a number of other
       | tasks).
       | 
       | Here is an example of the training data for training it how to
       | use knowledge graphs:
       | 
       | easy -
       | https://gist.github.com/Tostino/76c55bdeb1f099fb2bfab00ce144...
       | 
       | medium -
       | https://gist.github.com/Tostino/0460c18024697efc2ac34fe86ecd...
       | 
       | I also trained it on generating KGs from conversations, or
       | articles you have provided. So from the LLM side, it's way more
       | knowledgeable about the connections in the graph than GPT4 is by
       | default.
       | 
       | Here are a couple examples of the trained model actually
       | generating a knowledge graph:
       | 
       | 1.
       | https://gist.github.com/Tostino/c3541f3a01d420e771f66c62014e...
       | 
       | 2.
       | https://gist.github.com/Tostino/44bbc6a6321df5df23ba5b400a01...
       | 
       | I haven't done any work on integrating those into larger
       | structures, combining the graphs generated from different
       | documents, or using a graph database to augment my use case...all
       | things I am eager to try out, and I am glad there is a bunch more
       | to read on the topic available now.
       | 
       | Anyways, near-term plans are to train a llama3 8b, and likely a
       | phi-3 13b version of Inkbot on an improved version of my
       | dataset. Glad to see others as excited as I was on this topic!
        
       | lmeyerov wrote:
       | I'm happy to see third-party comparisons; most of the marketing
       | here indeed just assumes KGs are better, with zero proof --
       | marketers to be wary of. Unfortunately, I suspect a few key
       | steps need to happen for this post to fairly reflect what the
       | Microsoft NLP researchers called their algorithm, vs. the
       | broader family named by neo4j. Afaict, they're talking about a
       | different kind of graph.
       | 
       | * The KG index should be text documents hierarchically
       | summarized based on an extracted named-entity-relation graph (a
       | rough sketch follows these bullets). The blog version seems to
       | instead index (document, word) pairs, not the KG, and afaict
       | skips the hierarchical NER community summarization. The blog
       | post is building what neo4j calls a lexical graph, not the novel
       | KG summary index of the MSR paper.
       | 
       | * The data volume should go up. Think a corpus like 100k+ tweets
       | or 100+ documents. You start to see challenges like redundant
       | tweets that clog retrieval/ranking, or many pieces of the puzzle
       | spread over disparate chunks requiring indirect 'multi-hop'
       | reasoning. Something like a single debate can fit into one
       | ChatGPT call, with no RAG. How summarization preprocessing can
       | still help small documents is an interesting question, but a
       | more nuanced topic (and we have thoughts on that ;-)).
       | 
       | * The tasks should reflect the challenges: multi-hop reasoning,
       | wider summarization within a fixed budget, etc. Retesting simple
       | queries that naive RAG already solves isn't the point. The paper
       | focused on a couple of query types, which is also why it routes
       | to two different retrieval modes. Subtly, part of the challenge
       | with bigger data is how many resources we give the retriever &
       | reasoner, which is part of why graph RAG is exciting IMO.
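       | 
       | To make the first bullet concrete, the index the paper builds
       | looks roughly like this (my own sketch; extract_triples and
       | llm_summarize are hypothetical LLM calls, and networkx Louvain
       | communities stand in for the paper's Leiden step):
       | 
       |     import networkx as nx
       | 
       |     G = nx.Graph()
       |     for chunk in chunks:
       |         # LLM named-entity/relation extraction per chunk.
       |         for head, relation, tail in extract_triples(chunk):
       |             G.add_edge(head, tail, relation=relation)
       | 
       |     # Community detection over the entity graph, then one LLM
       |     # summary per community; those summaries are the KG index.
       |     communities = nx.community.louvain_communities(G)
       |     summaries = [llm_summarize(G.subgraph(c)) for c in communities]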
       | 
       | Afaict the blog post essentially built a lexical graph with
       | chunk/node embeddings, ran it on a small document, and at that
       | scale asked simple questions... so it was close to naive
       | retrieval and, unsurprisingly, got parity. It wouldn't take much
       | more to improve, so I would encourage doing a bit more. Beyond
       | the MSR paper, I would also experiment a bit more with retrieval
       | strategies, e.g., an agentic layer on top, and include simple
       | text search mixed in with reranking. And as validation before
       | any of that, focus specifically on the queries expected to fail
       | naive RAG and work with a graph, and make sure those work.
       | 
       | Related: We are working on a variant of Graph RAG that solves
       | some additional scale & quality challenges in our data
       | (investigations: threat intel reports, real-time social & news,
       | misinfo, ...), and may be open to an internship or contract role
       | for the right person. One big focus area is ensuring AI quality &
       | AI scale as our version is more GPU/AI-centric and used in
       | serious situations by less technical users... A bit ironic given
       | the article :) LMK if interested, see my profile. We'll need
       | proof of capability for both engineering + AI challenges, and
       | easier for us to teach the latter than the former.
        
       | yetanotherjosh wrote:
       | It seems to me that the "knowledge graph" generated in this
       | article is incredibly naive and not comparable to the process in
       | the MS paper, which requires multiple rounds of preprocessing
       | the source content using LLMs to extract, summarize, and find
       | relationships at multiple levels, and to model them in the graph
       | store. This just splats chunks and words into a vector graph and
       | is barely defensible as a "knowledge graph."
       | 
       | Please tell me I'm missing something, because this is egregious.
       | How can you expect a graph approach to improve over naive RAG if
       | you don't actually build a knowledge graph that captures high-
       | quality, higher-level entity relationships?
        
       ___________________________________________________________________
       (page generated 2024-07-10 23:01 UTC)