[HN Gopher] Knowledge Graphs in RAG: Hype vs. Ragas Analysis
___________________________________________________________________
Knowledge Graphs in RAG: Hype vs. Ragas Analysis
Author : rooftopzen
Score : 117 points
Date : 2024-07-09 21:00 UTC (1 day ago)
(HTM) web link (aiencoder.substack.com)
(TXT) w3m dump (aiencoder.substack.com)
| jimmySixDOF wrote:
| This is a nice sandbox walkthrough of the author's objective,
| which was to test MSFT's claims in the paper -- but with all due
| respect, the buzz around graphs is because they add a whole third
| layer in a combined approach like Reciprocal Rank Fusion (RRF).
| You do a BM25 search, then a vector-based nearest-neighbors
| search, and now you can add a KG search; combined with local and
| global reranking etc., the expectation is that this produces a
| better final outcome. These findings aside, it still makes sense
| that adding a KG to a hybrid search pipeline is going to be
| useful.
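|
| A minimal sketch of that fusion step, assuming each retriever
| just returns a ranked list of document ids (the doc ids and the
| k=60 constant here are illustrative, not from the article):
|
|     from collections import defaultdict
|
|     def rrf(result_lists, k=60):
|         # Reciprocal Rank Fusion: score(d) = sum of 1/(k + rank)
|         # over every list in which document d appears.
|         scores = defaultdict(float)
|         for results in result_lists:
|             for rank, doc_id in enumerate(results, start=1):
|                 scores[doc_id] += 1.0 / (k + rank)
|         return sorted(scores, key=scores.get, reverse=True)
|
|     bm25_hits   = ["d3", "d1", "d7"]  # stand-in BM25 results
|     vector_hits = ["d1", "d4", "d3"]  # stand-in ANN results
|     kg_hits     = ["d7", "d1", "d9"]  # stand-in KG results
|     print(rrf([bm25_hits, vector_hits, kg_hits]))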
| itkovian_ wrote:
| Knowledge graphs were created to solve the problem of making
| natural, free-flowing text machine-processable. We now have a
| technology that completely understands natural, free-flowing
| text and can extract meaning. Why would going back to structure
| help, when that structure can never be as rich as the text
| itself? I get it if the KB has new information; that's not what
| I'm saying.
| esafak wrote:
| But RAG without graphs just relies on similarity search, which
| isn't very smart.
| visarga wrote:
| > Why would going back to structure help
|
| When your corpus is large, it is useful to split it up and
| combine it hierarchically. In their place I would do both
| bottom-up and top-down summarization passes, so information can
| percolate from a leaf to the root and from the root to a
| different leaf. Global context can illuminate local summaries:
| think of the twist in a novel, for example, which sheds new
| light on everything.
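|
| A toy version of those two passes, with summarize() standing in
| for an LLM call (the fanout and the placeholder implementation
| are mine, just to make the shape concrete):
|
|     def summarize(texts, context=""):
|         # placeholder for an LLM summarization call
|         joined = " ".join(texts)
|         return (context + " | " + joined if context else joined)[:200]
|
|     def bottom_up(chunks, fanout=4):
|         level = list(chunks)
|         while len(level) > 1:  # merge groups until one root remains
|             level = [summarize(level[i:i + fanout])
|                      for i in range(0, len(level), fanout)]
|         return level[0]
|
|     def top_down(chunks, root):
|         # rewrite each leaf with the global (root) context in view
|         return [summarize([c], context=root) for c in chunks]
|
|     chapters = ["chapter 1 text", "chapter 2 text", "the twist"]
|     root = bottom_up(chapters)          # leaf -> root
|     leaves = top_down(chapters, root)   # root -> leaf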
| itkovian_ wrote:
| That's not what a KB is.
| th0ma5 wrote:
| > We now have a technology that completely understands natural,
| free-flowing text and can extract meaning.
|
| Actually, we don't. I know it certainly feels like LLMs do this,
| but no one who knows how they work would dare stake their life
| on their output. Still useful, though!
| visarga wrote:
| The Microsoft GraphRAG paper focuses on global sensemaking
| through hierarchical summarization, which is a fundamental
| aspect of their approach. The blog post's analysis, however,
| doesn't address this core feature at all. Another issue is
| corpus size: the paper focuses on corpora on the order of 1M
| tokens, while the reference text used in the blog post is
| probably much shorter. On a shorter text, a single LLM call
| could do the summarization directly.
| piizei wrote:
| Looks like the test setup confuses knowledge graphs with graph
| databases. The code just creates a neo4j database from a
| document, not a knowledge graph (it basically uses neo4j as a
| vector database). A knowledge graph would be created by an LLM
| as a preprocessing step (and queried similarly by an LLM). This
| is a different approach than the one tested, one that trades
| preprocessing time and domain knowledge for accuracy. Reference:
| https://python.langchain.com/v0.1/docs/use_cases/graph/const...
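|
| For reference, the LLM-driven construction that link describes
| looks roughly like this (a sketch based on the LangChain docs;
| imports, model names, and the example sentence are mine and may
| differ across versions):
|
|     from langchain_core.documents import Document
|     from langchain_experimental.graph_transformers import (
|         LLMGraphTransformer,
|     )
|     from langchain_openai import ChatOpenAI
|
|     llm = ChatOpenAI(model="gpt-4o", temperature=0)
|     transformer = LLMGraphTransformer(llm=llm)
|
|     docs = [Document(page_content="Marie Curie won the Nobel "
|                                    "Prize in Physics in 1903.")]
|     graph_docs = transformer.convert_to_graph_documents(docs)
|     # graph_docs[0].nodes / .relationships hold the entities and
|     # edges the LLM extracted; those get written to Neo4j and
|     # queried (again via an LLM) at answer time.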
| rcarmo wrote:
| Yeah, I think the dataset is flawed. GraphRAG appears to be
| aimed at navigating the Microsoft 365 document and people graph
| that you get in an organizational setting, not doing a
| homogeneous search.
| qeternity wrote:
| I don't believe the author read the GraphRAG paper as there is
| nothing in this "deep dive" that implements anything remotely
| close.
| dmezzetti wrote:
| There is no one-size-fits-all formula. For simple RAG, a search
| query (vector, keyword, SQL, etc.) works to build a context.
|
| For more complex questions or research, a knowledge graph can be
| beneficial. I wrote an article[1] earlier this year that used
| graph path traversal to build a context.
|
| The goal was to build a short narrative about English history
| from 500 to 1000 using Wikipedia articles. Vector similarity
| alone won't bring back good results. That article used a Cypher
| graph path query that jumped multiple hops through concepts of
| interest. The articles on that path were then brought in as the
| context.
|
| [1] https://neuml.hashnode.dev/advanced-rag-with-graph-path-
| trav...
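|
| Purely to illustrate the pattern (this is not the author's
| actual code; the labels, relationship type, and article titles
| below are hypothetical):
|
|     from neo4j import GraphDatabase
|
|     driver = GraphDatabase.driver("bolt://localhost:7687",
|                                   auth=("neo4j", "password"))
|
|     # multi-hop path query: walk 1-3 hops between two concepts
|     # and pull the article text along the path into the context
|     query = """
|     MATCH p = (a:Article {title: $start})
|               -[:RELATED*1..3]-
|               (b:Article {title: $end})
|     RETURN [n IN nodes(p) | n.text] AS passages
|     LIMIT 1
|     """
|
|     with driver.session() as session:
|         rows = session.run(query, start="Alfred the Great",
|                            end="Offa of Mercia")
|         context = "\n".join(p for row in rows
|                             for p in row["passages"])
|     # `context` then becomes the prompt for the narrative step.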
| davedx wrote:
| This seems highly relevant: https://arxiv.org/abs/2406.01506
|
| > In this paper, we study the two foundational questions in this
| area. First, how are categorical concepts, such as {'mammal',
| 'bird', 'reptile', 'fish'}, represented? Second, how are
| hierarchical relations between concepts encoded? For example, how
| is the fact that 'dog' is a kind of 'mammal' encoded? We show how
| to extend the linear representation hypothesis to answer these
| questions. We find a remarkably simple structure: simple
| categorical concepts are represented as simplices, hierarchically
| related concepts are orthogonal in a sense we make precise, and
| (in consequence) complex concepts are represented as polytopes
| constructed from direct sums of simplices, reflecting the
| hierarchical structure.
|
| Basically, LLMs _already partially encode information as
| semantic graphs internally_.
|
| With this in mind, it is less surprising that augmenting them
| with external knowledge graphs has a lower ROI.
| CharlieDigital wrote:
| > Basically, LLMs already partially encode information as
| semantic graphs internally.
|
| There's an (underutilized?) technique here to take advantage of
| that internal graph: have the LLM tell you the related concepts
| first, and then perform the RAG using not just the original
| concept but the expanded set of related concepts.
|
| So: concept -> [related concepts] -> [[..rag-rc1], [..rag-rc2],
| [..rag-rcn]] -> summarize
|
| With GPTs prior to 4o, it would have been too slow to do this
| as a two-step process. With 4o and some of the higher-throughput
| Llama3-based options (Together.ai, Fireworks.ai, Groq.com), a
| two-step fan-out RAG approach takes advantage of this internal
| graph and could probably yield similar gains in RAG without the
| additional infrastructure (another datastore) or data
| pre-processing needed for a graph approach.
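|
| A minimal sketch of that two-step fan-out; retrieve() stands in
| for whatever vector/keyword search you already have, and the
| model name and prompts are just placeholders:
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def expand_concepts(concept):
|         # Step 1: tap the model's internal "graph" of concepts
|         resp = client.chat.completions.create(
|             model="gpt-4o",
|             messages=[{"role": "user", "content":
|                        f"List 5 concepts closely related to "
|                        f"'{concept}', one per line."}])
|         text = resp.choices[0].message.content
|         return [l.strip("- ").strip()
|                 for l in text.splitlines() if l.strip()]
|
|     def fan_out_rag(concept, retrieve):
|         # Step 2: retrieve for the original + related concepts,
|         # then summarize over the combined context
|         queries = [concept] + expand_concepts(concept)
|         chunks = [d for q in queries for d in retrieve(q)]
|         prompt = (f"Using only the context below, answer about "
|                   f"'{concept}'.\n\n" + "\n\n".join(chunks))
|         resp = client.chat.completions.create(
|             model="gpt-4o",
|             messages=[{"role": "user", "content": prompt}])
|         return resp.choices[0].message.content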
| davedx wrote:
| Yup. Fascinating stuff, really. Kind of like mnemonics for
| LLMs, if you squint a bit.
| neeleshs wrote:
| Even with older GPT models, if the summary is decent, it works
| reasonably well even with no RAG. We are a data management
| platform and allow users to build data pipelines around a data
| model. This is basically a DAG. We autogenerate documentation
| for these pipelines using GPT-4, and feed a summarized version
| of the data pipeline, expressed in Graphviz DOT format, into the
| prompt. GPT-4 understands this format well, and seemingly
| understands the graph itself reasonably well!
|
| It performs poorly at expressing the higher-level intent of the
| pipeline, but tactical details are accurately documented. We are
| trying to push prompting itself further before turning to RAG
| and fine-tuning.
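|
| Roughly what that looks like, with a toy pipeline in place of
| the real data model (the step names are made up):
|
|     # serialize the pipeline DAG as Graphviz DOT, then prompt
|     edges = [("raw_orders", "clean_orders"),
|              ("clean_orders", "daily_revenue"),
|              ("raw_customers", "daily_revenue")]
|
|     dot = ("digraph pipeline {\n"
|            + "\n".join(f'  "{a}" -> "{b}";' for a, b in edges)
|            + "\n}")
|
|     prompt = ("The following Graphviz DOT file describes a data "
|               "pipeline. Document each step and its inputs.\n\n"
|               + dot)
|     # `prompt` is then sent to the model (GPT-4 in this setup).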
| mkehrt wrote:
| As a GenAI skeptic, I think this is a very cool finding. My
| experience with AI tools is that they are complete bullshit
| artists. But to a large extent that's just a result of the way
| they are trained. If this description of how the data is
| structured is correct, it indicates that these programs do
| encode a real model of the world. Perhaps alternative ways of
| training these same models, or fixing the data afterwards, will
| result in more truthful models.
| mark_l_watson wrote:
| That is an interesting writeup, but I had trouble understanding
| what they meant by a term that is new to me: "faithfulness."
|
| This is supposedly a measure of reducing hallucinations. Is it
| just me, or did other people here have difficulty understanding
| how faithfulness was evaluated?
|
| EDIT: OK, faithfulness is calculated by human evaluation, and can
| be automatically calculated with ROUGE and BLEU.
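|
| For what it's worth, the "Ragas" in the title is the evaluation
| library the post scores with; computing its faithfulness metric
| looks roughly like this (toy data, and the exact column names
| and API vary across versions):
|
|     from datasets import Dataset
|     from ragas import evaluate
|     from ragas.metrics import faithfulness
|
|     data = Dataset.from_dict({
|         "question": ["Who published the GraphRAG paper?"],
|         "answer": ["Microsoft Research published it."],
|         "contexts": [["The GraphRAG paper came out of "
|                       "Microsoft Research."]],
|     })
|
|     # faithfulness scores how well the answer's claims are
|     # supported by the retrieved contexts
|     print(evaluate(data, metrics=[faithfulness]))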
| Tostino wrote:
| I really need to dig into the more recent advances in knowledge
| graphs + LLMs. I've been out of the game for ~10 months now, and
| am just starting to dig back into things and get my training
| pipeline working (darn bitrot...)
|
| I had previously trained a llama2 13b model
| (https://huggingface.co/Tostino/Inkbot-13B-8k-0.2) on a whole
| bunch of knowledge graph tasks (in addition to a number of other
| tasks).
|
| Here is an example of the training data for training it how to
| use knowledge graphs:
|
| easy -
| https://gist.github.com/Tostino/76c55bdeb1f099fb2bfab00ce144...
|
| medium -
| https://gist.github.com/Tostino/0460c18024697efc2ac34fe86ecd...
|
| I also trained it on generating KGs from conversations or
| articles you provide. So on the LLM side, it's far more
| knowledgeable about the connections in the graph than GPT-4 is
| by default.
|
| Here are a couple examples of the trained model actually
| generating a knowledge graph:
|
| 1.
| https://gist.github.com/Tostino/c3541f3a01d420e771f66c62014e...
|
| 2.
| https://gist.github.com/Tostino/44bbc6a6321df5df23ba5b400a01...
|
| I haven't done any work on integrating those into larger
| structures, combining the graphs generated from different
| documents, or using a graph database to augment my use case...all
| things I am eager to try out, and I am glad there is a bunch more
| to read on the topic available now.
|
| Anyway, the near-term plan is to train a llama3 8b, and likely a
| phi-3 13b, version of Inkbot on an improved version of my
| dataset. Glad to see others as excited about this topic as I am!
| lmeyerov wrote:
| I'm happy to see third-party comparisons; most of the marketing
| here indeed just assumes KGs are better with zero proof, so
| there are marketers to be wary of. Unfortunately, I suspect a
| few key steps need to happen for this post to fairly reflect
| what the Microsoft NLP researchers called their algorithm, as
| opposed to the broader family of techniques named by neo4j.
| Afaict, they're talking about a different kind of graph.
|
| * The KG index should be text documents hierarchically
| summarized on top of an extracted named-entity-relation graph
| (see the sketch after these bullets). The blog version seems to
| instead index (document, word) pairs, not the KG, and afaict
| skips the hierarchical NER community summarization. The blog
| post is building what neo4j calls a lexical graph, not the novel
| KG summary index of the MSR paper.
|
| * The data volume should go up. Think a corpus like 100k+ tweets
| or 100+ documents. You start to see challenges like redundant
| tweets that clog retrieval/ranking, or many pieces of the puzzle
| spread over disparate chunks requiring indirect "multi-hop"
| reasoning. Something like a single debate can fit into one
| ChatGPT call, with no RAG. How summarization preprocessing can
| still help small documents is an interesting question, but a
| more nuanced topic (and we have thoughts on it ;-))
|
| * The tasks should reflect the challenges: multi-hop reasoning,
| wider summarization with a fixed budget, etc. Retesting simple
| queries that naive RAG already solves isn't the point. The paper
| focused on a couple of query types, which is also why they route
| to two different retrieval modes. A subtle point: part of the
| challenge with bigger data is how many resources we give the
| retriever and reasoner, which is part of why graph RAG is
| exciting IMO.
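|
| A compressed sketch of that index-building flow (extract an
| entity graph, group it into communities, summarize the
| communities). extract_entities/llm_summarize stand in for LLM
| calls, this is my reading of the paper rather than the MSR code,
| and the paper uses Leiden where greedy modularity is used here:
|
|     import networkx as nx
|     from networkx.algorithms import community
|
|     def build_graph_index(chunks, extract_entities, llm_summarize):
|         # 1. LLM pass: (entity, relation, entity) triples per chunk
|         g = nx.Graph()
|         for chunk in chunks:
|             for subj, rel, obj in extract_entities(chunk):
|                 g.add_edge(subj, obj, relation=rel, source=chunk)
|
|         # 2. group entities into communities
|         communities = community.greedy_modularity_communities(g)
|
|         # 3. LLM pass: one report per community, built from its
|         #    entities plus the chunks that mention them
|         reports = []
|         for nodes in communities:
|             sub = g.subgraph(nodes)
|             sources = {sub.edges[e]["source"] for e in sub.edges}
|             reports.append(llm_summarize(list(nodes), list(sources)))
|         # query time: global questions answer over the reports,
|         # local ones walk the graph itself
|         return g, reports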
|
| Afaict the blog post essentially built a lexical graph with
| chunk/node embeddings, ran it on a small document, and at that
| scale asked simple questions... so it is close to naive
| retrieval and, unsurprisingly, got parity. It wouldn't take too
| much more to improve, so I would encourage doing a bit more.
| Beyond the MSR paper, I would also experiment a bit more with
| retrieval strategies, e.g., an agentic layer on top, and include
| simple text search mixed in with reranking. And as validation
| before any of that, focus specifically on the queries expected
| to fail with naive RAG and work with a graph, and make sure
| those work.
|
| Related: we are working on a variant of graph RAG that solves
| some additional scale and quality challenges in our data
| (investigations: threat intel reports, real-time social and
| news, misinfo, ...), and may be open to an internship or
| contract role for the right person. One big focus area is
| ensuring AI quality and AI scale, as our version is more
| GPU/AI-centric and is used in serious situations by less
| technical users... a bit ironic given the article :) LMK if
| interested, see my profile. We'll need proof of capability for
| both the engineering and AI challenges, and it is easier for us
| to teach the latter than the former.
| yetanotherjosh wrote:
| It seems to me that the "knowledge graph" generated in this
| article is incredibly naive and not comparable to the process in
| the MS paper, which requires multiple rounds of preprocessing
| the source content with LLMs to extract, summarize, and find
| relationships at multiple levels and model them in the graph
| store. This just splats chunks and words into a vector graph and
| is barely defensible as a "knowledge graph."
|
| Please tell me I'm missing something, because this is egregious.
| How can you expect a graph approach to improve over naive RAG if
| you don't actually build a knowledge graph that captures
| high-quality, higher-level entity relationships?
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)