[HN Gopher] Show HN: FastGraphRAG - Better RAG using good old Pa...
       ___________________________________________________________________
        
       Show HN: FastGraphRAG - Better RAG using good old PageRank
        
       Hey there HN! We're Antonio, Luca, and Yuhang, and we're excited to
       introduce Fast GraphRAG, an open-source RAG approach that leverages
       knowledge graphs and the 25-year-old PageRank for better
       information retrieval and reasoning.

       Building a good RAG pipeline these days takes a lot of manual
       optimization. Most engineers intuitively start from naive RAG:
       throw everything in a vector database and hope that semantic search
       is powerful enough. This can work for use cases where accuracy
       isn't too important and hallucinations are tolerable, but it
       doesn't work for more difficult queries that involve multi-hop
       reasoning or more advanced domain understanding. It is also
       impossible to debug.

       To address these limitations, many engineers find themselves adding
       extra layers like agent-based preprocessing, custom embeddings,
       reranking mechanisms, and hybrid search strategies. Much like the
       early days of machine learning when we manually crafted feature
       vectors to squeeze out marginal gains, building an effective RAG
       system often becomes an exercise in crafting engineering "hacks."

       Earlier this year, Microsoft seeded the idea of using Knowledge
       Graphs for RAG and published GraphRAG - i.e. RAG with Knowledge
       Graphs. We believe that there is incredible potential in this
       idea, but existing implementations are naive in the way they create
       and explore the graph. That's why we developed Fast GraphRAG with a
       new algorithmic approach using good old PageRank.

       There are two main challenges when building a reliable RAG system:

       (1) Data Noise: Real-world data is often messy. Customer support
       tickets, chat logs, and other conversational data can include a lot
       of irrelevant information. If you push noisy data into a vector
       database, you're likely to get noisy results.

       (2) Domain Specialization: For complex use cases, a RAG system must
       understand the domain-specific context. This requires creating
       representations that capture not just the words but the deeper
       relationships and structures within the data.

       Our solution builds on these insights
       by incorporating knowledge graphs into the RAG pipeline. Knowledge
       graphs store entities and their relationships, and can help
       structure data in a way that enables more accurate and
       context-aware information retrieval. Twelve years ago, Google
       announced the Knowledge Graph we all know about [1]. It was a
       pioneering move. Now we have LLMs, meaning that people can finally
       do RAG on their own data with tools that can be as powerful as
       Google's original idea.

       Before we built this, Antonio was at Amazon, while Luca and Yuhang
       were finishing their PhDs at Oxford. We had been thinking about
       this problem for years and we always loved the parallel between
       PageRank and human memory [2]. We believe that searching for
       memories is incredibly similar to searching the web.

       Here's how it works:

       - Entity and Relationship Extraction: Fast GraphRAG uses LLMs to
         extract entities and their relationships from your data and
         stores them in a graph format [3].

       - Query Processing: When you make a query, Fast GraphRAG starts by
         finding the most relevant entities using vector search, then runs
         a personalized PageRank algorithm to determine the most important
         "memories" or pieces of information related to the query [4].

       - Incremental Updates: Unlike other graph-based RAG systems, Fast
         GraphRAG natively supports incremental data insertions. This
         means you can continuously add new data without reprocessing the
         entire graph.

       - Faster: These design choices make our algorithm faster and more
         affordable to run than other graph-based RAG systems because we
         eliminate the need for communities and clustering.

       Suppose you're analyzing a book and want to focus on character
       interactions, locations, and significant events:

           from fast_graphrag import GraphRAG

           DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships."

           EXAMPLE_QUERIES = [
               "What is the significance of Christmas Eve in A Christmas Carol?",
               "How does the setting of Victorian London contribute to the story's themes?",
               "Describe the chain of events that leads to Scrooge's transformation.",
               "How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?",
               "Why does Dickens choose to divide the story into \"staves\" rather than chapters?"
           ]

           ENTITY_TYPES = ["Character", "Animal", "Place", "Object", "Activity", "Event"]

           grag = GraphRAG(
               working_dir="./book_example",
               domain=DOMAIN,
               example_queries="\n".join(EXAMPLE_QUERIES),
               entity_types=ENTITY_TYPES
           )

           with open("./book.txt") as f:
               grag.insert(f.read())

           print(grag.query("Who is Scrooge?").response)

       This code
       creates a domain-specific knowledge graph based on your data,
       example queries, and specified entity types. Then you can query it
       in plain English while it automatically handles all the data
       fetching, entity extractions, co-reference resolutions, memory
       elections, etc. When you add new data, locking and checkpointing
       are handled for you as well.

       This is the kind of infrastructure that GenAI apps need to handle
       large-scale real-world data. Our goal is to give you this
       infrastructure so that you can focus on what's important: building
       great apps for your users without having to care about manually
       engineering a retrieval pipeline. In the managed service, we also
       have a suite of UI tools for you to explore and debug your
       knowledge graph.

       We have a free hosted solution with up to 100 monthly requests.
       When you're ready to grow, we have paid plans that scale with you.
       And of course you can self-host our open-source engine.

       Give us a spin today at https://circlemind.co and see our code at
       https://github.com/circlemind-ai/fast-graphrag

       We'd love feedback :)

       [1] https://blog.google/products/search/introducing-knowledge-gr...

       [2] Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and
       the Mind: Predicting Fluency with PageRank. Psychological Science,
       18(12), 1069-1076. http://www.jstor.org/stable/40064705

       [3] Similarly to Microsoft's GraphRAG:
       https://github.com/microsoft/graphrag

       [4] Similarly to OSU's HippoRAG:
       https://github.com/OSU-NLP-Group/HippoRAG

       https://vhs.charm.sh/vhs-4fCicgsbsc7UX0pemOcsMp.gif
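
The query-processing step described in the post (pick seed entities, then run personalized PageRank from them) can be sketched in plain Python. This is only an illustration under stated assumptions, not the Fast GraphRAG implementation: the function name, the toy graph, and the seed weights are all hypothetical, and the seed weights stand in for whatever a vector search would return.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank on an undirected graph.

    edges: iterable of (node_a, node_b) pairs.
    seeds: dict of seed node -> positive relevance weight. These are
           hypothetical stand-ins for vector-search scores, and every
           seed is assumed to appear in at least one edge.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Restart distribution: all probability mass sits on the seeds.
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}

    scores = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for n in nodes:
            share = damping * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        scores = nxt
    return scores


# Toy entity graph (hypothetical; a real one would come from LLM extraction).
story_edges = [
    ("Scrooge", "Marley"),
    ("Scrooge", "Bob Cratchit"),
    ("Bob Cratchit", "Tiny Tim"),
    ("Marley", "Ghost of Christmas Past"),
    ("Fezziwig", "Ghost of Christmas Past"),
]
scores = personalized_pagerank(story_edges, seeds={"Scrooge": 1.0})
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because the walk keeps restarting at the seeds, entities that are several hops away can still receive a score, which is the multi-hop behavior the post describes.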
        
       Author : liukidar
       Score  : 419 points
       Date   : 2024-11-18 17:43 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | dartos wrote:
       | Please tell me I'm not the only one that sees the irony in AI
       | relying on classic search.
       | 
       | Obviously LLMs are good at some semantic understanding of the
       | prompt context and are useful, but the irony is hilarious
        
         | antves wrote:
         | I think the main bit here is that the knowledge graph is
         | entirely constructed by LLMs. It's not just using a pre-
         | existing knowledge graph. It's creating a knowledge graph on
         | the fly based on your data.
         | 
         | Navigating the graph, on the other hand, is the perfect task
         | for PageRank.
        
           | liukidar wrote:
           | Exactly! PageRank is also used to navigate the graph and find
           | "missing links" between the concepts selected from the query
           | using semantic search via LLMs (so that it can find
           | information to answer questions that require multi-hop or
           | complex reasoning in one go).
        
           | dartos wrote:
           | Makes perfect sense.
           | 
           | The semantic understanding capabilities fit well for creating
           | knowledge graphs.
        
         | IanCal wrote:
         | I don't get what the irony is here.
        
           | chefandy wrote:
           | Not who you're replying to, but from my vantage point,
           | marketing folks seem to be pushing LLM products as
           | replacements for traditional search products. I think what
           | the post is proposing makes perfect sense from a technical
           | perspective, though. The utility of LLMs will come down to
           | good old-fashioned product design, leveraging existing
           | concepts, and novel technical innovation rather than just
           | dumping quintillions of dollars into increasingly large
           | models and hardware until it does everything for us.
        
             | dartos wrote:
             | Exactly this.
             | 
             | I work in the LLM-augmented search space, so I might be a
             | little too tuned in on this subject.
        
         | mdp2021 wrote:
         | What you should note as quaint is probably more like the
         | integration of more "symbolic" strategies to NNs in AI.
         | 
         | Past the initial sensation, it is pretty linear that "something
         | good at language" (an interface) be integrated with "something
         | good at information retrieval" (the data). (Still sought what
         | comes next, "something to give reliability to processing".)
        
         | mehh wrote:
         | It's not AI, it's a collection of technologies and practices
         | within the domain of AI, symbolic and sub-symbolic. Arguably
         | classic search is another technology/approach/algorithm within
         | the domain of AI.
        
       | adultSwim wrote:
       | Can this be used with LLMs other than the OpenAI API?
        
         | antves wrote:
         | Yes, it works out-of-the-box with any OpenAI compatible API,
         | including Ollama.
         | 
         | You can check out our example at https://github.com/circlemind-
         | ai/fast-graphrag/blob/main/exa...
        
           | youngNed wrote:
           | It would be nice to see an example that uses Ollama. Given
           | that Ollama's embeddings endpoint is a bit... different, I
           | can't quite figure this out.
        
             | liukidar wrote:
             | Hey! Our todo list is a bit swamped right now, but we'll try
             | to have a look at that as soon as possible. On the Ollama
             | GitHub I found conflicting information:
             | https://github.com/ollama/ollama/issues/2416 and
             | https://github.com/ollama/ollama/pull/2925 They also
             | suggest looking at this:
             | https://github.com/severian42/GraphRAG-Local-
             | UI/blob/main/em...
             | 
             | Hope this can help!
        
             | antman wrote:
             | litellm has ollama, you could go through that
        
       | deepsquirrelnet wrote:
       | This is cool! How is the graph stored and queried? I'm familiar
       | with graph databases, but I don't see that as a dependency.
       | 
       | Have you tried the sciphi triplex model for extraction? I've
       | tried to do some extraction before, but got inconsistent results
       | if I extracted the chunks multiple times consecutively.
        
         | liukidar wrote:
         | The graph is currently stored using python-igraph. The codebase
         | is designed so that it is easy to integrate any graph DB by
         | writing a light wrapper around it (we will provide support for
         | the likes of neo4j in the near future). We haven't tried Triplex
         | since we saw that gpt-4o-mini is fast and precise enough for now
         | (and we use it not only for extraction of entities and
         | relationships, but also to get descriptions and resolve
         | conflicts), but fine-tuning should certainly improve results.
         | The graph is queried by finding an initial set of nodes that are
         | relevant to a given query and then running personalized PageRank
         | from those nodes to find other relevant passages. Currently, we
         | select the initial nodes with semantic search both on the whole
         | query and on entities extracted from it, but we are planning
         | other exciting additions to this method :)
        
           | katelatte wrote:
           | Suggestion: check out Memgraph for graph db storage -
           | https://memgraph.com/. I work at Memgraph as DX Engineer so
           | feel free to ping me in case you have questions about it:
           | https://memgraph.com/office-hours
           | 
           | Your solution looks interesting and I would love to hear more
           | about it. I haven't seen that many PageRank-based graph
           | exploration tools.
        
       | stephantul wrote:
       | Just out of interest: why is every python file prefixed with an
       | underscore? I've never seen it before. Is it to avoid collisions
       | with package imports? e.g. "types"
        
         | liukidar wrote:
         | It is to mark the package as private (in the sense that for
         | normal usage you shouldn't need it). We are still writing the
         | documentation on how to customize every little bit of the graph
         | construction and querying pipeline, once that is ready we will
         | expose the right tools (and files) for all of that :) For now
         | just go with `from fast_graphrag import GraphRAG` and you
         | should be good to go :)
        
         | dvaun wrote:
         | It's the standard practice of noting internal/implementation
         | details. Users should stick with the public api exported in
         | __init__.
        
           | stephantul wrote:
           | Interesting, thanks! I've seen it for importable stuff but
           | never for modules.
           | 
           | I guess I'm getting old
        
       | fareesh wrote:
       | What solutions are folks using to solve queries like "How many of
       | these 1000 podcast transcripts have a positive view of Hillary
       | Clinton"? Seems like you would need a way to map reduce and
       | count? And some kind of agent assigner/router on top of it?
        
         | chefandy wrote:
         | I don't have a dev answer, but in case it's relevant, I've seen
         | commercial services that I imagine are doing something similar
         | on the back end; Ground News is one of them. I wish they had
         | monthly subs for their top-tier plan rather than only annual,
         | but it seems like a cool product. I haven't actually used it
         | though.
        
           | aspenmayer wrote:
           | What feature(s) of the top tier plan do you wish you had? I
           | have no idea how their subs work but have seen a few ads for
           | the product so have a vague idea that it rates news for bias
           | but don't see how that would involve many different tiers of
           | subs.
        
             | chefandy wrote:
             | It's been a while since I looked, but unless they changed
             | it, you needed the top tier plan to get a report analyzing
             | the biases of your reading choices and recommending things
             | to balance it out.
        
               | aspenmayer wrote:
               | Ah, that's an interesting feature. Would they make
               | specific article or outlet recommendations or broad
               | categories of suggestions?
        
               | chefandy wrote:
               | Not sure! I could see that being useful.
        
         | nkristoffersen wrote:
         | We do a lot of things with podcast and other audio media at
         | https://listenalert.com
         | 
         | But in general we found the best course of action is simply to
         | label everything, because our customers will want those answers
         | and RAG won't really work at the scale of "all podcasts from
         | the last 6 months: what is the sentiment trend for Hillary
         | Clinton, and what are the top topics and entities mentioned
         | nearby?" So we take a more "brute force" approach :-)
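
The "label everything, then count" approach described above can be sketched as a small map/reduce. This is a minimal illustration, not anyone's production pipeline: `label_sentiment` is a hypothetical keyword-based stand-in for the LLM or classifier call that would run once per transcript at ingestion time, and the word lists are made up.

```python
from collections import Counter


def label_sentiment(transcript: str, entity: str) -> str:
    """Hypothetical stand-in for a per-document LLM/classifier call."""
    text = transcript.lower()
    if entity.lower() not in text:
        return "not_mentioned"
    # Toy keyword counts; a real labeler would be a model call.
    positive = sum(text.count(w) for w in ("great", "admire", "support"))
    negative = sum(text.count(w) for w in ("terrible", "dislike", "oppose"))
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


def count_labels(transcripts, entity):
    # "Map": label every transcript independently (easy to parallelize).
    labels = (label_sentiment(t, entity) for t in transcripts)
    # "Reduce": aggregate the per-transcript labels into counts.
    return Counter(labels)


transcripts = [
    "I admire Hillary Clinton and support her work.",
    "Hillary Clinton was terrible in that debate.",
    "Today we talk about cooking.",
]
print(count_labels(transcripts, "Hillary Clinton"))
```

Storing the labels at ingestion turns the "how many of these 1000 transcripts" question into a plain aggregation, with no retrieval step involved.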
        
         | treprinum wrote:
         | Anticipate what kinds of questions users might ask, pre-compute
         | the answers, and store them as natural sentences in a vector
         | database.
        
         | avereveard wrote:
         | llm can write graph queries
        
         | antves wrote:
         | At the moment this repo is designed to handle more RAG-oriented
         | use cases, i.e. those that require recalling the "top pieces of
         | information" relevant to a given question/context. In your
         | specific example, right now, FastGraphRAG would select the
         | nodes that represent podcasts connected to Hillary Clinton and
         | feed them to an LLM, which would then select the ones that are
         | positively associated with her. As a next step, we plan to
         | weight the connections between nodes given the query. This way,
         | PageRank will explore only edges that carry the concept
         | "positively associated with", and only the right podcasts will
         | be selected and returned, without having to ask an LLM to
         | classify them. Note that this is basically a fuzzy join, so it
         | will produce only a "best-effort" answer rather than an exact
         | one.
        
       | vasa_ wrote:
       | Neat, we are doing something similar with cognee, but are letting
       | users define graph ingestion, generation, and retrieval
       | themselves instead of making assumptions:
       | https://github.com/topoteretes/cognee
        
       | handfuloflight wrote:
       | Could this be used for context retrieval and generative
       | understanding of codebases?
        
         | antves wrote:
         | Yes, feel free to try it out! You can specialize the graph for
         | the codebase use-case by configuring the graph prompt, entity
         | types, and example questions accordingly.
        
       | zwaps wrote:
       | What is the difference from HippoRAG, which seems to be the same
       | approach but came out earlier?
        
         | antves wrote:
         | HippoRAG is an amazing piece of work and it was a source of
         | inspiration for us, as noted in the references. There are a
         | couple of differences:
         | 
         | (1) FastGraphRAG allows the user to make the graph construction
         | opinionated and specialized for a given domain and use case;
         | this clears out the noise in the data and yields better
         | results.
         | 
         | (2) Unlike HippoRAG, FastGraphRAG initializes PageRank with a
         | mixture of semantic retrieval and entity extraction.
         | 
         | (3) HippoRAG is the outcome of an academic paper, and we saw the
         | need for a more robust and production-ready implementation. Our
         | repo is fully typed, includes tests, handles retries with
         | Instructor, uses structured outputs, and so on.
         | 
         | Moving forward, we see our implementation diverge from HippoRAG
         | more radically as we start to introduce new mechanisms such as
         | weighted edges and negative PageRank to model repulsors.
        
       | adamgordonbell wrote:
       | So what is the answer to "Who is Scrooge?" and is it different /
       | better than another approach?
       | 
       | (Like the whole thing in the context window, for instance?)
       | 
       | Is this approach just for cost savings or does it help get better
       | answers and how so?
       | 
       | Could you share a specific example?
        
         | antves wrote:
         | Generally speaking, RAG comes into play when it is impractical
         | to use large context windows, for three reasons: (1) accuracy
         | drops as you stuff the context window, (2) currently, context
         | windows do not scale past 1M tokens, and (3) even with caching,
         | moving millions of tokens is wasteful and not viable in terms of
         | either cost or latency.
         | 
         | So we should really compare this to other RAG approaches.
         | Compared to vector-database RAG, knowledge graphs have the
         | advantage that they model the connections between datapoints.
         | This is super important when asking questions that require
         | reasoning across multiple pieces of information, i.e. multi-hop
         | reasoning.
         | 
         | Also, the graph construction is essentially an exercise in
         | cleaning data to extract the knowledge. Let me give you a
         | practical example. Let's pretend we're indexing customer
         | tickets for creating an AI assistant. If we were to store the
         | data on the tickets as it is, we would overwhelm the vector
         | database with all the noise coming from the conversational
         | nature of this data. With knowledge graphs, we extract only the
         | relevant entities and relationships and store the distilled
         | knowledge in our graph. At query time, we find the answer over
         | a structured data model that contains only clean information
        
           | adamgordonbell wrote:
           | Makes sense, but can you then compare it to RAG and show how
           | an answer is superior and what the context contains that makes
           | it superior?
           | 
           | Or how it is close to large context quality of answer with
           | lower cost on some specific examples.
           | 
           | It's helpful when a readme contains a demonstration or as I
           | said above, a specific example.
        
       | AIorNot wrote:
       | This is very cool, I signed up and uploaded a few docs (PDFs) to
       | the dashboard
       | 
       | Our use case: we have been looking at farming out this work
       | (analyzing compliance documents, i.e. manufacturing paperwork) for
       | our AI startup. However, we need to understand the potential scale
       | this can operate at and the cost model for it to be useful to us.
       | 
       | We will have about 300K PDF documents per client and expect about
       | a 10% change in that document set month to month. Any GraphRAG
       | system has to handle documents at scale. We can use S3 as an
       | ingestion mechanism, but we have to understand the cost and
       | processing time needed for the system to be ready to use during:
       | 
       | 1. initial loading
       | 2. regular updates (how do we delete data from the system, for
       | example?)
       | 
       | cool framework btw..
        
         | antves wrote:
         | Thanks! It sounds like we should be able to help. I'd love to
         | chat more in detail, feel free to send me a note at antonio
         | [at] circlemind.co.
        
       | barlimp wrote:
       | Can this method return references to source documents?
        
         | antves wrote:
         | Yes. We already support this feature in our managed service
         | (https://docs.circlemind.co/essentials/query#include-
         | referenc...) and we'll include it in the next open-source
         | release too!
        
       | larodi wrote:
       | I wonder why all of this - here on HN - is not part of the
       | README.md, which says absolutely nothing about how and why this
       | would work.
       | 
       | The whole approach to representing the work, including the
       | writing here, screams marketing, and the paid offering is the
       | only thing made absolutely clear about it.
       | 
       | p.s. I absolutely understand why a knowledge graph is essential
       | and THE right approach for RAG, particularly when vector DBs on
       | their own are subpar. But so do many others, and the way the repo
       | is presented gives absolutely no clue why yours is _something_
       | compared to other/common-sense graph-RAG-somethings.
       | 
       | You see, there are hundreds of smart people out there who can
       | easily come to the conclusion that data needs to be represented as
       | knowledge in a graph-ontological way, and that the context should
       | then be fed only the relevant subgraph. Like, you could've said so
       | much rather than asking .0084 cents or whatever for APIs as the
       | headline of a presumably open repo.
        
         | azinman2 wrote:
         | HN is "for" startups. This is a startup. What's the problem?
        
           | larodi wrote:
           | The problem is not with HN. The statement was not about HN
           | being or not being about startups (though I personally would
           | not say it is "for startups"), nor was it against startups or
           | other new groups showing projects. The problem cited is the
           | lack of clarity and the apparent change of topic, not the
           | chosen audience.
           | 
           | Now, what precisely is your comment about? Because I'm pretty
           | sure what mine was about.
        
         | antves wrote:
         | I completely agree that the README could do a better job
         | explaining the implementation details and our reasoning behind
         | key design choices. For instance, we should elaborate on why we
         | believe using PageRank offers a more effective exploration
         | strategy compared to other GraphRAG approaches.
         | 
         | FastGraphRAG is entirely free to use, even for commercial
         | applications, and we're happy to make it accessible to
         | everyone. The managed service is how we sustain our business.
        
       | peppertree wrote:
       | How do the domain and example queries help construct the knowledge
       | graph, or are they just context for executing queries?
        
         | antves wrote:
         | These are knobs that you can tune to make the graph
         | construction more/less opinionated. Generally speaking, the
         | more we make it opinionated the better it fits the task.
         | 
         | At a high-level:
         | 
         | (1) Domain: allows you to "talk to the graph constructor". If
         | you care particularly about one aspect of your data, this is
         | the place to say it. For reference, take a look at some of the
         | example prompts on our website (https://circlemind.co/)
         | 
         | (2) Example Queries: if you know what class of questions users
         | will ask, it'd be useful to give the system this information so
         | that it will "keep these questions in mind" when designing the
         | graph. If you don't know which kinds of questions, you can just
         | put a couple of high-level questions that you think apply to
         | your data.
         | 
         | (3) Entity Types: this has a very high impact on the final
         | quality of the graph. Think of these as the types of entities
         | that you want to extract from your data, e.g. person, place,
         | event, etc
         | 
         | All of the above help construct the knowledge graph so that it
         | is specifically designed for your use-case.
        
       | davidsainez wrote:
       | Cool! But I'm confused on your pricing. The github page says
       | first 100 requests are free but the landing page says to self
       | host if you want to use for free. I signed up and used the
       | dashboard but I don't see a billing section or option to upgrade
       | the account.
        
         | antves wrote:
         | Thanks for trying it out! There are two options to use
         | FastGraphRAG for free:
         | 
         | (1) Self-hosting our open-source package (2) Using the free
         | tier of the managed service, which includes 100 requests
         | 
         | If you wish to upgrade your plan, you can reach out to us at
         | support [at] circlemind.co
        
       | bionhoward wrote:
       | Since when does "good old PageRank" demand an OpenAI API key?
       | 
       | "You may not: use Output to develop models that compete with
       | OpenAI" => they're gonna learn from you and you can't learn from
       | them.
       | 
       | Glad we're all so cool with longterm economic downfall of natural
       | humans. Our grandkids might not be so glad about it!
        
         | liukidar wrote:
         | LLMs are only used to construct the graph, to navigate it we
         | use an algorithmic approach. As of now, what we do is very
         | similar to HippoRAG (https://github.com/OSU-NLP-
         | Group/HippoRAG), their paper can give a good overview on how
         | things are working under the hood!
        
       | anotherpaulg wrote:
       | Super interesting, thanks for sharing. How large a corpus of
       | domain specific text do you need to obtain a useful knowledge
       | graph?
       | 
       | Aider has been doing PageRank on the call graph of code repos
       | since forever. All non trivial code has lots of graph structure
       | to support PageRank. So it works really well to find the most
       | relevant context in the project related to the currently active
       | task.
       | 
       | https://aider.chat/docs/repomap.html#optimizing-the-map
        
         | liukidar wrote:
         | We have tried everything from small novels to full documentation
         | sets of a few million tokens, and both seem to create interesting
         | graphs. It would be great to hear some feedback as more people
         | start using it :)
        
         | ukuina wrote:
         | I enjoy Aider, but it has never successfully created a repo
         | map, regardless of whether the codebase is Python, JS, or TS.
         | Are there any plans to allow force-creation and inspection of a
         | repo map?
        
       | LASR wrote:
       | So I've done a ton of work in this area.
       | 
       | Few learnings I've collected:
       | 
       | 1. Lexical search with BM25 alone gives you very relevant results
       | if you can do some work during ingestion time with an LLM.
       | 
       | 2. Embeddings work well only when the size of the query is
       | roughly on the same order of what you're actually storing in the
       | embedding store.
       | 
       | 3. Hypothetical answer generation from a query using an LLM, and
       | then using that hypothetical answer to query for embeddings works
       | really well.
       | 
       | So combining all 3 learnings, we landed on a knowledge
       | decomposition and extraction step very similar to yours. But we
       | stick a metaprompter to essentially auto-generate the domain /
       | entity types.
       | 
       | LLMs are naively bad at identifying the correct level of
       | granularity for the decomposed knowledge. One trick we found is
       | to ask the LLM to output a mermaid.js mindmap to hierarchically
       | break down the input into a tree. At the end of that output, ask
       | the LLM to state which level is the appropriate root for a
       | knowledge node.
       | 
       | Then the node is used to generate questions that could be
       | answered from the knowledge contained in this node. We then index
       | the text of these questions and also embed them.
       | 
       | You can directly match the user's query from these questions
       | using purely BM25 and get good outputs. But a hybrid approach
       | works even better, though not by that much.
       | 
       | Not using LLMs at query time also means we can hierarchically
       | walk down the root into deeper and deeper nodes, using the
       | embedding similarity as a cost function for the traversal.
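
The question-indexing step described above can be sketched as follows. This is a minimal illustration, not LASR's implementation: the `BM25Index` class, the `QUESTIONS` list and the node ids are all hypothetical, and the questions are hard-coded where a real pipeline would generate them with an LLM at ingestion time.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Tiny in-memory Okapi BM25 index (k1 and b are common defaults)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        dl = len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        scored = [(self.score(query, i), i) for i in range(len(self.tokens))]
        return [(i, s) for s, i in sorted(scored, reverse=True)[:top_k] if s > 0]


# Hypothetical output of the ingestion step: (generated question, node id).
QUESTIONS = [
    ("How do I reset my account password?", "node-auth"),
    ("What is the email for product support?", "node-contact"),
    ("Which payment methods are accepted?", "node-billing"),
]

index = BM25Index([q for q, _ in QUESTIONS])


def retrieve_nodes(user_query, top_k=3):
    """Match the user's query against stored questions and return node ids."""
    return [QUESTIONS[i][1] for i, _ in index.search(user_query, top_k)]
```

Because the query is compared with short questions rather than long passages, both sides have a similar length, which is the condition point 2 above says embeddings need as well.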
        
         | sramam wrote:
         | Very interesting. Thank you for getting into the details. Do
         | you chunk the text that goes into the BM25 index? For the
         | hypothetical answer, do you also prompt for "chunk size"
         | responses?
        
         | liukidar wrote:
         | Thanks for sharing! These are all very helpful insights! We'll
         | keep this in mind :)
        
         | antves wrote:
         | Thanks for sharing this! It sounds very interesting. We
         | experimented with a similar tree setup some time ago and it was
         | giving good results. We eventually decided to move towards
         | graphs as a general case of trees. I think the notion of using
         | embeddings similarity for "walking" the graph is key, and we're
         | actively integrating it in FastGraphRAG too by weighting the
         | edges by the query. It's very nice to see so many solutions
         | landing on similar designs!
        
         | sdesol wrote:
         | > Hypothetical answer generation from a query using an LLM, and
         | then using that hypothetical answer to query for embeddings
         | works really well.
         | 
         | This is honestly where I think LLM really shines. This also
         | gives you a very good idea if your documentation is deficient
         | or not.
        
         | yaj54 wrote:
         | > 3. Hypothetical answer generation from a query using an LLM,
         | and then using that hypothetical answer to query for embeddings
         | works really well.
         | 
         | I've been wondering about that and am glad to hear it's working
         | in the wild.
         | 
         | I'm now wondering if using a fine-tuned LLM (on the corpus) to
         | gen the hypothetical answers and then use those for the rag
         | flow would work even better.
        
           | gillesjacobs wrote:
           | The technique of generating hypothetical answers (or
           | documents) from the query was first described in the HyDE
           | (Hypothetical Document Embeddings) paper. [1]
           | 
           | Interestingly, going both ways: generate hypothetical answers
           | for the query, and also generate hypothetical questions for
           | the text chunk at ingestion both increase RAG performance in
           | my experience.
           | 
           | Though LLM-based query-processing is not always suitable for
           | chat applications if inference time is a concern (like
           | near-real-time customer support RAG), so ingestion-time
           | hypothetical answer generation is more apt there.
           | 
           | 1. https://aclanthology.org/2023.acl-long.99/
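
The query-side flow discussed above (draft a hypothetical answer, then search with its embedding) can be sketched as follows. Everything here is a stand-in: `fake_llm` returns a canned answer instead of calling a model, and `embed` is a bag-of-words counter rather than a real embedding model, so the sketch only shows the shape of the pipeline.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fake_llm(query):
    """Placeholder for an LLM call that drafts a hypothetical answer."""
    canned = {
        "how do i get a refund?": (
            "To request a refund, open the billing page and submit the "
            "refund form within 30 days of purchase."
        )
    }
    # Fall back to the raw query when no canned answer exists.
    return canned.get(query.lower(), query)


CHUNKS = [
    "Refund requests are submitted through the refund form on the billing "
    "page within 30 days of purchase.",
    "Our office is open Monday to Friday.",
    "You can get started by creating a free account.",
]


def hyde_search(query, chunks, generate=fake_llm, top_k=1):
    """Embed the hypothetical answer instead of the query, then rank chunks."""
    hypothetical = generate(query)
    qvec = embed(hypothetical)
    ranked = sorted(chunks, key=lambda c: cosine(qvec, embed(c)), reverse=True)
    return ranked[:top_k]
```

The hypothetical answer does not need to be correct. It only needs to resemble the wording and length of the stored chunks more closely than the short question does.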
        
           | tweezy wrote:
           | We do this as well with a lot of success. It's cool to see
           | others kinda independently coalescing around this solution.
           | 
           | What we find really effective is at content ingestion time,
           | we prepend "decorator text" to the document or chunk. This
           | incorporates various metadata about the document (title,
           | author(s), publication date, etc).
           | 
           | Then at query time, we generate a contextual hypothetical
           | document that matches the format of the decorator text.
           | 
           | We add hybrid search (BM25 and rerank) to that, also add
           | filters (documents published between these dates, by this
           | author, this type of content, etc). We have an LLM
           | parameterize those filters and use them as part of our
           | retrieval step.
           | 
           | This process works incredibly well for end users.
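
A minimal sketch of the "decorator text" and filter steps described above. The header layout, the `decorate` and `matches_filters` names and the metadata keys are assumptions made for illustration; the comment does not specify a format.

```python
from datetime import date


def decorate(chunk, title, authors, published):
    """Prepend a metadata header ("decorator text") to a chunk before indexing."""
    header = (
        f"Title: {title}\n"
        f"Authors: {', '.join(authors)}\n"
        f"Published: {published.isoformat()}\n"
        "---\n"
    )
    return header + chunk


def matches_filters(meta, author=None, published_after=None, published_before=None):
    """Apply structured filters, such as ones an LLM extracted from the query."""
    if author is not None and author not in meta["authors"]:
        return False
    if published_after is not None and meta["published"] < published_after:
        return False
    if published_before is not None and meta["published"] > published_before:
        return False
    return True
```

At query time the hypothetical document would be generated in the same header-plus-body layout, so that its embedding lines up with the decorated chunks.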
        
         | siquick wrote:
         | > 1. Lexical search with BM25 alone gives you very relevant
         | results if you can do some work during ingestion time with an
         | LLM
         | 
         | Can you expand on what the LLM work here is and its purpose?
         | 
         | > 3. Hypothetical answer generation from a query using an LLM,
         | and then using that hypothetical answer to query for embeddings
         | works really well.
         | 
         | Interesting idea, going to add to our experiments. Thanks.
        
           | andai wrote:
           | It seems to come down to keyword expansion, though I'd be
           | curious if there's more to it than just asking "please
           | generate relevant keywords".
        
             | sdesol wrote:
             | Something that I'm working on is making it easy to fix
             | spelling and grammatical errors in documents that can
             | affect BM25 and embeddings. So in addition to generating
             | keyword/metadata with LLM, you could also ask it to clean
             | the document; however, based on what I've learned so far,
             | fixing spelling and grammatical errors should involve
             | humans in the process, so you really can't automate this.
        
               | firejake308 wrote:
               | > fixing spelling and grammatical errors should involve
               | humans in the process, so you really can't automate this
               | 
               | This is an interesting observation to me. I would have
               | expected that, since LLMs evolved from
               | autocomplete/autocorrect algorithms, correcting spelling
               | mistakes would be one of their strong suits. Do you have
               | examples of cases where they fail?
        
               | sdesol wrote:
               | If you look at my post history, you can see an example of
               | how claude and openai can not tell that GitHub is spelled
               | correctly. The end result won't make a difference but it
               | raises questions regarding how else it can misinterpret
               | things.
               | 
               | At this moment I would not trust AI to automatically make
               | changes.
        
               | spdustin wrote:
               | My answer to this in my own pet project is to mask terms
               | found by the NER pipeline from being corrected, replacing
               | them with their entity type as a special token (e.g.
               | [male person] or [commercial entity]). That alone
               | dramatically improved grammar/spelling correction,
               | especially because the grammatical "gist" of those masked
               | words is preserved in the text presented to the LLM for
               | "correction".
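
The masking idea above can be sketched roughly as follows. The entity list is supplied by hand where a real NER pipeline would produce it, `fake_corrector` stands in for the LLM, and the `#index` suffix on each token is an addition made here so the original surface forms can be restored unambiguously.

```python
def mask_entities(text, entities):
    """Replace each entity with a typed placeholder token.

    `entities` is a list of (surface form, entity type) pairs, e.g. from NER.
    Uses naive substring replacement, which is enough for a sketch.
    """
    mapping = []
    masked = text
    for surface, etype in entities:
        if surface in masked:
            token = f"[{etype}#{len(mapping)}]"
            masked = masked.replace(surface, token)
            mapping.append((token, surface))
    return masked, mapping


def unmask_entities(text, mapping):
    """Restore the original surface forms after correction."""
    for token, surface in mapping:
        text = text.replace(token, surface)
    return text


def fake_corrector(text):
    """Placeholder for the LLM correction step: fixes one known typo."""
    return text.replace("recieved", "received")


def correct_with_masking(text, entities, corrector=fake_corrector):
    """Mask entities, correct the remaining text, then unmask."""
    masked, mapping = mask_entities(text, entities)
    return unmask_entities(corrector(masked), mapping)
```

Since the corrector never sees "GitHub", it cannot "fix" it, while the placeholder still tells it that a commercial entity sits at that position in the sentence.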
        
               | andai wrote:
               | Fascinating. I think the process could be automated,
               | though I don't know if it's been invented yet. You would
               | want to use the existing autocomplete tech (probabilistic
               | models based on Levenshtein distance and letter proximity
               | on keyboard?) in combination with actually understanding
               | the context of the article and using that to select the
               | right correction. Actually, it sounds fairly trivial to
               | slap those two together, and the 2nd half sounds like
               | something a humble BERT could handle? (I've heard people
               | getting great results with BERTs in current year, though
               | they usually fine-tune them on their particular domain.)
               | 
               | I actually think even BERT could be overkill here -- I
               | have a half-baked prototype of a keyword expansion system
               | that should do the trick here. The idea is to
               | construct a data structure of keywords ahead of time
               | (e.g. by data-mining some portion of Common Crawl), where
               | each keyword has "neighbors" -- words that often appear
               | together and (sometimes, but not always) signal
               | relatedness. I didn't take the concept very far yet, but
               | I give it better than even odds! (Especially if the
               | resulting data structure is pruned by a half-decent LLM
               | -- my initial attempts resulted in a lot of questionable
               | "neighbors" -- though I had a fairly small dataset so
               | it's likely I was largely looking at noise.)
        
               | sdesol wrote:
               | > I think the process could be automated
               | 
               | It can definitely be automated in my opinion, if you go
               | with a supermajority workflow. Something that I've
               | noticed with LLMs is it's very unlikely for all high-
               | quality LLM models to be wrong at the same time. So if
               | you go by a supermajority, the changes are almost
               | certainly valid.
               | 
               | Having said all of that, I still believe we are not
               | addressing the root cause of bad searches which is
               | "garbage in, garbage out". I strongly believe the true
               | calling for LLM will be to help us curate and manage
               | data, at scale.
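
The supermajority workflow mentioned above could look roughly like this. The two-thirds threshold and the `supermajority` helper are assumptions; the comment does not say how many models or what cut-off is used.

```python
from collections import Counter


def supermajority(proposals, num=2, den=3):
    """Return the proposal that at least num/den of the models agree on.

    `proposals` holds one corrected string per model. Returns None when no
    proposal reaches the threshold, so the change can go to a human instead.
    """
    if not proposals:
        return None
    winner, votes = Counter(proposals).most_common(1)[0]
    # Integer comparison avoids floating-point rounding at the threshold.
    return winner if votes * den >= num * len(proposals) else None
```

For example, `supermajority(["GitHub", "GitHub", "Github"])` accepts "GitHub" because two of three models agree, while three different proposals return `None`.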
        
         | mhuffman wrote:
         | My experience matches yours, but related to
         | 
         | >3. Hypothetical answer generation from a query using an LLM,
         | and then using that hypothetical answer to query for embeddings
         | works really well.
         | 
         | What sort of performance are you getting in production with
         | this one? The other two are basically solved for performance
         | and RAG in general if it is related to a known and pre-
         | processed corpus but I am having trouble thinking of how you
         | don't get a hit with #3.
        
           | LASR wrote:
           | It's slow. So we use hypothetical mostly for async
           | experiences.
           | 
           | For live experiences like chat, we solved it with UX. As soon
           | as you start typing the words of a question into the chat
           | box, it does the FTS search and retrieves a set of documents
           | that have word-matches, scored just using ES heuristics (eg:
           | counting matching words etc)
           | 
           | These are presented as cards that expand when clicked. The
           | user can see it's doing _something_.
           | 
           | While that's happening, we also issue a full HyDE flow in the
           | background with a placeholder loading shimmer that loads in
           | the full answer.
           | 
           | So there is some dead-time of about 10 seconds or so while it
           | generates the hypothetical answers. After that, a short ~1
           | sec interval to load up the knowledge nodes, and then it
           | starts streaming the answer.
           | 
           | This approach tested well with UXR participants and maintains
           | acceptable accuracy.
           | 
           | A lot of the time, when looking for specific facts from a
           | knowledge base, just the card UX gets an answer immediately.
           | Eg: "What's the email for product support?"
        
         | isoprophlex wrote:
         | > LLMs are naively bad at identifying the correct level of
         | granularity for the decomposed knowledge. One trick we found is
         | to ask the LLM to output a mermaid.js mindmap to hierarchically
         | break down the input into a tree. At the end of that output,
         | ask the LLM to state which level is the appropriate root for a
         | knowledge node.
         | 
         | > Then the node is used to generate questions that could be
         | answered from the knowledge contained in this node. We then
         | index the text of these questions and also embed them.
         | 
         | Ha, that's brilliant. Thanks for sharing this!
        
         | katelatte wrote:
         | I organize community calls for Memgraph community and recently
         | a community member presented how he uses hypothetical answer
         | generation as a crucial component to enhancing the
         | effectiveness and reliability of the system, allowing for more
         | accurate and contextually appropriate responses to user
         | queries. Here's more about it:
         | https://memgraph.com/blog/precina-health-memgraph-graphrag-t...
        
       | michelpp wrote:
       | PageRank is one of several interesting centrality metrics that
       | could be applied to a graph to influence RAG on structural data,
       | another one is Triangle Centrality which counts triangles around
       | nodes to figure out their centrality based on the concept that
       | triangles close relationships into a strong bond, where open
       | bonds dilute centrality by drawing weight away from the center:
       | 
       | https://arxiv.org/abs/2105.00110
       | 
       | The paper shows high efficiency compared to other centralities
       | like PageRank, however in some research using the GraphBLAS I and
       | my coauthors found that TC was slower on a variety of sparse
       | graphs than our sparse formulation of PR for graphs up to 1.8
       | billion edges, but that TC appears to scale better as graphs get
       | larger and is likely more efficient in the trillion edge realm.
       | 
       | https://fossies.org/linux/SuiteSparse/GraphBLAS/Doc/The_Grap...
        
         | liukidar wrote:
         | This is super interesting! Thanks for sharing. Here we are
         | talking of graphs in the millions of nodes/edges, so efficiency
         | is not that big of a deal, since anyway things are gonna be
         | parsed by an LLM to craft an answer, which will always be the
         | bottleneck. Indeed PageRank is the first step, but we would be
         | happy to test more accurate alternatives. Importantly, we are
         | using personalized PageRank here, meaning we give specific
         | initial weights to a set (potentially quite large) of nodes,
         | would TC support that (as well as giving weight to edges, since
         | we are also looking into that)?
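
For readers unfamiliar with personalized PageRank, here is a minimal pure-Python power-iteration sketch with seed weights and edge weights. It is not the FastGraphRAG code (the thread says the project uses python-igraph); the function name, the damping value and the iteration count are assumptions chosen for illustration.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank.

    edges: dict mapping (source, target) -> positive edge weight.
    seeds: dict mapping node -> initial (restart) weight, e.g. query matches.
    Returns a dict node -> score; the scores sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out_weight = {n: 0.0 for n in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w

    total_seed = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total_seed for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Random restarts land on the seed nodes, not uniformly on all nodes.
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        # Nodes without outgoing edges hand their mass back to the seeds.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        for (u, v), w in edges.items():
            nxt[v] += damping * rank[u] * w / out_weight[u]
        for n in nodes:
            nxt[n] += damping * dangling * restart[n]
        rank = nxt
    return rank


def undirected(pairs, weight=1.0):
    """Helper: expand undirected pairs into weighted edges in both directions."""
    edges = {}
    for u, v in pairs:
        edges[(u, v)] = weight
        edges[(v, u)] = weight
    return edges
```

On a chain a-b-c-d-e seeded at `a`, the score decays with distance from the seed, which is what lets the seed set steer retrieval toward the relevant part of the graph. Query-dependent edge weights would enter through the `edges` values.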
        
           | michelpp wrote:
           | > Here we are talking of graphs in the millions of nodes/edges,
           | 
           | That ought to be enough for anybody.
           | 
           | > would TC support that
           | 
           | TC is a purely structural algorithm, it counts triangles so
           | it doesn't take any weights into consideration, but it does
           | return a vector of normalized ranking from 0.0 to 1.0, which
           | you could combine with an existing biasing strategy to boost
           | results that have strong centrality.
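
One way to combine a purely structural score with an existing relevance score, as suggested above, is a weighted blend. Note that the sketch below uses plain per-node triangle counts scaled to the 0.0 to 1.0 range, which is a simplification and not the Triangle Centrality formula from the linked paper; the `alpha` weight is also an assumption.

```python
from itertools import combinations


def triangle_counts(adj):
    """Count triangles through each node of an undirected graph.

    adj: dict mapping node -> set of neighbour nodes (symmetric).
    """
    counts = {n: 0 for n in adj}
    for n, nbrs in adj.items():
        # A triangle through n is a pair of its neighbours that are adjacent.
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                counts[n] += 1
    return counts


def normalized(scores):
    """Scale scores into the 0.0 to 1.0 range by dividing by the maximum."""
    top = max(scores.values())
    return {n: (s / top if top else 0.0) for n, s in scores.items()}


def blend(relevance, structure, alpha=0.8):
    """Boost relevance scores with a structural score (both in 0.0 to 1.0)."""
    return {
        n: alpha * relevance[n] + (1 - alpha) * structure.get(n, 0.0)
        for n in relevance
    }
```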
        
             | lmeyerov wrote:
             | Hah indeed, we are doing billion-scale real-time graph rag
             | in louie.ai for fairly regular tasks, so your sentiment
             | resonates ;-)
             | 
             | For something like uploading a big folder of documents,
             | agree with the OP, pretty straightforward, naive in-memory
             | with out-of-the-box embeddings, LLMs, retrieval, and
             | untuned DBs goes far. I expect most vector-supporting dbaas
             | and LLMaaS to be offering in the new year. OpenAI, Claude,
             | and friends are already going in this direction, leaving
             | the rag techniques opaque for now.
             | 
             | (Something folks may not appreciate, and I think is
             | important about what's being done here, is the incremental
             | update aspect.)
        
         | arkokoley wrote:
         | Have you tried Authority Rank as a substitute for PageRank?
         | https://link.springer.com/content/pdf/10.1007/978-3-030-6097...
        
       | ignomaniac wrote:
       | does FastRAG integrate with other graph databases like Neo4j?
        
         | liukidar wrote:
         | We are building connectors for that, so it will soon :) At the
         | moment we are using python-igraph (which does everything
         | locally) as we wanted to offer something as ready to use as
         | possible.
        
           | ignomaniac wrote:
           | I'd like to partner to see if a connector to a graph db can
           | be mutually beneficial and provide some value to users. How
           | do I reach out? NOTE: I'm not from Neo4j
        
             | liukidar wrote:
             | That would be awesome, we have a discord you can join and
             | we can talk there (link is in the github repo, message
             | Antonio) or you can message antonio [at] circlemind.com
        
               | onel wrote:
               | Note, the domain is circlemind.co
        
       | dantodor wrote:
       | It looks awfully similar to nano graphrag, but I fail to see any
       | credits to it.
        
       | jillesvangurp wrote:
       | Cool idea. IMHO traditional information retrieval is the way to
       | go with RAG. Vector search is nice but also slow and expensive
       | and people seem to use it as magic pixie dust. It works nice for
       | unstructured data but not necessarily that well for structured
       | data.
       | 
       | And unless tuned very well, vector search is not actually a whole
       | lot better than a good old well tuned query. Putting everything
       | together, the practice of turning structured data into
       | unstructured data just so you can do vector search or prompt
       | engineering on it, which I've seen teams do, feels a bit
       | backwards. It kind of works but there are probably smarter ways
       | to get the same results. Graph RAG is essentially about making
       | use of structure of data. Whether that's through SQL joins or by
       | querying some graph database doesn't really matter much.
       | 
       | There is probably some value into teaching LLMs how to query as
       | well; or letting them interface with existing search/query APIs.
       | And you can compensate for poor ranking with larger context sizes
       | and simply fetch a few hundred or even more results with multiple
       | queries. It's going to be a lot faster and cheaper than vector
       | search to scale that.
        
       | robrenaud wrote:
       | Do you fear that some big company will just host your system for
       | cheaper than you can if you catch a lot of success?
       | 
       | That is, the same thing that Amazon did to Mongo will happen to
       | you?
       | 
       | Do you think working in the open enables you to spend more time
       | on engineering and less on sales and marketing?
        
       | fudged71 wrote:
       | How is it at answering broad questions? Communities and
       | clustering were specifically for that purpose of agglomerating,
       | right?
        
       | yccheok wrote:
       | Hi,
       | 
       | I'm currently building a Q&A chatbot and facing challenges in
       | addressing the following scenario:
       | 
       | When a user asks:
       | 
       | "What do you mean in your previous statement?"
       | 
       | How does your framework handle retrieving the correct small
       | subset of "raw knowledge" and integrating it into the LLM for a
       | relevant response?
       | 
       | Without relying on external frameworks, I've struggled with this
       | issue -
       | https://www.reddit.com/r/LocalLLaMA/comments/1gtzdid/d_optim...
       | 
       | I'd love to know how your framework solves this and whether it
       | can streamline the process.
       | 
       | Thank you!
        
         | martinkallstrom wrote:
         | Have you tried allowing the LLM to decide the use of knowledge
         | retrieval, through tool use or a direct query?
        
         | Tsarp wrote:
         | After a lot of experimentation, the only thing that worked in a
         | chat style application is to pass maybe the last 4-5 messages
         | (ideally the entire conversation history) and ask an LLM to
         | summarize the question in the context of the conversation.
         | 
         | Without that it often failed when users asked something like
         | "Can you expand point 2?" or "Give a detailed example of the
         | above".
         | 
         | Current implementation (I have 3 indexes) is to provide Query +
         | Past messages and ask an LLM to break it down into:
         | 
         | Overall ask:
         | BM25 optimized question:
         | Keywords:
         | Semantic optimized question:
         | 
         | Perform RAG + Rerank and pass the top N passages after this
         | along with the Overall ask in the second LLM call.
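
The rewrite step described above can be sketched as a prompt builder plus a parser for the four fields. The prompt wording and both function names are hypothetical; only the field names come from the comment, and the LLM call itself is left out.

```python
FIELDS = (
    "Overall ask",
    "BM25 optimized question",
    "Keywords",
    "Semantic optimized question",
)


def build_rewrite_prompt(history, query, last_n=5):
    """Build a prompt asking an LLM to restate the query in context.

    history: list of (role, text) tuples, oldest first.
    """
    recent = history[-last_n:] if last_n > 0 else []
    lines = ["Rewrite the user's latest message so it stands alone.", "", "Conversation:"]
    lines += [f"{role}: {text}" for role, text in recent]
    lines += ["", f"Latest message: {query}", ""]
    lines += ["Answer with exactly these fields, one per line:"]
    lines += [f"{name}:" for name in FIELDS]
    return "\n".join(lines)


def parse_rewrite(response):
    """Extract the known fields from the LLM's reply, ignoring other lines."""
    parsed = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            parsed[key.strip()] = value.strip()
    return parsed
```

Each parsed field would then feed the matching index (BM25, keyword, semantic), and the "Overall ask" would be passed along with the reranked passages in the second LLM call.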
        
       | krawczstef wrote:
       | Looks great. But being burned by other "abstractions", e.g.
       | LangChain, I'm wary of the oversimplification. How are you not
       | going to make those same mistakes?
        
       | lordofgibbons wrote:
       | Thanks for releasing this! Have you gotten a chance to run any
       | multi-hop retrieval benchmarks?
       | 
       | It would be very useful to be able to compare this method to
       | other established RAG techniques
        
       | pgt wrote:
       | What is RAG, please?
        
         | Loic wrote:
         | Retrieval Augmented Generation
        
       | 2dvisio wrote:
       | Looking forward to someone adapting this for Obsidian and other
       | similar tools. As a low-effort user of Obsidian I would love to
       | reap the benefits of appropriate knowledge graphs, but don't want
       | to put that much effort into creating one myself.
        
       | gillesjacobs wrote:
       | Do you have any retrieval and generation metric scores (eg, KILT
       | or NQ datasets)?
       | 
       | I know benchmark datasets are not the be-all-end-all, but a
       | halfway decent score and inference time would really help sell
       | your framework (or help engineers make the choice).
       | 
       | In any case, very cool work. I built a lot of RAG pipelines as a
       | freelance NLP engineer and I will try this out.
        
       | ah27182 wrote:
       | So I went ahead and tried running the example script with "A
       | CHRISTMAS CAROL" using the "meta-llama-3.1-8b-instruct" and
       | "text-embedding-nomic-embed-text-v1.5" models locally. How long
       | should it take to extract the subgraphs with this kind of setup?
        
       | redwood wrote:
       | I'm curious how the implementation compares with LightRAG
       | (https://github.com/HKUDS/LightRAG) ?
        
       | andai wrote:
       | I might be the wrong target audience (I do have a great interest
       | in this, but I am not doing it at a professional level) but I
       | feel the GitHub could explain things a bit better -- now I need
       | to go read someone else's research paper to see what you guys are
       | doing!
       | 
       | (Also readme says see examples folder but it's basically empty?)
        
       | inboulder wrote:
       | PageRank for better centrality seems neat, but it still doesn't
       | address the probably unsolvable flaw with RAG, the reason why RAG
       | basically can't work. All RAG DBs under-perform expectations
       | because RAG fundamentally can't find relationships between words
       | necessary to find the information the user cares about. Weird
       | right, isn't this what the 'attention' mechanism is supposed to
       | be good for? It just isn't good enough.
       | 
       | Example: Say you're searching an article and you want to know
       | what occupation a mentioned person has, let's say the person
       | 'Sharon,' is mentioned to have attended several physical
       | chemistry conferences but her occupation is never explicitly
       | mentioned. There's a very good chance every single rag approach
       | will fail to return correct results, will fail to make this
       | connection between 'occupation' attends conference, type of
       | conference and infers 'chemist'. I could go on, but this sort of
       | error is pervasive along all types of information when trying to
       | retrieve with RAG. In the end, solutions like the above seem to
       | just sort of reinvent other query methods, SQL, pagerank etc,
       | with extra steps... there's little point in vectorization at that
       | point...
        
         | samsonradu wrote:
         | Isn't this inference an LLM's job? The RAG component just needs
         | to find the Sharon article among a large dataset and pass it
         | (entirely) to the LLM as context.
        
         | jsenn wrote:
         | On the contrary, examples like yours are the entire point of
         | approaches like this one. If you read the HippoRAG paper cited
         | by OP, their motivating example is almost identical to yours,
         | and their evaluations are largely on multi-hop question
         | answering of this kind.
        
         | queueueue wrote:
         | I don't see how this is not possible using knowledge graphs?
         | You retrieve the entity, Sharon, and the additional context you
         | get will be the nodes and edges close to Sharon. After this it
         | becomes the LLM's job because if it is not mentioned in the
         | given context, it should let the prompter know "In the given
         | context the occupation of Sharon could not be found".
        
       | ravishar313 wrote:
       | Is there a way to use this just as a retriever?
        
       | staticautomatic wrote:
       | Very cool. Have you considered whether incorporating any of the
       | new-ish unsupervised or semi-supervised keyphrase extraction
       | algorithms could give this a boost? Teket (graph-based) and
       | sifrank come to mind, but I'm also wondering if AutoPhrase + an
       | LLM could be powerful.
        
       | splitrocket wrote:
       | is there any sense of tenancy?
       | 
       | From what I can tell, at least given the examples, there is one
       | global graph.
       | 
       | Thanks!
        
       | omgCPhuture wrote:
       | Gosh I miss the days Google were using pagerank and not whatever
       | the heck kind of crap their service has turned into.
        
       ___________________________________________________________________
       (page generated 2024-11-19 23:00 UTC)