[HN Gopher] What Is a Knowledge Graph?
       ___________________________________________________________________
        
       What Is a Knowledge Graph?
        
       Author : Anon84
       Score  : 69 points
       Date   : 2024-08-16 18:22 UTC (4 hours ago)
        
 (HTM) web link (neo4j.com)
 (TXT) w3m dump (neo4j.com)
        
       | CharlieDigital wrote:
       | I've been working on an implementation of graph RAG (GRAG) using
       | Neo4j as the underlying store.
       | 
       | The overall DX is quite nice. The apoc-extended set of plugins[0]
       | make it very seamless to work with embeddings and LLMs during
       | local dev/testing. The Graph Data Science package comes preloaded
       | with a series of community detection algorithms[1] like Louvain
       | and Leiden.
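Louvain and Leiden are both commonly used to find communities by greedily maximizing modularity: the fraction of edges that fall inside communities, minus what you would expect if edges were placed at random given each node's degree. The sketch below computes that quantity for a given partition so the objective is concrete. It is plain Python and not the Graph Data Science API or either algorithm; the toy graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2m))**2, where m is the
    total number of edges, L_c the number of edges with both endpoints in c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {n: 0 for n in "abc"} | {n: 1 for n in "def"}
merged = {n: 0 for n in "abcdef"}
print(round(modularity(edges, split), 4))   # 0.3571
print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting at the bridge scores higher than lumping everything together, which is the kind of comparison Louvain makes repeatedly as it moves nodes between communities.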
       | 
       | Performance has been very, very good as long as your strategy to
       | enter the graph is sound and you've structured your graph in such
       | a way that you can meaningfully traverse the adjacent
       | properties/nodes.
       | 
       | We've currently deployed the Community edition to AWS ECS Fargate
       | using AWS Copilot + EFS as a persistent volume. There were some
       | kinks with respect to the docs, but it works great otherwise.
       | 
       | It's worth a look for any teams that are trying to improve their
       | RAG or are exploring GRAG in general. It's not a silver bullet;
       | you still need to have some "insight" into how to process your
       | input data source for the graph to do its magic. But the
       | combination of the built-in graph algorithms and the ergonomics
       | of Cypher make it possible to perform certain types of queries
       | and "explorations" that would otherwise be either harder to
       | optimize or more expensive in a relational store.
       | 
       | [0] https://neo4j.com/labs/apoc/5/ml/openai/
       | 
       | [1] https://neo4j.com/docs/graph-data-science/current/algorithms...
        
         | riku_iki wrote:
         | > Performance has been very
         | 
         | for how many records?
        
           | CharlieDigital wrote:
           | During our initial testing, ~1m nodes on a local Docker
           | container with 1G RAM and 1vCPU.
           | 
           | But here I mean "performance" in both retrieval time and the
           | overall quality of the fragments retrieved for RAG compared
           | to a `pgvector`-only implementation. It is possible to
           | "simulate" these types of graph traversals in pg as well,
           | but you'll have to work much harder to get the performance
           | (we tried it first).
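One common way to "simulate" a bounded graph traversal in a relational store is a recursive CTE over an adjacency table. This is a minimal sketch, not the commenter's actual attempt: it uses Python's built-in `sqlite3` as a stand-in for Postgres (which supports the same `WITH RECURSIVE` construct), and the `edges` table and node names are hypothetical.

```python
import sqlite3

# Hypothetical adjacency table standing in for graph edges.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("doc1", "chunkA"), ("chunkA", "entityX"),
                  ("entityX", "chunkB"), ("chunkB", "doc2"),
                  ("doc3", "chunkC")])

def neighbors_within(start, max_hops):
    """Return (node, min_depth) for nodes reachable from start in 1..max_hops hops."""
    return conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM edges e JOIN walk w ON e.src = w.node
            WHERE w.depth < ?          -- the depth bound also stops cycles
        )
        SELECT node, MIN(depth) FROM walk
        WHERE depth > 0
        GROUP BY node
        ORDER BY MIN(depth), node
        """,
        (start, max_hops),
    ).fetchall()

print(neighbors_within("doc1", 3))
# [('chunkA', 1), ('entityX', 2), ('chunkB', 3)]
```

Each extra hop is another self-join, so keeping this fast at scale tends to mean careful indexing on `src`, which is consistent with the "much harder" experience described above.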
        
       | kmerroll wrote:
       | Good article on the high level concepts of a knowledge graph, but
       | some concerning mischaracterizations of core functions of
       | ontologies supporting the class schema and continued disparaging
       | of competing standards-based (RDF triple-store) solutions. That
       | the author omits the updates for property annotations using RDF*
       | is probably not an accident and glosses over the issues with
       | their proprietary clunky query language.
       | 
       | While knowledge graphs are useful in many ways, personally I
       | wouldn't use Neo4J to build a knowledge graph as it doesn't
       | really play to any of their strengths.
       | 
       | Also, I would rather stab myself with a fork than try to use
       | Cypher to query a concept graph when better standards-based
       | options are available.
        
         | alexchantavy wrote:
         | I enjoy Cypher; it's like drawing ASCII art to describe the
         | path you want to match on, and it gives you what you want. I
         | was under the impression that, with things like openCypher,
         | Cypher was becoming (if it wasn't already) the main standard
         | for interacting with a graph database (but I could be out of date).
         | What are the better standards-based options you're referring
         | to?
        
           | throwaway48540 wrote:
           | SPARQL, RDF triples.
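For readers new to the RDF model, a minimal sketch of triples and of how a SPARQL basic graph pattern (a conjunction of triple patterns that share variables) is matched against them. It is pure Python with no RDF library; the `ex:` identifiers are made up, and real SPARQL engines use indexes rather than this nested-loop join.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:carol", "ex:name", '"Carol"'),
}

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:                       # constant term does not match
            return None
    return new

def query(patterns):
    """Evaluate a list of triple patterns, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in TRIPLES
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

# Roughly: SELECT ?name WHERE { ex:alice ex:knows ?x . ?x ex:knows ?y . ?y ex:name ?name }
print(query([("ex:alice", "ex:knows", "?x"),
             ("?x", "ex:knows", "?y"),
             ("?y", "ex:name", "?name")]))
# [{'?x': 'ex:bob', '?y': 'ex:carol', '?name': '"Carol"'}]
```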
        
           | westurner wrote:
           | W3C SPARQL, SPARUL is now SPARQL Update 1.1, SPARQL-star, GQL
           | 
           | GraphQL is a JSON HTTP API schema (2015):
           | https://en.wikipedia.org/wiki/GraphQL
           | 
           | GQL (2024):
           | https://en.wikipedia.org/wiki/Graph_Query_Language
           | 
           | W3C RDF-star and SPARQL-star (2023 editors' draft):
           | https://w3c.github.io/rdf-star/cg-spec/editors_draft.html
           | 
           | SPARQL/Update implementations: https://en.wikipedia.org/wiki/SPARUL#SPARQL/Update_implement...
           | 
           | /? graphql sparql [ cypher gremlin ] site:github.com
           | inurl:awesome https://www.google.com/search?q=graphql+sparql+
           | +site%253Agit...
           | 
           | But then there's data validation everywhere. For language-portable
           | JSON-LD RDF validation, there are many implementations of JSON
           | Schema for fixed-shape JSON-LD messages; there's W3C SHACL
           | (Shapes Constraint Language); and json-ld-schema is (JSON
           | Schema + SHACL).
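To make "shapes and constraints" concrete, here is a hand-rolled cardinality check loosely modeled on SHACL's `sh:targetClass`, `sh:path`, `sh:minCount`, and `sh:maxCount`. It is an illustrative sketch, not a SHACL engine and not any library's API; the data and shape are hypothetical.

```python
from collections import Counter

# Toy data graph as (subject, predicate, object) triples.
DATA = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:bob", "ex:name", '"Robert"'),
    ("ex:carol", "rdf:type", "ex:Person"),
]

# "Every ex:Person must have exactly one ex:name."
SHAPE = {"target_class": "ex:Person", "path": "ex:name",
         "min_count": 1, "max_count": 1}

def validate(data, shape):
    """Return (focus_node, message) pairs for every cardinality violation."""
    focus = sorted({s for s, p, o in data
                    if p == "rdf:type" and o == shape["target_class"]})
    counts = Counter(s for s, p, _ in data if p == shape["path"])
    report = []
    for node in focus:
        n = counts[node]
        if n < shape["min_count"]:
            report.append((node, f"{shape['path']}: {n} < minCount {shape['min_count']}"))
        elif n > shape["max_count"]:
            report.append((node, f"{shape['path']}: {n} > maxCount {shape['max_count']}"))
    return report

for node, msg in validate(DATA, SHAPE):
    print(node, msg)
# ex:bob ex:name: 2 > maxCount 1
# ex:carol ex:name: 0 < minCount 1
```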
           | 
           | /? hnlog SHACL, inference, reasoning;
           | https://news.ycombinator.com/item?id=38526588
           | https://westurner.github.io/hnlog/#comment-38526588
        
         | enragedcacti wrote:
         | > That the author omits the updates for property annotations
         | using RDF* is probably not an accident and glosses over the
         | issues with their proprietary clunky query language.
         | 
         | Not just that: w.r.t. reification, they gloss over the fact that
         | Neo4j has the opposite problem. Unlike RDF, it is unable to
         | cleanly represent multiple values for the same property and
         | requires reification or clunky lists to work around it.
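The difference being described can be shown with plain data structures. This sketch only models the shape of the data; it is not how either kind of database stores things internally, and the book/author data is hypothetical.

```python
# 1. RDF style: the graph is a set of triples, so two values for the same
#    property are just two triples with the same subject and predicate.
triples = {
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book1", "ex:author", "ex:bob"),
}

# 2. Property-graph style with a list-valued property: a node's properties
#    form a key -> value map, so multiple values must share one key as a list.
book_node = {"id": "book1", "props": {"title": "Graphs", "author": ["alice", "bob"]}}

# 3. Property-graph style with reification: each value becomes its own node,
#    linked by a relationship (which can then carry properties of its own).
nodes = {"book1": {"title": "Graphs"}, "alice": {}, "bob": {}}
rels = [("book1", "AUTHORED_BY", "alice"), ("book1", "AUTHORED_BY", "bob")]

def authors_rdf():
    return sorted(o for s, p, o in triples if (s, p) == ("ex:book1", "ex:author"))

def authors_list():
    return sorted(book_node["props"]["author"])

def authors_reified():
    return sorted(dst for src, typ, dst in rels if (src, typ) == ("book1", "AUTHORED_BY"))

print(authors_rdf(), authors_list(), authors_reified())
# ['ex:alice', 'ex:bob'] ['alice', 'bob'] ['alice', 'bob']
```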
        
           | CharlieDigital wrote:
           | > clunky lists
           | 
           | Not sure what the problem is here. The nodes and
           | relationships are represented as JSON so it's fairly easy to
           | work with them. They also come with a pretty extensive set of
           | list functions[0] and operators[1].
           | 
           | Neo4j's UNWIND makes it relatively straightforward to
           | manipulate the lists as well[2].
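For readers who haven't used it, a rough Python analogue of what Cypher's `UNWIND expr AS alias` does to a stream of rows. This is a sketch of the semantics as described in the Cypher manual (one output row per list element; an empty list or null yields no rows; a non-list value behaves like a one-element list), not Neo4j code; the book rows are hypothetical.

```python
def unwind(rows, expr, alias):
    """Emit one output row per element of expr(row), with the element bound to alias."""
    out = []
    for row in rows:
        value = expr(row)
        if value is None:            # UNWIND null -> no rows
            continue
        items = value if isinstance(value, list) else [value]
        for item in items:           # UNWIND [] -> no rows
            out.append({**row, alias: item})
    return out

books = [
    {"title": "Graphs", "authors": ["alice", "bob"]},
    {"title": "Tables", "authors": []},
    {"title": "Trees", "authors": "carol"},
]
for row in unwind(books, lambda r: r["authors"], "author"):
    print(row["title"], row["author"])
# Graphs alice
# Graphs bob
# Trees carol
```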
           | 
           | I'm not super familiar with RDF triplestores, but what's nice
           | about Neo4j is that it's easy enough to use as a generalized
           | database, so you can store your knowledge graph right
           | alongside your entities and use it as the primary/only
           | database.
           | 
           | [0] https://neo4j.com/docs/cypher-manual/current/functions/list/
           | 
           | [1] https://neo4j.com/docs/cypher-manual/current/syntax/operator...
           | 
           | [2] https://neo4j.com/docs/cypher-manual/current/clauses/unwind/...
        
         | andersonvaz wrote:
         | Would you mind mentioning some of the options available that
         | you consider better than Cypher?
        
         | CharlieDigital wrote:
         | > While knowledge graphs are useful in many ways, personally I
         | wouldn't use Neo4J to build a knowledge graph as it doesn't
         | really play to any of their strengths.
         | 
         | I'd strongly disagree. The built-in Graph Data Science package
         | has a lot of nice graph algos that are easy to reach for when
         | you need things like community detection.
         | 
         | The ability to "land and expand" efficiently (my term for how I
         | think about KGs in Neo4j) is quite nice with Cypher. Retrieval
         | performance with "land and expand" is, however, highly
         | dependent on your initial processing to build the graph and how
         | well you've teased out the relationships in the dataset.
         | 
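One plausible reading of "land and expand" (the commenter's own term; this code is an assumption, not their implementation): land on the nodes whose embeddings are most similar to the query, then expand a bounded number of hops through the graph. The vectors, node ids, and the `land_and_expand` helper are hypothetical, and a real deployment would presumably use a vector index rather than this linear scan.

```python
import math
from collections import deque

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def land_and_expand(query_vec, embeddings, adjacency, k=1, hops=1):
    """Pick the k most similar nodes, then breadth-first expand up to `hops` hops.

    embeddings: node id -> vector; adjacency: node id -> list of neighbor ids.
    Returns node ids in visit order (landing nodes first).
    """
    landing = sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]),
                     reverse=True)[:k]
    order, seen = list(landing), set(landing)
    frontier = deque((n, 0) for n in landing)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in adjacency.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                frontier.append((nbr, depth + 1))
    return order

embeddings = {"chunk1": [1.0, 0.0], "chunk2": [0.0, 1.0], "chunk3": [0.7, 0.7]}
adjacency = {"chunk1": ["entity:neo4j"], "entity:neo4j": ["chunk3"],
             "chunk2": ["entity:postgres"]}
print(land_and_expand([0.9, 0.1], embeddings, adjacency, k=1, hops=2))
# ['chunk1', 'entity:neo4j', 'chunk3']
```

Note how `chunk3` is only reached through the shared entity node, which is why the quality of the relationships extracted when building the graph matters so much for retrieval.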
         | > I would rather stab myself with a fork than try to use Cypher
         | to query a concept graph when better standards-based options
         | are available.
         | 
         | Cypher is a variant of the GQL standard, which was itself born
         | from Cypher and, subsequently, the openCypher working group:
         | https://opencypher.org/
         | 
         | More info:
         | 
         | https://neo4j.com/blog/gql-international-standard/
         | 
         | https://neo4j.com/blog/cypher-gql-world/
        
       | loughnane wrote:
       | I've got a Django side project that uses Neo4j. I use it to map
       | out the static content in the domain space, alongside a Postgres
       | database that handles the more transactional stuff.
       | 
       | It works great. I'm not a DB expert, but the flexibility and
       | explicitness of the graph schema click for me. It took me a
       | while to come around on Cypher, but now that I'm there it makes
       | sense.
        
       | fjfaase wrote:
       | An interesting use of knowledge graphs is research into
       | historical documents, such as genealogical research or
       | research into a historic event, person, or location. In
       | those applications, sources often lack direct references (a
       | person's name in one document cannot always be identified with
       | 100% certainty) or contradict each other (one source gives a
       | different date than another). In this case another layer is
       | needed: you need to attach a source identification, the actual
       | document (scans), and an author and/or authority to each source.
       | When extracting information from historical documents, you may
       | need to transcribe the contents, and in that case it would be
       | nice to be able to mark parts of the text so you can quickly
       | verify the source of a fact.
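A minimal sketch of the extra provenance layer described above: every fact is stored as a claim tied to a source, a text span in the transcription, and a confidence in the identification, so contradictory claims can coexist and be surfaced rather than overwritten. The `Source`/`Claim` classes, field names, and sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Source:
    source_id: str
    author: Optional[str]      # author and/or authority behind the document
    scan_uri: Optional[str]    # where the scanned original lives

@dataclass(frozen=True)
class Claim:
    subject: str               # e.g. a person identifier
    prop: str                  # e.g. "birth_year"
    value: str
    source: Source
    span: Tuple[int, int]      # character offsets in the transcription
    confidence: float          # certainty that the subject is identified correctly

def conflicts(claims):
    """Group claims by (subject, prop) and keep groups whose values disagree."""
    grouped = defaultdict(list)
    for c in claims:
        grouped[(c.subject, c.prop)].append(c)
    return {k: v for k, v in grouped.items() if len({c.value for c in v}) > 1}

parish = Source("parish-register-1843", "unknown clerk", "scans/p12.jpg")
census = Source("census-1851", "enumerator", None)
claims = [
    Claim("person:J.Smith", "birth_year", "1843", parish, (120, 124), 0.9),
    Claim("person:J.Smith", "birth_year", "1842", census, (48, 52), 0.6),
    Claim("person:J.Smith", "birth_place", "Leeds", census, (60, 65), 0.6),
]
for (subj, prop), cs in conflicts(claims).items():
    print(subj, prop, [(c.value, c.source.source_id) for c in cs])
# person:J.Smith birth_year [('1843', 'parish-register-1843'), ('1842', 'census-1851')]
```

The `span` field is what would let a UI highlight the exact passage a fact came from for quick verification.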
       | 
       | I have not yet found an application that combines all those
       | functions, and I have been considering building one myself.
        
       | nemo44x wrote:
       | What's great about knowledge graphs and property graphs in
       | general is once you really get it (and it's not too difficult,
       | especially if you come from a CS background) you start to see
       | graphs all over the place. It's a really nice way to work with
       | data for certain classes of problems. Once you get "enough" data
       | in and "enough" of a variety of things connected, you start to
       | see remarkable relationships emerge.
        
         | cobertos wrote:
         | What sort of relationships have you seen in the data you've
         | worked with that you'd describe as remarkable?
         | 
         | I've explained a similar thing to friends before, but I was
         | always at a loss for relationships/insights that led to
         | concrete outcomes.
        
           | nemo44x wrote:
           | Supply chain and logistics. We'd start to see where the
           | biggest risk points were and use that to diversify risk and
           | also rank failover. We could make predictions about how the
           | supply chain would be disrupted based on individual
           | suppliers/movers/warehouses/etc having events that affected
           | their ability to perform. You start to see how much some
           | suppliers rely on each other, etc. Holy hell did Covid make
           | that crazy!
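One simple way to surface risk points like these (an illustrative assumption, not the commenter's method): for each node in a supplier-to-customer graph, count how many downstream nodes transitively depend on it. The network below is hypothetical, and the sketch ignores edge volumes and alternate suppliers.

```python
from collections import defaultdict

def downstream_exposure(edges):
    """For each node, count the distinct nodes that transitively depend on it.

    edges: (supplier, customer) pairs, meaning the customer depends on the supplier.
    """
    customers = defaultdict(set)
    nodes = set()
    for sup, cust in edges:
        customers[sup].add(cust)
        nodes.update((sup, cust))
    exposure = {}
    for start in nodes:
        seen, stack = set(), [start]
        while stack:                           # depth-first walk downstream
            for nxt in customers[stack.pop()]:
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    stack.append(nxt)
        exposure[start] = len(seen)
    return exposure

edges = [("mine", "smelter"), ("smelter", "parts_a"), ("smelter", "parts_b"),
         ("parts_a", "assembler"), ("parts_b", "assembler"), ("port", "assembler")]
exposure = downstream_exposure(edges)
ranked = sorted(exposure.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[:2])  # [('mine', 4), ('smelter', 3)]
```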
        
       | openrisk wrote:
       | It will be quite a plot twist if Graph RAG paves the way for
       | making knowledge graphs / semantic networks and the like cutting
       | edge again... New "AI" meets old "AI" etc.
        
       | burakemir wrote:
       | In addition to labelled property graphs and triples, a list of
       | approaches to knowledge graphs should consider facts (tuples) that
       | are connected via common values as a form of graph, with Datalog
       | queries to query them. This is a lot more flexible than either
       | approach IMHO and also more easily connected to existing
       | relational data.
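A minimal sketch of the idea in plain Python: facts are tuples, a shared value acts as the join (the "edge" between facts), and a recursive rule is evaluated bottom-up until no new facts appear. This is naive evaluation of a textbook transitive-closure rule, not Mangle's or RDFox's syntax or engine; production Datalog engines typically use semi-naive evaluation and indexes.

```python
# Rules being evaluated:
#   reachable(X, Y) :- edge(X, Y).
#   reachable(X, Z) :- reachable(X, Y), edge(Y, Z).
def transitive_closure(edge):
    """Naive bottom-up evaluation: apply the rules until a fixpoint is reached."""
    reachable = set(edge)
    while True:
        # Join reachable and edge on the shared value Y.
        derived = {(x, z) for (x, y) in reachable for (y2, z) in edge if y == y2}
        new = derived - reachable
        if not new:
            return reachable
        reachable |= new

edge = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(edge)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Because `edge` is just a set of tuples, it could equally be loaded from rows of an existing relational table.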
       | 
       | RDFox is a tool that uses Datalog internally. RelationalAI uses a
       | Datalog-based approach. Another example is Mangle Datalog, my own
       | humble open source project that can be found on GitHub.
       | 
       | The language in the article about relational being "non-native
       | graph" is a bit biased. With some developer attention, there are
       | massive opportunities to store data in a distributed manner, and
       | with the right indices querying can be fast. Though to be fair,
       | good performance will always need developer attention.
        
       | findthewords wrote:
       | For me a knowledge graph is a complex network.
       | 
       | When you try to grasp any complex topic your brain starts to
       | build and connect a fuzzy network of topics and their respective
       | positive or negative correlations and of course the weights
       | between the connections.
       | 
       | Once you have unfuzzied the picture in your head you realize that
       | the network is active and dynamic and that this network has
       | different "modes" of operation and that some weights and
       | correlations can change over time, while others are always
       | static.
       | 
       | Mastering the dynamics of the knowledge graph is the final step
       | in understanding it.
        
       ___________________________________________________________________
       (page generated 2024-08-16 23:00 UTC)