[HN Gopher] What Is a Knowledge Graph?
___________________________________________________________________
What Is a Knowledge Graph?
Author : Anon84
Score : 69 points
Date : 2024-08-16 18:22 UTC (4 hours ago)
(HTM) web link (neo4j.com)
(TXT) w3m dump (neo4j.com)
| CharlieDigital wrote:
| I've been working on an implementation of graph RAG (GRAG) using
| Neo4j as the underlying store.
|
| The overall DX is quite nice. The apoc-extended set of plugins[0]
| makes it very seamless to work with embeddings and LLMs during
| local dev/testing. The Graph Data Science package comes preloaded
| with a series of community detection algorithms[1] like Louvain
| and Leiden.
|
| Performance has been very, very good as long as your strategy to
| enter the graph is sound and you've structured your graph in such
| a way that you can meaningfully traverse the adjacent
| properties/nodes.
|
| We've currently deployed the Community edition to AWS ECS Fargate
| using AWS Copilot + EFS as a persistent volume. There were some
| kinks with respect to the docs, but it works great otherwise.
|
| It's worth a look for any teams that are trying to improve their
| RAG or are exploring GRAG in general. It's not a silver bullet;
| you still need to have some "insight" into how to process your
| input data source for the graph to do its magic. But the
| combination of the built-in graph algorithms and the ergonomics
| of Cypher make it possible to perform certain types of queries
| and "explorations" that would otherwise be either harder to
| optimize or more expensive in a relational store.
|
| [0] https://neo4j.com/labs/apoc/5/ml/openai/
|
| [1] https://neo4j.com/docs/graph-data-
| science/current/algorithms...
| riku_iki wrote:
| > Performance has been very
|
| for how many records?
| CharlieDigital wrote:
| During our initial testing, ~1m nodes on a local Docker
| container with 1G RAM and 1vCPU.
|
| But here I mean "performance" in both retrieval time and the
| overall quality of the fragments retrieved for RAG compared
| to a `pgvector`-only implementation. It is possible to
| "simulate" these types of graph traversals in pg as well,
| but you'll have to work much harder to get the performance
| (we tried it first).
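As a rough illustration of what "simulating" a graph traversal in a relational store can look like, here is a minimal sketch using a recursive CTE. It uses Python's built-in `sqlite3` as a stand-in for Postgres (the `WITH RECURSIVE` form is similar in both), and the `edge` table, its columns, and the `neighbors_within` helper are all hypothetical names, not anything from the commenter's system.

```python
import sqlite3

# Toy edge table; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")],
)


def neighbors_within(conn, start, max_hops):
    """Return {node: minimum hop count} for nodes within max_hops of start."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, walk.depth + 1
            FROM walk JOIN edge ON edge.src = walk.node
            WHERE walk.depth < ?
        )
        SELECT node, MIN(depth) FROM walk GROUP BY node
        """,
        (start, max_hops),
    ).fetchall()
    return {node: depth for node, depth in rows}


result = neighbors_within(conn, "a", 2)
```

In a graph database the same question is typically a single variable-length path pattern; in SQL each extra hop is another self-join, which is one plausible reason the relational version takes more tuning effort.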
| kmerroll wrote:
| Good article on the high level concepts of a knowledge graph, but
| some concerning mischaracterizations of core functions of
| ontologies supporting the class schema and continued disparaging
| of competing standards-based (RDF triple-store) solutions. That
| the author omits the updates for property annotations using RDF*
| is probably not an accident and glosses over the issues with
| their proprietary clunky query language.
|
| While knowledge graphs are useful in many ways, personally I
| wouldn't use Neo4J to build a knowledge graph as it doesn't
| really play to any of their strengths.
|
| Also, I would rather stab myself with a fork than try to use
| Cypher to query a concept graph when better standards-based
| options are available.
| alexchantavy wrote:
| I enjoy cypher, it's like you draw ASCII art to describe the
| path you want to match on and it gives you what you want. I was
| under the impression that with things like openCypher that
| cypher was becoming (if not was already) the main standard for
| interacting with a graph database (but I could be out of date).
| What are the better standards-based options you're referring
| to?
| throwaway48540 wrote:
| SPARQL, RDF triples.
| westurner wrote:
| W3C SPARQL, SPARUL is now SPARQL Update 1.1, SPARQL-star, GQL
|
| GraphQL is a JSON HTTP API schema (2015):
| https://en.wikipedia.org/wiki/GraphQL
|
| GQL (2024):
| https://en.wikipedia.org/wiki/Graph_Query_Language
|
| W3C RDF-star and SPARQL-star (2023 editors' draft):
| https://w3c.github.io/rdf-star/cg-spec/editors_draft.html
|
| SPARQL/Update implementations: https://en.wikipedia.org/wiki/
| SPARUL#SPARQL/Update_implement...
|
| /? graphql sparql [ cypher gremlin ] site:github.com
| inurl:awesome https://www.google.com/search?q=graphql+sparql+
| +site%253Agit...
|
| But then data validation everywhere; so for language-portable
| JSON-LD RDF validation there are many implementations of JSON
| Schema for fixed-shape JSON-LD messages, there's W3C SHACL
| Shapes and Constraints Language, and json-ld-schema is (JSON
| Schema + SHACL)
|
| /? hnlog SHACL, inference, reasoning;
| https://news.ycombinator.com/item?id=38526588
| https://westurner.github.io/hnlog/#comment-38526588
| enragedcacti wrote:
| > That the author omits the updates for property annotations
| using RDF* is probably not an accident and glosses over the
| issues with their proprietary clunky query language.
|
| Not just that, w.r.t. reification they gloss over the fact that
| neo4j has the opposite problem. Unlike RDF it is unable to
| cleanly represent multiple values for the same property and
| requires reification or clunky lists to fix it.
| CharlieDigital wrote:
| > clunky lists
|
| Not sure what the problem is here. The nodes and
| relationships are represented as JSON so it's fairly easy to
| work with them. They also come with a pretty extensive set of
| list functions[0] and operators[1].
|
| Neo4j's UNWIND makes it relatively straightforward to
| manipulate the lists as well[2].
|
| I'm not super familiar with RDF triplestores, but what's nice
| about Neo4j is that it's easy enough to use as a generalized
| database so you can store your knowledge graph right
| alongside of your entities and use it as the primary/only
| database.
|
| [0] https://neo4j.com/docs/cypher-
| manual/current/functions/list/
|
| [1] https://neo4j.com/docs/cypher-
| manual/current/syntax/operator...
|
| [2] https://neo4j.com/docs/cypher-
| manual/current/clauses/unwind/...
| andersonvaz wrote:
| Do you mind mentioning some of the options available that
| you consider better than Cypher?
| CharlieDigital wrote:
| > While knowledge graphs are useful in many ways, personally I
| wouldn't use Neo4J to build a knowledge graph as it doesn't
| really play to any of their strengths.
|
| I'd strongly disagree. The built-in Graph Data Science package
| has a lot of nice graph algos that are easy to reach for when
| you need things like community detection.
|
| The ability to "land and expand" efficiently (my term for how I
| think about KG's in Neo4j) is quite nice with Cypher. Retrieval
| performance with "land and expand" is, however, highly
| dependent on your initial processing to build the graph and how
| well you've teased out the relationships in the dataset.
|
| > I would rather stab myself with a fork than try to use Cypher
| to query a concept graph when better standards-based options
| are available.
|
| Cypher is a variant of the GQL standard that was born from
| Cypher itself and subsequently the working group of openCypher:
| https://opencypher.org/
|
| More info:
|
| https://neo4j.com/blog/gql-international-standard/
|
| https://neo4j.com/blog/cypher-gql-world/
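One possible reading of "land and expand" is: pick entry nodes by embedding similarity, then walk outward a fixed number of hops to gather context. The sketch below illustrates that reading in plain Python with no database involved. The toy `embeddings` and `edges` data and the `land` / `expand` helpers are assumptions made up for illustration, not the commenter's implementation.

```python
import math

# Hypothetical toy data: node -> embedding, and node -> adjacent nodes.
embeddings = {
    "invoice": [1.0, 0.0],
    "payment": [0.9, 0.1],
    "holiday": [0.0, 1.0],
}
edges = {
    "invoice": ["customer", "payment"],
    "payment": ["bank"],
    "customer": ["region"],
    "holiday": ["calendar"],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def land(query_vec, top_k=1):
    """'Land': choose the entry nodes most similar to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda n: cosine(query_vec, embeddings[n]),
        reverse=True,
    )
    return ranked[:top_k]


def expand(seeds, hops=1):
    """'Expand': collect everything within `hops` edges of the seeds."""
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


seeds = land([1.0, 0.0])
context = expand(seeds, hops=1)
```

The quality of `context` depends entirely on which edges exist, which matches the point above that retrieval is only as good as the relationships teased out when the graph was built.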
| loughnane wrote:
| I've got a django side project that uses neo4j. I use it to map
| out the static content in the domain space and a postgres
| database that handles more transactional stuff.
|
| It works great. I'm not a db expert but the flexibility and
| explicitness of the graph scheme clicks for me. It took me a
| while to come around on cypher but now that I'm there it makes
| sense.
| fjfaase wrote:
| An interesting use of knowledge graphs is research into
| historical documents, such as genealogical research or
| research into some historic event, person or location. In
| those applications, sources often lack direct references (a
| person's name in one document cannot always be identified
| with 100% certainty) or contradict each other (one source
| gives a different date than another). In this case another
| layer is needed: there is a need to attach a source
| identification, the actual document (scans), an author
| and/or an authority to a source. If you are extracting
| information from historical documents, you may need to
| transcribe the contents, and then it would be nice to be
| able to mark parts of the text to quickly verify the source
| of a fact.
|
| I have not yet found an application that combines all those
| functions and I have been considering building one myself.
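The "extra layer" described above could be modeled by storing each fact as a claim that carries its source and a certainty score, so contradictions stay visible instead of being overwritten. This is only a sketch under assumptions: the `Claim` fields, the sample records, and the `conflicts` helper are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str       # hypothetical identifier for a person, event or place
    predicate: str     # e.g. "born"
    value: str
    source: str        # identifier of the document making the claim
    confidence: float  # researcher's certainty that the source means this subject


# Invented sample data, not taken from any real archive.
claims = [
    Claim("person:1", "born", "1802-03-14", "parish-register", 0.9),
    Claim("person:1", "born", "1803-03-14", "family-bible", 0.6),
    Claim("person:1", "died", "1871-11-02", "parish-register", 0.9),
]


def conflicts(claims):
    """Group claims by (subject, predicate); keep groups with differing values."""
    by_key = defaultdict(list)
    for claim in claims:
        by_key[(claim.subject, claim.predicate)].append(claim)
    return {
        key: group
        for key, group in by_key.items()
        if len({c.value for c in group}) > 1
    }


disputed = conflicts(claims)
```

In a graph store the same idea would probably make each claim its own node linked to both the subject and the source document, which is also where a pointer to a marked span of the transcription could live.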
| nemo44x wrote:
| What's great about knowledge graphs and property graphs in
| general is once you really get it (and it's not too difficult,
| especially if you come from a CS background) you start to see
| graphs all over the place. It's a really nice way to work with
| data for certain classes of problems. Once you get "enough" data
| in and "enough" of a variety of things connected, you start to
| see remarkable relationships emerge.
| cobertos wrote:
| What sort of relationships have you seen in the data you've
| worked with that you'd describe as remarkable?
|
| I've explained a similar thing to friends before, but I was
| always at a loss for relationships/insights that have led to
| concrete outcomes
| nemo44x wrote:
| Supply chain and logistics. We'd start to see where the
| biggest risk points were and use that to diversify risk and
| also rank failover. We could make predictions about how the
| supply chain would be disrupted based on individual
| suppliers/movers/warehouses/etc having events that affected
| their ability to perform. You start to see how much some
| suppliers rely on each other, etc. Holy hell did Covid make
| that crazy!
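A small sketch of how "risk points" might be found in a supply graph: remove one intermediate node at a time and check whether goods can still flow from source to sink. The `supplies` network and both helper functions are hypothetical, and a real analysis would presumably weight edges by volume or lead time rather than treat them as all-or-nothing.

```python
from collections import deque

# Hypothetical network: each node maps to the nodes it supplies.
supplies = {
    "mine": ["smelter"],
    "smelter": ["factory_a", "factory_b"],
    "factory_a": ["warehouse"],
    "factory_b": ["warehouse"],
    "warehouse": ["retailer"],
}


def reachable(source, sink, removed=None):
    """Breadth-first search from source to sink, treating `removed` as failed."""
    if removed in (source, sink):
        return False
    seen, queue = {source}, deque([source])
    while queue:
        current = queue.popleft()
        if current == sink:
            return True
        for nxt in supplies.get(current, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(source, sink):
    """Intermediate nodes whose failure alone cuts source off from sink."""
    nodes = set(supplies) | {n for targets in supplies.values() for n in targets}
    intermediates = nodes - {source, sink}
    return sorted(
        n for n in intermediates if not reachable(source, sink, removed=n)
    )


risky = single_points_of_failure("mine", "retailer")
```

Here the two factories can fail over to each other, so neither shows up as a single point of failure, while the smelter and the warehouse do.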
| openrisk wrote:
| It will be quite a plot twist if Graph RAG paves the way for
| making knowledge graphs / semantic networks and the like cutting
| edge again... New "AI" meets old "AI" etc.
| burakemir wrote:
| In addition to labelled property graphs and triples, a list of
| approaches to knowledge graphs should consider facts (tuples)
| that are connected via common values as a form of graph, with
| Datalog queries to query them. This is a lot more flexible than
| either approach IMHO and also more easily connected to existing
| relational data.
|
| RDFox is a tool that uses Datalog internally. RelationalAI uses a
| Datalog-based approach. Another example is Mangle Datalog, my own
| humble open source project that can be found on GitHub.
|
| The language in the article about relational being "non native
| graph" is a bit biased. With some developer attention, there are
| massive opportunities to store data in a distributed manner and
| with the right indices querying can be fast. Though to be fair,
| good performance will always need developer attention.
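To make the "facts connected via common values" idea concrete, here is a minimal sketch of naive bottom-up Datalog-style evaluation in plain Python. The rules in the docstring use generic textbook Datalog notation, not the syntax of RDFox, RelationalAI or Mangle, and the `part_of` / `within` relation names and the sample facts are made up.

```python
# Facts are plain tuples: (relation, arg1, arg2). Sample data is invented.
facts = {
    ("part_of", "wheel", "car"),
    ("part_of", "spoke", "wheel"),
    ("part_of", "engine", "car"),
}


def transitive_closure(facts, base, derived):
    """Naive fixpoint evaluation of the two rules:

        derived(X, Y) :- base(X, Y).
        derived(X, Z) :- base(X, Y), derived(Y, Z).
    """
    known = {(derived, x, y) for (rel, x, y) in facts if rel == base}
    while True:
        # Join base facts with already-derived facts on the shared value Y.
        new = {
            (derived, x, z)
            for (rel, x, y) in facts if rel == base
            for (_, y2, z) in known if y2 == y
        } - known
        if not new:
            return known
        known |= new


closure = transitive_closure(facts, "part_of", "within")
```

The "graph" here is implicit: two tuples are adjacent whenever they share a value, and the join on `Y` is what walks an edge. Real engines use indexing and semi-naive evaluation rather than this repeated full join.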
| findthewords wrote:
| For me a knowledge graph is a complex network.
|
| When you try to grasp any complex topic your brain starts to
| build and connect a fuzzy network of topics and their respective
| positive or negative correlations and of course the weights
| between the connections.
|
| Once you have unfuzzied the picture in your head you realize that
| the network is active and dynamic and that this network has
| different "modes" of operation and that some weights and
| correlations can change over time, while others are always
| static.
|
| Mastering the dynamics of the knowledge graph is the final step
| in understanding it.
___________________________________________________________________
(page generated 2024-08-16 23:00 UTC)