[HN Gopher] What Every Competent Graph DBMS Should Do
___________________________________________________________________
What Every Competent Graph DBMS Should Do
Author : semihsalihoglu
Score : 46 points
Date : 2023-01-12 19:17 UTC (3 hours ago)
(HTM) web link (kuzudb.com)
(TXT) w3m dump (kuzudb.com)
| mdaniel wrote:
| I've now been conditioned to say "ok, when can we expect the
| Jepsen?" when I see a new database. Although in this case the
| phrase "in-process" (and it apparently being built on top of
| Apache Arrow:
| https://github.com/kuzudb/kuzu/blob/master/external/arrow/ap... )
| may make that nonsense, but the readme does also say
| "Serializable ACID transactions"
| semihsalihoglu wrote:
| I don't quite follow the question here, but to clarify one
| thing: Kuzu is currently integrating Arrow not as a core
| storage structure but as a file format from which we can ingest
| data. We are writing Kuzu's entire storage, so it's our own
| design. It has three components: vanilla columns for node
| properties, columnar compressed sparse row
| (https://tinyurl.com/2r2s4wpe) join indices and relationship
| properties, and a hash index for primary keys of node records.
| We don't use Arrow to store db files.
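To make the "columnar compressed sparse row" idea concrete, here is a minimal Python sketch of a CSR adjacency layout. It is illustrative only: it assumes dense integer node IDs, the names `build_csr` and `out_edges` are hypothetical, and it is not Kuzu's actual storage format. Each node's outgoing edges, plus a relationship property kept alongside them, occupy one contiguous slice located through an offsets array.

```python
from typing import List, Tuple

# Hypothetical CSR sketch, not Kuzu's on-disk format.
# Nodes are dense integers 0..num_nodes-1; each edge is (src, dst, prop).


def build_csr(num_nodes: int, edges: List[Tuple[int, int, int]]):
    """Return (offsets, neighbours, props) for directed edges."""
    # Count the out-degree of every source node.
    degree = [0] * num_nodes
    for src, _, _ in edges:
        degree[src] += 1
    # Prefix sums: node v's edges sit at positions offsets[v] .. offsets[v + 1] - 1.
    offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + degree[v]
    # Scatter each edge's destination and property into its node's slice;
    # the property column stays aligned with the neighbour column.
    neighbours = [0] * len(edges)
    props = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for src, dst, prop in edges:
        pos = cursor[src]
        neighbours[pos] = dst
        props[pos] = prop
        cursor[src] += 1
    return offsets, neighbours, props


def out_edges(offsets, neighbours, props, v: int) -> List[Tuple[int, int]]:
    # Following node v's edges is one contiguous slice read, with no key lookups.
    lo, hi = offsets[v], offsets[v + 1]
    return list(zip(neighbours[lo:hi], props[lo:hi]))


offsets, nbrs, amounts = build_csr(4, [(0, 1, 100), (0, 2, 7), (2, 3, 40), (1, 3, 5)])
print(offsets)                               # [0, 2, 3, 4, 4]
print(out_edges(offsets, nbrs, amounts, 0))  # [(1, 100), (2, 7)]
```

The trade-off of this layout is that neighbour scans are cheap because each node's edges are contiguous, while inserting a new edge would shift every later slice.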
|
| For serializability: yes, we support serializable transactions.
| So when you insert, delete or update node or rel records, you get
| all-or-nothing behavior (e.g., if you roll back, none of your
| updates will be visible).
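The all-or-nothing behavior described above can be illustrated with a toy, single-threaded store in Python. This is a hypothetical sketch (the `ToyStore` and `ToyTxn` names are made up), not Kuzu's transaction manager: writes are buffered privately and installed only on commit, so a rollback leaves no trace. It does not model concurrency control or crash recovery, which a real DBMS handles with mechanisms such as write-ahead logging.

```python
from typing import Dict, Optional

_DELETED = object()  # sentinel marking a buffered delete


class ToyStore:
    """Hypothetical key-value store that holds only committed state."""

    def __init__(self) -> None:
        self.committed: Dict[str, int] = {}

    def begin(self) -> "ToyTxn":
        return ToyTxn(self)


class ToyTxn:
    def __init__(self, store: ToyStore) -> None:
        self.store = store
        self.writes: Dict[str, object] = {}  # buffered, not yet in committed state

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value

    def delete(self, key: str) -> None:
        self.writes[key] = _DELETED

    def get(self, key: str) -> Optional[int]:
        # A transaction sees its own buffered writes before committed state.
        if key in self.writes:
            value = self.writes[key]
            return None if value is _DELETED else value
        return self.store.committed.get(key)

    def commit(self) -> None:
        # Install every buffered write; in this single-threaded toy nothing
        # else can observe the store while the loop runs.
        for key, value in self.writes.items():
            if value is _DELETED:
                self.store.committed.pop(key, None)
            else:
                self.store.committed[key] = value
        self.writes.clear()

    def rollback(self) -> None:
        # Discard the buffer: none of the transaction's updates become visible.
        self.writes.clear()


store = ToyStore()
t1 = store.begin()
t1.put("alice", 100)
t1.put("bob", 50)
t1.commit()

t2 = store.begin()
t2.put("alice", 0)
t2.delete("bob")
t2.rollback()
print(store.committed)  # {'alice': 100, 'bob': 50}
```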
|
| That said, supporting ACID transactions is a completely
| separate design decision in DBMSs, so our (or other systems')
| mechanisms to support transactions (for example, whether they're
| based on write-ahead logging or not) and storage designs are
| generally independent (orthogonal) decisions.
| jeremyjh wrote:
| I don't think Jepsen has any relevance to embedded (in-process)
| databases, nor to any single-node databases. The properties it's
| testing for are related to network partition tolerance.
| GenerocUsername wrote:
| I made the mistake of learning Neo4j with Cypher before learning
| SQL, and every interaction with SQL since feels like I'm using
| some outdated monster.
| blep_ wrote:
| If it makes you feel any better, I don't know Cypher and SQL
| still feels like an outdated monster.
| xkcd99 wrote:
| Just one question for the author: why does this system feel more
| like a relational system than a graph database system? You ask
| users to define a schema (which I think Neo4j doesn't do), and I
| also think the concept of the rel table that I found in your blog
| is not present in any other graph DB.
| maxdemarzi wrote:
| The lack of a schema does hurt Neo4j performance. Properties
| are stored "willy nilly" on a linked list of bytes per
| node/relationship. There's no order, and an "age" property can be: 45,
| 38.5, "adult", [18,19], false... and that makes a terrible mess
| when aggregating, sorting, filtering, searching, etc.
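A small Python illustration of the problem, in which Python values stand in for stored properties (this is not how Neo4j is implemented): once one property can hold an int, a float, a string, a list and a boolean, even sorting fails, and every aggregate needs per-value type checks. The helper names `sortable` and `numeric_average` are hypothetical.

```python
from typing import Any, List, Optional

# The example "age" values from the comment above, one per node.
ages: List[Any] = [45, 38.5, "adult", [18, 19], False]


def sortable(values: List[Any]) -> bool:
    # Python refuses to order values of unrelated types such as str and float.
    try:
        sorted(values)
        return True
    except TypeError:
        return False


def numeric_average(values: List[Any]) -> Optional[float]:
    # Every value must be type-checked before it can be aggregated.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    nums = [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return sum(nums) / len(nums) if nums else None


print(sortable(ages))         # False
print(numeric_average(ages))  # 41.75
```

With a declared numeric type, the engine would know up front that every value is comparable and summable, so neither check would be needed per record.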
| semihsalihoglu wrote:
| I think it's a mistake for any DBMS to not support a schema. I
| bet this hurts Neo4j a lot, and there's no question that it's a
| mistake. In fact, some GDBMSs, including Kuzu and TigerGraph,
| support a schema. I think Memgraph does too, though I might be
| wrong. A schema allows systems to do many core computations
| efficiently, most importantly scans of data, which are the
| backbone operation of DBMSs. In fact, because of this every
| semi-structured graph model historically has been extended with
| a schema.
|
| In practice, if you want DBMSs to be performant, you need to
| structure your data. It's one thing to optionally support a
| semi-structured model, which is great, for example, when building
| initial proofs of concept and you want to develop something
| quickly. It's another thing to not support putting a structure
| on the data, which you'll want when you finally take your
| application to production and start caring about performance.
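As a rough sketch of why a schema helps scans: if a property is declared as a 64-bit integer, its values can be packed back to back in a fixed-width column, so the i-th value sits at a computable byte offset and a filter can decode the whole column in one pass with no per-record type checks. The encoding below (little-endian INT64 via Python's `struct` module) and the function names are assumptions for illustration, not Kuzu's or any other system's actual column format.

```python
import struct
from typing import List

WIDTH = 8  # bytes per INT64 value in this hypothetical encoding


def encode_int64_column(values: List[int]) -> bytes:
    # Pack values back to back, little-endian, with no per-value header or type tag.
    return struct.pack(f"<{len(values)}q", *values)


def value_at(col: bytes, i: int) -> int:
    # Fixed width: the i-th value starts at byte offset i * WIDTH.
    return struct.unpack_from("<q", col, i * WIDTH)[0]


def scan_filter(col: bytes, threshold: int) -> List[int]:
    # Decode the whole column in one call and return the row positions whose
    # value exceeds threshold; every value is known to be an integer.
    n = len(col) // WIDTH
    return [i for i, v in enumerate(struct.unpack(f"<{n}q", col)) if v > threshold]


ages = encode_int64_column([45, 38, 27, 61])
print(value_at(ages, 3))      # 61
print(scan_filter(ages, 40))  # [0, 3]
```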
| semihsalihoglu wrote:
| I realized I forgot to complete my sentence here:
| "...because of this every semi-structured graph model
| historically has been extended with a schema." Examples
| include XML and RDF. More relevant to this discussion: there
| is an ongoing effort to define a graph schema in GQL, which
| is an industry-academia effort including, I think, all the major
| players: Neo, Tiger, Oracle, etc.
| (https://www.gqlstandards.org/home).
|
| You can search for this on the linked page: "GQL will incorporate
| this prior work, as part of an expanded set of features
| including regular path queries, graph compositional queries
| (enabling views) and schema support."
| CharlieDigital wrote:
| I love seeing Cypher growing.
|
| IMO, the best database query language I've used (various SQL dialects,
| document DBs, graph DBs).
| pantsforbirds wrote:
| Hard agree. Cypher is the easiest way to read complex graph
| operations.
| revskill wrote:
| In the MATCH clause, I don't see the foreign key. How do those
| tables know how to join with each other?
| semihsalihoglu wrote:
| When you insert the Transfer records in my example, you
| indicate that they are "edges/relationships" between "Account"
| node records. The system interprets the values in those records
| implicitly as foreign keys to the Account records. This is what
| I mean by "predefined joins" in my example. When you ingest
| your relationship records, you predefine a join to the system
| and the GDBMS uses those Transfer records to join node records.
|
| Hope this helps.
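A hypothetical Python sketch of the "predefined join" idea: Account records get dense internal offsets through a primary-key hash index, and at load time each Transfer's from/to account IDs are resolved once into those offsets. Queries then follow the stored offsets instead of computing a join on account IDs. The field names (`id`, `owner`), the sample data and the function names are made up; this shows the general idea, not Kuzu's loader.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical Account node records, keyed by a primary key (account ID).
accounts = [{"id": "A1", "owner": "Alice"},
            {"id": "B2", "owner": "Bob"},
            {"id": "C3", "owner": "Carol"}]
# Hypothetical Transfer rel records: (from account ID, to account ID, amount).
transfers = [("A1", "B2", 100), ("B2", "C3", 40), ("A1", "C3", 7)]


def ingest(accounts, transfers):
    # Primary-key hash index: account ID -> dense internal node offset.
    pk_index = {acc["id"]: offset for offset, acc in enumerate(accounts)}
    # Resolve the implicit foreign keys once, at load time ("predefined join").
    adjacency: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for src_id, dst_id, amount in transfers:
        adjacency[pk_index[src_id]].append((pk_index[dst_id], amount))
    return pk_index, adjacency


def transfers_from(account_id, accounts, pk_index, adjacency):
    # Query time: one key lookup for the start node, then follow stored
    # offsets directly; no join on account IDs is computed for the edges.
    src = pk_index[account_id]
    return [(accounts[dst]["owner"], amount) for dst, amount in adjacency[src]]


pk_index, adjacency = ingest(accounts, transfers)
print(transfers_from("A1", accounts, pk_index, adjacency))  # [('Bob', 100), ('Carol', 7)]
```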
| xkcd99 wrote:
| In GDBMSs, there is the concept of "predefined joins", where you
| define a relationship edge that directly connects 2 nodes. I
| don't know how these guys do it, but at least in Neo4j that's
| the concept.
|
| Edit: just saw their blog. They ask you to define a "rel table"
| that defines all the joins, and they load the data from there.
___________________________________________________________________
(page generated 2023-01-12 23:00 UTC)