[HN Gopher] What Every Competent Graph DBMS Should Do
       ___________________________________________________________________
        
       What Every Competent Graph DBMS Should Do
        
       Author : semihsalihoglu
       Score  : 46 points
       Date   : 2023-01-12 19:17 UTC (3 hours ago)
        
 (HTM) web link (kuzudb.com)
 (TXT) w3m dump (kuzudb.com)
        
       | mdaniel wrote:
       | I've now been conditioned to say "ok, when can we expect the
       | Jepsen?" when I see a new database. Although in this case the
       | phrase "in-process" (and it apparently being built on top of
       | Apache Arrow:
       | https://github.com/kuzudb/kuzu/blob/master/external/arrow/ap... )
       | may make that nonsense, but the readme does also say
       | "Serializable ACID transactions"
        
         | semihsalihoglu wrote:
         | I don't quite follow the question here but to clarify one
         | thing: Kuzu is currently integrating Arrow not as a core
         | storage structure but as a file format from which we can ingest
         | data. We are writing Kuzu's entire storage, so it's our own
         | design. It has three components: Vanilla columns for node
         | properties, columnar compressed sparse row
         | (https://tinyurl.com/2r2s4wpe) join indices and relationship
         | properties, and a hash index for primary keys of node records.
         | We don't use Arrow to store db files.
         | 
         | For serializability: yes, we support serializable transactions.
          | So when you insert, delete, or update node or rel records,
          | you get all-or-nothing behavior (e.g., if you roll back,
          | none of your updates will be visible).
         | 
          | That said, supporting ACID transactions is a completely
          | separate design decision in DBMSs, so our (or other
          | systems') mechanisms for supporting transactions (for
          | example, whether they are based on write-ahead logging) and
          | our storage designs are largely independent decisions.
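The columnar compressed-sparse-row join index mentioned above can be sketched in a few lines. This is an illustrative model only (names and layout hypothetical, not Kuzu's actual code): node IDs index an offsets array, and each node's neighbors and relationship properties sit in one contiguous slice of columnar arrays.

```python
# Sketch of a columnar compressed-sparse-row (CSR) join index.
# Illustrative only -- not Kuzu's actual implementation.

# Account nodes identified by dense integer IDs 0..3.
num_nodes = 4

# Transfer relationships as (src, dst, amount) triples.
edges = [(0, 1, 50.0), (0, 2, 20.0), (1, 2, 5.0), (3, 0, 99.0)]

# Build CSR: offsets[v]..offsets[v+1] slices the neighbor/property
# columns for node v.
edges.sort()  # group edges by source node
offsets = [0] * (num_nodes + 1)
for src, _, _ in edges:
    offsets[src + 1] += 1
for v in range(num_nodes):  # prefix sum over the counts
    offsets[v + 1] += offsets[v]
neighbors = [dst for _, dst, _ in edges]  # columnar dst node IDs
amounts = [amt for _, _, amt in edges]    # columnar rel property

def out_edges(v):
    """All (dst, amount) pairs for node v, read as one contiguous slice."""
    lo, hi = offsets[v], offsets[v + 1]
    return list(zip(neighbors[lo:hi], amounts[lo:hi]))

print(out_edges(0))  # [(1, 50.0), (2, 20.0)]
```

The point of the layout is that a node's adjacency list is a single sequential read, which is what makes scans (the "backbone operation" discussed elsewhere in the thread) fast.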
        
         | jeremyjh wrote:
          | I don't think Jepsen has any relevance to embedded (in-
          | process) databases, nor to any single-node databases. The
          | properties it's testing for relate to behavior under
          | network partitions.
        
       | GenerocUsername wrote:
       | I made the mistake of learning Neo4j with Cypher before learning
       | SQL, and every interaction with SQL since feels like I'm using
       | some outdated monster
        
         | blep_ wrote:
         | If it makes you feel any better, I don't know Cypher and SQL
         | still feels like an outdated monster.
        
       | xkcd99 wrote:
        | Just one question for the author: why does this system feel
        | more like a relational system than a graph database system?
        | You ask users to define a schema (which I think Neo4j doesn't
        | do), and I think the concept of this rel table, which I find
        | in your blog, is not present in any other graph DB.
        
         | maxdemarzi wrote:
         | The lack of a Schema does hurt Neo4j performance. Properties
         | are stored "willy nilly" on a linked list of bytes per
         | node/relationship. No order, an "age" property can be: 45,
         | 38.5, "adult", [18,19], false... and that makes a terrible mess
         | when aggregating, sorting, filtering, searching, etc.
        
         | semihsalihoglu wrote:
          | I think it's a mistake for any DBMS not to support a
          | schema, and I bet this hurts Neo4j a lot. In fact, some
          | GDBMSs, including Kuzu and TigerGraph, support a schema; I
          | think Memgraph does too, though I might be wrong. A schema
          | allows systems to do many core computations efficiently,
          | most importantly scans of data, which are the backbone
          | operation of DBMSs. In fact, because of this, every
          | semi-structured graph model has historically been extended
          | with a schema.
         | 
          | In practice, if you want DBMSs to be performant, you need
          | to structure your data. It's one thing to optionally
          | support a semi-structured model, which is great, for
          | example, when building an initial proof of concept and you
          | want to develop something quickly. It's another thing not
          | to support putting a structure on the data at all, which
          | you'll want when you finally take your application to
          | production and care about performance.
        
           | semihsalihoglu wrote:
            | I realized I forgot to complete my sentence here:
            | "...because of this every semi-structured graph model
            | historically has been extended with a schema." Examples
            | include XML and RDF. More relevant to this discussion:
            | there is an ongoing effort to define a graph schema in
            | GQL, an industry-academia effort that I think includes
            | all major players: Neo, Tiger, Oracle, etc.
            | (https://www.gqlstandards.org/home).
           | 
           | You can search for this on the link: "GQL will incorporate
           | this prior work, as part of an expanded set of features
           | including regular path queries, graph compositional queries
           | (enabling views) and schema support."
        
       | CharlieDigital wrote:
       | I love seeing Cypher growing.
       | 
       | IMO, the best database query language I've used (various SQL's,
       | document DBs, graph DBs).
        
         | pantsforbirds wrote:
          | Hard agree. Cypher is the easiest way to read complex graph
          | operations.
        
       | revskill wrote:
        | In the match clause, I don't see the foreign key. How do
        | those tables get to know each other to join?
        
         | semihsalihoglu wrote:
         | When you insert the Transfer records in my example, you
         | indicate that they are "edges/relationships" between "Account"
         | node records. The system interprets the values in those records
          | implicitly as foreign keys to the Account records. This is
          | what I mean by "predefined joins" in my example. When you
          | ingest your relationship records, you predefine a join for
          | the system, and the GDBMS uses those Transfer records to
          | join node records.
         | 
         | Hope this helps.
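As a concrete sketch of the "predefined joins" idea, here is roughly what the node/rel table definitions and the resulting join look like in Kuzu's Cypher dialect (based on its public docs around this time; table names are from the example above, and exact syntax may differ from the current release):

```cypher
// Node table: Account records, keyed by id.
CREATE NODE TABLE Account(id INT64, PRIMARY KEY (id));

// Rel table: each Transfer record's FROM/TO endpoints are implicit
// foreign keys into Account -- the "predefined join".
CREATE REL TABLE Transfer(FROM Account TO Account, amount DOUBLE);

// The join is then expressed as a pattern, with no explicit key
// columns in the MATCH clause:
MATCH (a:Account)-[t:Transfer]->(b:Account)
RETURN a.id, b.id, t.amount;
```

Because the endpoints were declared when the rel table was created, the system already knows which columns join to which, so the query never names a foreign key.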
        
         | xkcd99 wrote:
          | In GDBMSs there is the concept of "predefined joins", where
          | you define a relationship edge that directly connects two
          | nodes. I don't know how these guys do it, but at least in
          | Neo4j that's the concept.
          | 
          | Edit: just saw their blog; they ask you to define a "rel
          | table" that has to define all the joins, and they load it
          | from there.
        
       ___________________________________________________________________
       (page generated 2023-01-12 23:00 UTC)