[HN Gopher] The hunt for the missing data type
       ___________________________________________________________________
        
       The hunt for the missing data type
        
       Author : todsacerdoti
       Score  : 406 points
       Date   : 2024-03-04 16:42 UTC (6 hours ago)
        
 (HTM) web link (www.hillelwayne.com)
 (TXT) w3m dump (www.hillelwayne.com)
        
       | pizlonator wrote:
       | I would claim that object oriented languages are a syntax,
       | semantics and type system for graphs.
       | 
       | Objects are nodes.
       | 
       | Fields are edges.
       | 
       | The object graph is the heap.
       | 
       | So your whole program state is a graph.
        
         | gwbas1c wrote:
         | I was about to post "Data structures are graphs," but your
         | explanation says it better.
        
         | reaperman wrote:
         | I think this is a useful model for me personally, and don't
         | want to diminish its potential value to others; I often think
         | of programs as graphs.
         | 
         | I think it's interesting to add to the discussion that I'm wary
         | to reduce anything to any particular "Turing-complete concept".
         | Because anything can be represented by anything.
        
         | qsort wrote:
         | Graphs are such a general concept that if you squint everything
         | is a graph. Your example works just as well with structs and
         | member fields, we don't even need the OO hypothesis.
        
           | pizlonator wrote:
           | C and structs let you write code using whatever paradigm you
           | want. So you can do object graphs. And you can also do
           | closures. And you can do many other things.
           | 
           | Lots of things are possible when you just treat memory as
           | bytes and pointers are just integers.
        
         | chubot wrote:
         | Right, I think that's just what I said the first time this came
         | up: all languages with GC have graphs builtin.
         | 
         | (Though C has graphs too. If every node shares the same
         | lifetime, then it's pretty easy to manage. Otherwise it can be
         | pretty painful)
         | 
         | And the good news is that you simply use the TYPE SYSTEM to
         | categorize your nodes and edges.
         | 
         | Your edges are references to other objects, which are typed.
         | Node can be typed as well.
         | 
         | ---
         | 
         | Although the original article does get at this -- there are
         | many types of graphs, and some of them can be encoded in a
         | typed object graph.
         | 
         | Some of them can't -- you need the equivalent of void* for the
         | edges.
         | 
         | Others would need a List[T] for the edges, if the out degree is
         | not fixed.
         | 
         | And that only covers directed graphs, etc.
         | 
         | Also, it's true that allocating all these tiny objects as GC
         | objects can be very slow, so then you use other representations
         | of graphs, like a list of pairs of node IDs.
         | 
         | I don't really think of it as a "missing" data structure, but
         | yeah now I do see how that framing can be useful.
        
         | tlb wrote:
         | A limitation is that you can only have one such graph in the
         | program. So any graph algorithm that returns a graph, or uses
         | another graph during the computation, doesn't fit.
        
           | boothby wrote:
           | Don't fall for the abstraction! Mathematically speaking, if
           | you have graphs G1 and G2; you can make another graph H with
           | nodes {G1, G2} and edges {(G1, G2)} and nothing goes wrong.
           | You can definitely view your whole operating system as a
           | graph; it doesn't invalidate having a program running which
           | processes graphs.
        
         | PaulHoule wrote:
         | For that matter it's true about a certain kind of C program. In
         | a civilized language you have run-time typing and introspection
         | and have all the type information so that it is straightforward
         | to look at the program's data structures as a graph.
         | 
         | If you are looking at the debug symbols and a copy of the
         | program you can usually figure this graph out but you might
         | need to think about it sometimes.
        
         | dustingetz wrote:
         | let x = 1; // edge named x         let y = f(x); // node named
         | f with input edge x and output edge y
        
       | obi1kenobi wrote:
       | As someone who did a lot of work with graphs, "why don't
       | programming languages have a built-in graph data type?" is a
       | question I've been asked a million times.
       | 
       | I'm thrilled I'll be able to point folks to a much more in-depth
       | analysis like the one here, instead of saying some variation of
       | "it's really hard to do it well" and having them just take my
       | word for it.
        
         | jjice wrote:
         | I used to think that since graphs are such a broad data
         | structure that can be represented in different ways depending
         | on requirements that it just made more sense to implement them
         | at a domain-ish level (the article mentions this in the "There
         | are too many implementation choices" section).
         | 
         | Then I saw Petgraph [0] which is the first time I had really
         | looked at a generic graph library. It's very interesting, but I
         | still have implemented graphs at a domain level.
         | 
         | [0] https://github.com/petgraph/petgraph
        
           | nickpsecurity wrote:
           | You might say graphs are an abstract concept, solving our
           | problems requires specialized graphs, and so we just use or
           | build the kind we need.
        
             | sanderjd wrote:
             | Yeah this is exactly how I think of it. I think "graphs"
             | are just at a different level of the abstraction hierarchy
             | than the data structures they're often grouped together
             | with.
             | 
             | This is even further up the abstraction hierarchy, but to
             | illustrate the point, nobody really wonders why languages
             | don't ship with a built-in database implementation. And
             | it's the same basic reason as with graphs; one size doesn't
             | fit most.
        
           | TachyonicBytes wrote:
           | Why wouldn't an abstract `Graph` type and specific
           | implementations of that work? Like Java has for `Set` and
           | various implementations?
        
             | gryn wrote:
             | even at that level of abstraction there are many flavors of
             | "graphs". what some people call graph are actually
             | hypergraphs, or attributed graphs.
        
             | skybrian wrote:
             | For importing and exporting data, it often makes more sense
             | to use something like a table or a tree rather than a
             | graph. (Like a CSV file or a JSON file.)
             | 
             | So it's not clear what the interface would do. What methods
             | should there be? Again, there are too many choices, and a
             | Graph interface often isn't the best way to represent a
             | view of some subset of a graph.
        
         | somat wrote:
         | This is a super naive take. but I would consider the pointer
         | the be the native graph type. What is wanted is not a graph
         | type but the tooling to traverse graphs.
        
           | alanbernstein wrote:
           | Maybe a pointer is better described as equivalent to a half-
           | edge?
        
             | andyferris wrote:
             | I think this might be kinda right.
             | 
             | Take rust. Tree edges use Box, and DAG edges use Arc or Rc.
             | Rust doesn't really help with building non-DAG graphs as
             | native data types - you can drop back to using "handles"
             | for more generic graphs.
             | 
             | And I actually think that is kinda fair. Languages and
             | standard libraries should support trees and DAGs out of the
             | box. Perhaps Hillel is right that more complex graphs
             | should be in a user-space library or something.
        
           | nine_k wrote:
           | It's a bit like saying that a byte is a native number type,
           | providing comparison to zero and addition as the only
           | operations, and asking to figure the rest yourself.
           | 
           | I mean, it's _technically_ sufficient, but I 'd not call it
           | proper support on a language level.
        
           | cwillu wrote:
           | Period should probably be a comma; I read this comment a
           | couple times before it made sense, rather than calling the
           | grand parent super naive.
        
           | boothby wrote:
           | I actually brought this hot take up in my conversation with
           | Hillel -- something along the lines of "void * * is the
           | canonical native graph representation in C."
        
           | computerdork wrote:
           | Hmm, think you're taking the title of the article a little
           | too literally, and focusing on just the "data type" aspect.
           | Yeah, the article itself indirectly agrees with you that
           | pointers are often how graphs are implemented: "Graphs are a
           | generalization of linked lists, binary trees, and hash
           | tables," which are data-structures often implemented by
           | pointers.
           | 
           | Yeah, the article is really saying what you're saying, that
           | building a graph library ("tooling") is very complicated.
           | 
           | But, maybe I'm misunderstanding what you're saying??
        
         | paulddraper wrote:
         | > "it's really hard to do it well"
         | 
         | More importantly, there are a lot of tradeoffs.
         | 
         | Virtually every language offers a hash map. You can roll your
         | own to outperform in an individual circumstance, the default
         | works pretty well.
         | 
         | You can't really do that with a graph. Maybe if you offered a
         | bunch of graph types.
         | 
         | ---
         | 
         | PS. Bit of trivia: Java's HashMap is a bit different from
         | almost every other language in that it lets you tune the load
         | factor.
        
           | sltkr wrote:
           | > Java's HashMap is a bit different from almost every other
           | language in that it lets you tune the load factor.
           | 
           | I'm not sure that's particularly unusual. For example, C++
           | supports this too:
           | 
           | https://en.cppreference.com/w/cpp/container/unordered_map/ma.
           | ..
        
           | adastra22 wrote:
           | > You can't really do that with a graph. Maybe if you offered
           | a bunch of graph types.
           | 
           | And so why isn't this the solution?
           | 
           | Most languages support both hash map (fast lookup) and
           | balanced tree (ordered entries) primitives, even though they
           | both implement the "associative map" container type.
           | 
           | Can't we have 2, 3, 5, or even 8 different graph types?
        
             | masklinn wrote:
             | > Can't we have 2, 3, 5, or even 8 different graph types?
             | 
             | That's what graph libraries do, look at petgraph. You've
             | got a bunch of graph implementations, and a bunch of
             | algorithms over them.
        
           | karmakaze wrote:
           | It's also different in that each hash bucket can use a Red-
           | Black tree when there's many entries.
        
         | sayrer wrote:
         | I think it's because you have to think it through when they get
         | too big for one machine. A lot of those so-called "NoSQL"
         | databases are really meant to represent graph databases (think
         | DynamoDB) in an adjacency list. I've had success with Gremlin
         | and JanusGraph, but it's a really messy space. It's not a
         | programming language problem in my opinion, but a distributed
         | systems one.
        
       | gilleain wrote:
       | > And then for each of those types we have hypergraphs, where an
       | edge can connect three or more nodes, and ubergraphs, where edges
       | can point to other edges.
       | 
       | Huh, I've heard of hypergraphs (although never actually really
       | used them) but never an 'ubergraph'. Sounds tricky!
       | 
       | In practice, how often are there situations you definitely need
       | hypergraphs? I had a particular situation where I needed graphs
       | that were both vertex coloured (labelled) _and_ edge coloured
       | (labelled) - even then it was outside the normal situation for
       | what I was doing (graph canonicalization).
        
       | rb-2 wrote:
       | I wonder if it would be possible to mathematically define (in a
       | theorem proving language like Coq) a bunch of accessor methods as
       | well as a bunch of implementation primitives and then "compile" a
       | custom graph implementation with whatever properties you need for
       | your application. Some accessor methods will be very efficient
       | for some implementations and very inefficient for others, but
       | every method will still be available for every implementation.
       | Profiling your application performance can help adjust the
       | implementation "compiler" settings.
       | 
       | Ironically, this is a graph problem.
        
         | kevindamm wrote:
         | This sounds like Partial Evaluation and the Futamura
         | Projection. The research around that shows that your
         | interpreter determines the shape of the compiled output, so a
         | formal proof of its application isn't necessary, if the
         | $mix$-equivalent has the appropriate syntax and semantics for
         | graph processes in its design.
         | 
         | I know this has been done for procedural languages and for
         | declarative logical languages but I'm not aware of something
         | like this specifically for graph processing and highly
         | specialized code generation of graph processing. I wouldn't be
         | surprised if Mix has been extended for this already, even if it
         | has I'm sure there is still value in it.
        
         | JonChesterfield wrote:
         | I think this is a worthwhile direction.
         | 
         | For example, I'd like to program against a sequence
         | abstraction. When sort is applied to it, I hope it's a vector.
         | When slice or splice, I hope it's some sort of linked
         | structure. Size is as cheap as empty for the vector but much
         | more expensive for a linked list.
         | 
         | It should be possible to determine a reasonable data
         | representation statically based on the operations and control
         | flow graph, inserting conversions where the optimal choice is
         | different.
         | 
         | The drawback of course is that people write different programs
         | for different data structures. Knowing what things are cheap
         | and what aren't guides the design. There's also a relinquishing
         | of control implied by letting the compiler choose for you that
         | people may dislike.
         | 
         | As an anecdote for the latter, clojure uses vectors for lambda
         | arguments. I thought that was silly since it's a lisp that
         | mostly works in terms of seq abstractions, why not have the
         | compiler choose based on what you do with the sequence? The
         | professional clojure devs I was talking to really didn't like
         | that idea.
        
         | andoando wrote:
         | Ive been thinking about something like this. A mathematical
         | definition of a function such that we can search it. Imagine we
         | had something like "Find a function that fits this signature ->
         | Input arr[numbers] out-> for every x in arr, x2>x1.
        
           | lupire wrote:
           | That's https://hoogle.haskell.org/ plus dependent types (data
           | constraints).
           | 
           | Without human provided dependent typing, the search engine
           | would be almost as hard to write as a system to directly
           | generate the code you need.
        
       | jes5199 wrote:
       | I think this is a cop-out. Someone in the 1990s could have
       | written the same thing about collections, or dictionaries, but we
       | eventually came up with good-enough compromises that Python,
       | Ruby, and Javascript all basically do the same thing. They don't
       | implement literally every case, but they are good enough for
       | "small" data, where the definition of "small" has grown to be
       | quite large by human standards.
       | 
       | I think the real problem is syntax. Someone needs to come up with
       | a textual graph literal that fits in source code, and it'll be
       | good enough for the small type. Switch to a custom ubergraph when
       | you exceed, idk, ten million entries.
        
         | gilleain wrote:
         | What do you mean by "textual graph literal"?
        
           | jes5199 wrote:
           | Textual array literal: [1,2,3]
           | 
           | Textual dict literal: {"a": 1}
           | 
           | Textual graph literal: ???
        
             | gilleain wrote:
             | Heh, one possibility is:
             | 
             | AB 0:1(C),0:2(D)
             | 
             | For a three node graph with edges between vertex 0 and 1,
             | and vertex 0 and 2, vertex labels 'A' and 'B' and edge
             | labels 'C', and 'D'. Not great to parse (as I sadly found),
             | but possible to read.
        
             | vonwoodson wrote:
             | Textual graph literal: {"a->b", "b->c", "c->a"}
             | 
             | Thinking about the programming language DOT https://en.wiki
             | pedia.org/wiki/DOT_(graph_description_languag...
             | 
             | This may be an effective way to express these graphs.
        
               | jes5199 wrote:
               | I like DOT a lot. Way back in the day we had a semi-
               | serious proposal to use it in the Puppet configuration
               | language to describe dependencies
        
               | Groxx wrote:
               | Same - DOT is a great way to go from zero to functional
               | in near minimal time. It's also pretty trivial to
               | generate since it's not order dependent, which is great
               | for lightweight automation.
               | 
               | Fine-tuning layout can be a real hassle though, sadly. I
               | haven't found any quick tools for that yet.
        
               | kevindamm wrote:
               | The way I approach fine-tuning DOT layout is to add
               | subgroups where it seems appropriate, add a 1px border
               | for the subgroup and see where the layout engine is
               | situating it next to other nearby vertices/edges.
               | Sometimes I may have to put a border around a few
               | subgroups, then attempt to adjust size of vertices and
               | entire area to nudge it to a local minima. Note: I don't
               | attempt to adjust the size of the subgroups, I'm not sure
               | that even works anyway, but maybe it depends on the
               | choice of layout algorithm, too. Padding and other style
               | on the vertex-box may help, too. It's been a few years
               | for me, tbh.
               | 
               | Deciding where the appropriate subgroups are is a bit of
               | an art. Sometimes it's obvious, as in bipartite graphs
               | that are intentionally bipartite. Or, if there is a
               | staged layout like for pipeline architectures. Sometimes
               | it's not obvious even when it seems it should be, like
               | when graphviz really wants to make a certain edge really
               | short. Be ready to backtrack sometimes. Then I usually
               | remove the subgroup border after I'm done, but a few
               | times they have been useful to leave there.
               | 
               | One thing I really like about DOT is that adding
               | hyperlinks to the vertices and edges that translate
               | decently into the compiled output is really nice. I had
               | an oncall dashboard that made liberal use of this feature
               | that I still think back on fondly sometimes.
        
               | whitten wrote:
               | Can you point to where I can learn more about this ? E.G.
               | an example and explanation exists for hyper link
               | embedding ?
               | 
               | I am especially interested in syntax suitable to be used
               | in creating something to input into https://www.viz-
               | js.com and creation of SVGs with embedded hyperlinks.
        
               | Izkata wrote:
               | If you're using subgraphs, with an edge across subgraph
               | boundaries, it is slightly order dependent - a node will
               | appear in the area it was first mentioned in. If you
               | define all the nodes/subgraphs first and all the edges at
               | the bottom you'd never notice this though.
        
             | WorldMaker wrote:
             | We've got a couple common graph literals today: Graphviz's
             | "dot" language [0] and the Mermaid language (in part
             | inspired from "dot") [1]. Mermaid is increasingly common in
             | documentation today, given Github supports it almost
             | everywhere Markdown is supported. Parsing dot or Mermaid as
             | embedded literals in another language doesn't seem that
             | complicated (Mermaid's parser is written in JS and already
             | runs anywhere you can run a JS parser).
             | 
             | Though one interesting thing to note here is that both
             | languages are edge list languages and are optimized for
             | algorithms that are useful for edge lists, particularly
             | display/diagramming. That gets back to the article's point
             | that there are other useful representations and those can
             | matter for efficiency of algorithms.
             | 
             | They also can matter for efficiency of a graph literal in a
             | source document. Edge lists are great for sparse graphs,
             | but if you have a dense graph it might be easier to write
             | as a literal with a massive spreadsheet of an adjacency
             | graph. Especially if it can just be a spreadsheet with
             | modern affordances like scrolling and sticky rows/columns
             | and such. There's no perfect answer for "every" graph.
             | 
             | [0] https://graphviz.org/doc/info/lang.html
             | 
             | [1] https://mermaid.js.org/intro/#flowchart
        
             | gre wrote:
             | array{ 1 2 3 }       dict{ { "a" 1 } }       graph{ { "a"
             | "b" } { "b" "c" } { "c" "a" } }       AVL{ { "a" 1 } { "b"
             | 2 } }
             | 
             | Factor has some syntax like this. All you have to do is
             | pass the `{...}` to the prefix (array/dict/graph/AVL) and
             | it knows how to construct one.
             | 
             | https://github.com/factor/factor/blob/master/extra/trees/av
             | l...
        
         | abeppu wrote:
         | And some languages (e.g. java, scala) have standard libraries
         | with interfaces that describe ordered collections,
         | dictionaries, etc, but offer multiple implementations so the
         | programmer can pick based on their specific considerations, but
         | still benefit from library code written around the interface.
        
           | Hackbraten wrote:
           | It also uses hint/marker interfaces (eg. `RandomAccess`) so
           | the library code can choose among algorithms internally while
           | still presenting an uniform API to the client.
        
         | PaulHoule wrote:
         | I've been involved with RDF and sometimes it seems the gap
         | between "too simple to use RDF" and "too hard to use RDF" is
         | tiny.
         | 
         | For instance I tried to pitch a data processing library a bit
         | like
         | 
         | https://www.knime.com/
         | 
         | but where RDF graphs (roughly like a JSON document) get passed
         | over the "lines" but found that the heavy hitters in this space
         | believed this sort of product has to use columnar execution to
         | be "fast enough". You can certainly build something that can do
         | operations on a dynamic RDF graph (really a set of triple) but
         | in principle you could compile code that treats native data
         | structures as if they were in RDF... You might get pretty good
         | in speed but it won't be as fast as native and hard to make it
         | easier to code for than native.
        
           | jes5199 wrote:
           | RDF is an XML standard, isn't it? there was a XML era for
           | collections and dicts, too. That is now largely considered to
           | have been a mistake
        
             | zozbot234 wrote:
             | RDF is not dependent on XML. There is a XML representation
             | of RDF, but alternatives include JSON (using JSON-LD) and
             | simple textual formats such as N3 and Turtle.
        
               | PaulHoule wrote:
               | One trouble w/ RDF is that there are two official ways to
               | do ordered collections, the RDF list which is basically a
               | LISP list, and then RDF collections where the order is
               | encoded in predicates. Neither is well-supported in most
               | tools, for instance SPARQL lacks the list handling
               | capabilities that you'd see in
               | 
               | https://docs.arangodb.com/3.12/aql/
               | 
               | or
               | 
               | https://www.couchbase.com/products/n1ql/
               | 
               | The XMP spec, for instance, hacks Dublin Core by adding
               | ordering information because... It matters what order the
               | authors are in. Dublin Core on the other hand seems to be
               | developed for cataloging elementary school libraries and
               | they were huge fans of doing the easy stuff and leaving
               | out anything moderately hard, so Dublin Core looks like
               | it has a 1968 level of sophistication and MARC is so much
               | more 2000s. People come to RDF, see all these problems
               | that are ignored, and come to the conclusion RDF is not
               | for them.
        
               | hobofan wrote:
               | There is a good repo that collects many of the issues
               | with RDF[0], which I would have loved to have known about
               | earlier in my journey with RDF.
               | 
               | [0]: https://github.com/w3c/EasierRDF
        
               | hobofan wrote:
               | While there are (commonly used) alternative serialization
               | formats, RDF is married to XML data model, as all core
               | datatypes of RDF are defined in terms of XSD (XML Schema
               | Definition) datatypes and very closely mimics it's data
               | model.
               | 
               | If you want to have an RDF that is independent of that
               | (e.g. based on Apache Arrow so that it's compatible with
               | modern big data tooling) you might as well start from
               | scratch.
        
         | obi1kenobi wrote:
         | Two problems I see here, based on the research I've done in
         | high-performance graph algorithms:
         | 
         | - It's hard to find a "good-enough" graph implementation. The
         | best hashtable is only a handful of percent better than the
         | built-in ones. The best graph impl is 1000x or more better than
         | any generic built-in one could be, so there's much more
         | incentive to specialize (and people already specialize
         | hashtables for just a handful of percent speedup!)
         | 
         | - The baseline complexity level of implementing a reasonable
         | hashtable is fairly high, even if for a small dataset. The
         | baseline complexity of implementing a graph algorithm for a
         | small dataset is pretty low, and the real problems come in
         | later / at larger scale. So in graphs there's less incentive to
         | learn a complex library's API when "I could just hack it
         | myself," unlike for hashtables where the API is simple and
         | doing it myself is much harder.
        
           | bruce343434 wrote:
           | More than just a handful of percent[1], but ok
           | 
           | [1] https://probablydance.com/2017/02/26/i-wrote-the-fastest-
           | has...
        
             | obi1kenobi wrote:
             | The work you cited is very impressive and very welcome.
             | 
             | But you seem to be implying that `std::unordered_map` is
             | the default choice one would use, which in my experience is
             | not accurate -- it is well-known to have serious perf
             | shortcomings, and everyone I know uses some other
             | implementation by default. Even so, the delta from
             | `std::unordered_map` to the improved hashtable in the blog
             | post is impressive, and just shy of 10x.
             | 
             | Graph algorithms frequently have 10x improvements from one
             | state-of-the-art approach to the next -- for example,
             | here's one from my own research[1]. The delta between
             | state-of-the-art and "good default" in graph algorithms
             | would often be around 100-1000x. And comparing state-of-
             | the-art to the equivalent of an `std::unordered_map` would
             | be another 10-100x on top of that, so 1000-100000x total.
             | 
             | [1]: https://dl.acm.org/doi/10.1145/3210377.3210395
        
           | brazzy wrote:
           | > The baseline complexity level of implementing a reasonable
           | hashtable is fairly high, even if for a small dataset.
           | 
           | Have you tried doing it? My experience was that it was
           | surprisingly simple. We may have different expectations for
           | what is "reasonable", of course.
        
           | DSMan195276 wrote:
           | > The baseline complexity level of implementing a reasonable
           | hashtable is fairly high, even if for a small dataset.
           | 
           | I would disagree with this, it's actually really easy to make
           | one if you're willing to do away with many features (which
           | aren't essential, but provide performance benefits).
           | Implementing one is just something you never have to do in
           | most modern languages.
        
         | 082349872349872 wrote:
         | https://lobste.rs/s/uhmhum/hunt_for_missing_data_type#c_1na1...
         | 
         | > _matrix multiplication and permanents are known to be non-
         | cheap to compute, requiring worse-than-quadratic time! So any
         | sort of good graph algorithm must dig deeper and be more
         | specialized for the task at hand_
        
         | boothby wrote:
         | As somebody who's actually done the work, I don't think you've
         | really spent time with the question. The problem is that
         | graphs, in general, aren't special. They're a structural
         | dumping ground.
         | 
         | Data structures are specialized graph representations. The
         | source code you write is in a specialized graph representation.
         | For the overwhelming majority of programmers, the fact that
         | something can be viewed as a graph is much like saying "oh well
         | that file is just a really long sequence of bits, so it's
         | effectively an integer." It's not wrong, but is it useful?
        
       | pfdietz wrote:
       | This is looking like a programming language problem. There is no
       | way to make graph algorithms available in a way where the details
       | are abstracted away.
       | 
       | What would a programming language look like that could address
       | all those issues?
        
         | tlb wrote:
         | Some C++ libraries do pretty well. In the Eigen math library
         | you can call most functions on dense or sparse matrices, and
         | some template magic makes it work efficiently for all
         | combinations.
         | 
         | It's a shame it's so hard to write that kind of generic
         | template code.
        
       | andsens wrote:
       | _devils advocate_ : Is this maybe a case of discarding an 80%
       | solution because you can't do the last 20%?
       | 
       | I understand the constraints, but imagine how legible you could
       | make code by replacing some key parts with a graph type that
       | everybody knows. I honestly think that having a type that
       | supports a small subset of possibilities and only has the
       | simplest algorithms implemented would go a long way.
        
         | boothby wrote:
         | It isn't just an 80/20 problem. Imagine that you replace every
         | linked list, binary tree, trie, etc., with a generic directed
         | graph datatype. The resulting performance would be _abysmal_.
         | The resulting notation would be horrid, too.
         | 
         | Here's our nice linked list:                 def
         | last_element(ll):           last = ll           while ll is not
         | None:               last = ll               ll = ll.next
         | return last
         | 
         | And here's an implementation with generic graph notation:
         | def last_element(g):           for v, deg in g.out_degree:
         | if deg == 0:                   return v           return None
         | 
         | There are several problems with this; most importantly, there
         | can be silent failures when g is not a linked list. But it also
         | throws out a useful abstraction where a list is equivalent to a
         | node, so I wrote a horrid implementation that takes O(n)
         | regardless of the position in the list. And then comes all the
         | baggage of representation, because you can't just represent a
         | node with a pointer anymore.
         | 
         | When your data structure better reflects the, well, structure
         | of your data, you can go faster and safer. There's a reason we
         | teach undergrads about these specific datatypes and don't just
         | sweep it all under a rug with "it's a graph!"
        
         | bigbillheck wrote:
         | I think it's more like discarding a 20% solution that can't do
         | the last 80%.
        
       | 4ad wrote:
       | Ok, trees are not graphs but they are _very_ related, and
       | algebraic data types are trees, so they are ubiquitous in
       | functional programming.
       | 
       | Why don't we have graphs in FP (or in Rust)? Because graphs
       | require mutation (respectively break linearity).
       | 
       | Why don't we have graphs in imperative languages? Perhaps because
       | very few imperative languages have ADTs? Just a thought.
        
         | boothby wrote:
         | I disagree: trees are connected graphs with n nodes and n-1
         | edges. They're graphs with a special structure that is highly
         | exploitable for performance gains, but they're graphs.
        
         | zokier wrote:
         | In what sense graphs require mutation?
        
           | joshlemer wrote:
           | I think maybe they're talking about how, in order to create a
           | data structure where two nodes contain a reference (edge) to
           | each other, you have to resort to mutation. i.e. you have to
           | create A, then B with reference to A, then mutate A to refer
           | to B, or you have to create B, then A with reference to B,
           | then mutate B to refer to A.
           | 
           | Though this ignores that there are other ways to represent
           | graphs, such as adjacency matrices, etc.
        
         | harporoeder wrote:
         | Even with direct mutual references between some node type this
         | can be represented in a lazy functional language such as
         | Haskell pretty easily without mutation.
        
         | bigbillheck wrote:
         | > Why don't we have graphs ... in Rust?
         | 
         | You might have missed this from the article but:
         | https://docs.rs/petgraph/latest/petgraph/index.html
         | 
         | > Because graphs require mutation (respectively break
         | linearity).
         | 
         | I don't think this is actually the case. Graph nodes go in one
         | container (`Vec` or `HashMap` or `BTreeMap`), and the edges go
         | in another container (`HashMap` or `BTreeMap`). The object in
         | which you store the node only needs to know what its name is,
         | you can let something else know what its neighbors are.
        
       | jerf wrote:
       | This is a good article and I endorse it.
       | 
       | I would supplement it with the observation that when I was a
       | younger programmer, like many people, I considered "generic" or
       | "flexible" a positive when describing a library or framework. I
       | have come to see it as generally negative, especially when the
       | developer's summary puts these adjectives or something similar
       | front and center.
       | 
       | Let me show you the most flexible possible Javascript framework.
       | This will look like a joke, but it's not. It fits perfectly into
       | an HN post. The most flexible possible JS framework is simply:
       | eval
       | 
       | Similarly flexible frameworks exist for dynamic scripting
       | languages. For static languages one must invoke the entire
       | compiler as the framework. Of course, if you think about it hard
       | enough, I'm doing that for dynamic languages here too, it just
       | has a snappier representation for dynamic languages.
       | 
       | Frameworks and libraries provide their value precisely through
       | limiting things, and then building on those limitations. Of
       | course, the limitations must be well-chosen, to make what can be
       | built on them interesting enough to pay for the limitations the
       | framework chooses. But the essence of them are in their
       | limitations. I start out from the get-go with the maximally
       | flexible framework my language allows, which is itself the
       | language, and the additional framework needs to make limitations
       | on my code in order to do anything useful.
       | 
       | (A problem when framework designers don't understand this is that
       | they make a series of little incorrect design decisions that can
       | often add up to a real pain. For instance, if I were to design a
       | web framework that took over some amount of routing from the
       | user, I would still leave you the ability to claim some bit of
       | the URL space and route it entirely out of my framework, because
       | I understand that my framework is based around limitations and
       | you may need to expose a URL to something that can't work under
       | those limitations. But someone who doesn't realize that
       | frameworks intrinsically involve limitations might fail to give
       | that callout because they can't imagine that someone might have a
       | problem that their framework is not "flexible" and "generic"
       | enough to handle. Imagine a CRUD framework, even a very good one,
       | but I need to offer an endpoint based on streaming server events
       | in the same URL space, which is intrinsically foreign to the CRUD
       | framework's concept of page loads. This is just one example; real
       | frameworks designed without this understanding will make dozens
       | or hundreds of such little mistakes.)
       | 
       | Graphs have the same problem. It seems like they're so flexible
       | and generic that they ought to be more used and more useful. But
       | that's precisely what kills them. Even if you nail down the
       | problem to exactly one representation, they still don't fit. For
       | instance I have a great need for data structures that don't admit
       | cycles, but if all I have is a graph, imposing that limitation
       | from a coding perspective is a real challenge. Mathematically
       | it's trivial, I just say "and this graph has no cycles" _et
       | voila_ [1], there are no cycles, but in code I need to enforce
       | that somehow and there 's no trivial solution to that.
       | 
       | Another way of viewing graphs is that we _do_ work in graphs all
       | the time, precisely because everything in RAM can be seen as a
       | graph. GC algorithms even generally work by viewing everything in
       | very raw graphy terms. It just turns out the API you 'd expect to
       | work over a graph just isn't useful in the general sense when
       | applied to everything in a programming language's memory space.
       | It seems like it ought to be, but it just isn't. It may seem like
       | it would be great to have a set of employees and extract their
       | names and then look that up into another database etc. etc. with
       | a general graph query language or something, but it turns out the
       | special considerations at each layer make it so that what the
       | general purpose programming language is already doing is actually
       | generally better. The details at each layer matter.
       | 
       | I like the metaphor of architecture astronautics and have often
       | discussed "the 30,000 foot view" versus the view on the ground
       | here on HN, and to my mind the key to the metaphor isn't the
       | nerdery of being an astronaut or the difficulty. The key is that
       | when you get high up, everything looks the same. It feels like
       | graphs ought to be awesome when you're looking down at the entire
       | computing landscape from metaphorical low Earth orbit. But down
       | in the trenches, the local concerns overwhelm that viewpoint...
       | and this is real. This is not just because we all suck or we
       | don't try hard enough or we just Don't Get It. It's real. The
       | architecture astronaut is just wrong in this case. It's not even
       | a beautiful vision this world isn't good enough to manifest or
       | any such conciliatory thing... it's just wrong. It is good to
       | write good code and reduce the amount of bespoke details to be
       | considered. The programming community has made great progress
       | there and there is still great opportunity to do more. But there
       | are an awful, awful lot of details in the world, and the world
       | being detailed is fundamental.
       | 
       | [1]: Or if you are, like me, kinda a fan of surprise stringed
       | instrument attacks, _et viola_.
        
         | Terr_ wrote:
         | > when I was a younger programmer, like many people, I
         | considered "generic" or "flexible" a positive when describing a
         | library or framework [...]
         | 
         | I've come to prefer what I call "design for deletion": Most of
         | those long-term "flexibility someday" needs are best-met by
         | making sure the inflexible modules or flows can be clearly
         | identified and ripped out for replacement. This leads to a
         | certain kind of decoupling, although with a higher tolerance
         | for coupling that can kept in check by static analysis.
         | 
         | This is a contrast to my days of youthful exuberance where I
         | thought I could solve the problem by making my work
         | _extensible_ or _customizable_. No, I cannot make the immortal
         | program, so I should focus on making a mortal one which can
         | pass gracefully.
        
       | montmorency88 wrote:
       | I'd highly recommend Erwigs FGL library in Haskell as a really
       | nice example of a generally performant graph data structure that
       | is easy to work with. The API feels like working with lists
       | because you are essentially consing contexts(node, neighbours)
       | into a list of contexts that form your graph. Many standard graph
       | algorithms are then built up from depth or breadth first search
       | and you can compose really succinct programs to manipulate the
       | graph. Graphs are labelled with any Haskell data structure and
       | there is a graphviz library complementary to FGL to generate dot
       | files which can be rendered according to the data carried in the
       | node labels. Often in an application you want to both perform
       | computations on your graph and render a visualization
       | simultaneously to the end user or for debugging and in FGL you
       | minimise duplication of application and rendering logic.
        
         | tikhonj wrote:
         | FGL is a great example of how to make a "nice" high-level graph
         | interface suited for functional programming. I'm a big fan. But
         | it's orders of magnitude too slow and memory-inefficient for
         | performance-sensitive graph computations--if you have even
         | moderately sized graphs and graph algorithms are a bottleneck,
         | you'll need to use something else, and probably something
         | domain-specific. Given the way the interface works, I don't
         | think you _could_ have a high-performance version that would
         | scale to large graphs.
         | 
         | In my experience this leaves FGL in an awkward spot: on the one
         | hand, it isn't sufficient for heavy-duty graph processing; on
         | the other, if you _don 't_ need fancy high-performance graph
         | algorithms, chance are that encoding your problem as a graph is
         | going to be more awkward than defining some domain-specific
         | types for what you're doing. Graphs are such a general
         | structure that they're usually the wrong level of abstraction
         | for higher-level domain-specific logic.
         | 
         | Of course, sometimes you're writing graph code specifically and
         | you need a nice way to express your graph algorithms without
         | worrying about performance. In that case, FGL is great. I wrote
         | a tutorial about using it to [generate mazes][1] and it helped
         | me express the algorithms better than I would have been able to
         | do without it. But that still leaves it as too narrow for
         | something to be "the" graph representation in a language's
         | standard library.
         | 
         | [1]: https://jelv.is/blog/Generating-Mazes-with-Inductive-
         | Graphs/
        
           | montmorency88 wrote:
           | Interesting. Under the hood FGL is mapping the graph to
           | relatively efficient data structures like Patricia Trees as
           | implemented in Data.IntMap so I would expect it to scale
           | reasonably for inserting edges and mapping over nodes. I
           | agree the memory inefficiency is definitely a limiting factor
           | of the library. As you say I think it is best suited for
           | expressing graph algorithms and if those calculations become
           | the bottle neck a custom solution can be developed with the
           | proof of concept already in place. Out of curiosity what
           | number of nodes/edges would you consider a "moderately sized
           | graph"? My user cases are typically on the order of 500-1000
           | nodes with a similar number of edges that require bfs and
           | topological sorting.
        
             | tikhonj wrote:
             | I don't have an exact number in mind--I imagine it's pretty
             | context-specific--but 500-1000 nodes seems like it would
             | qualify.
             | 
             | I've played around with IntMap before and it's not a
             | _great_ data structure. It 's a _binary_ Patricia trie,
             | which means that you quickly get a relatively deep tree
             | with lots of pointer traversals. Unless I 've managed to
             | confuse myself on how it works, you'd end up with, what, at
             | least 10 traversals to look up a value from 1000 keys?
        
           | Chris_Newton wrote:
           | _But it 's orders of magnitude too slow and memory-
           | inefficient for performance-sensitive graph computations--if
           | you have even moderately sized graphs and graph algorithms
           | are a bottleneck, you'll need to use something else, and
           | probably something domain-specific._
           | 
           | This seems a little pessimistic to me. There are plenty of
           | application domains that can be conveniently represented
           | using graphs where you might have thousands of nodes and
           | edges -- which is what I'd characterise as "moderately sized"
           | -- and your needs might only extend to relatively simple and
           | efficient graph algorithms. FGL is excellent in this kind of
           | scenario.
           | 
           | If you do need the kind of algorithms that explode in
           | complexity then even a representation a couple of orders of
           | magnitude more efficient won't help you much either. Big-O is
           | the thing that is going to spoil your day in this story, not
           | the constant factor. Some problems simply don't have
           | convenient fast solutions and ideally with those you find a
           | way to change the representation so the original problem
           | doesn't arise in the first place.
           | 
           | It's true that there's also a zone where you have
           | significantly larger graphs but still only need
           | computationally tractable algorithms, and in that case the
           | overheads of a library like FGL become a factor in what is
           | viable. I also don't disagree with you (and Hillel in the
           | original piece) that it would be difficult to define
           | comprehensive graph functionality to include in a standard
           | library when there are so many different trade-offs involved.
           | 
           | A good -- and not entirely unconnected -- analogy might be
           | calculating with matrices. It's convenient to have support
           | for simple but widely useful cases like 3x3 and 4x4 built
           | into your language or standard library. However, once you're
           | solving systems with hundreds or thousands of rows, you
           | probably want more specialised tools like BLAS/LAPACK, and
           | the structure of your matrix and how you can decompose it
           | start to matter a lot more.
        
             | michelpp wrote:
             | > you probably want more specialised tools like BLAS/LAPACK
             | 
             | The GraphBLAS and LAGraph are sparse matrix optimized
             | libraries for this exact purpose:
             | 
             | https://github.com/DrTimothyAldenDavis/GraphBLAS
             | 
             | https://github.com/GraphBLAS/LAGraph/
        
       | mizzlr_ wrote:
       | graph is a data structure, not a data type. if you squint enough
       | pointers are the building blocks (the data type is you please)
       | for building graph data structure. a->b is pointer access, looks
       | like an edge in the graph.
       | 
       | graph data structure is parent of tree, code execution/ function
       | call stacks work like a tree, think flame graphs.
       | 
       | stacks and pointers are baked in assembly and cpu architecture.
       | your claims can't be farther from the truth.
        
         | renewiltord wrote:
         | Pointer-based graph structure will make matrix algos painful to
         | implement. A graph is a concept. Article is quite meaningful
         | about how it follows the various subtypes. Would recommend
         | reading.
        
         | Chris_Newton wrote:
         | _graph is a data structure, not a data type._
         | 
         | It can refer to either.
         | 
         | Any concrete data structure that uses indirection -- which
         | means pretty much anything more complicated than dense arrays
         | and records -- is indeed a form of graph.
         | 
         | But graphs, and more constrained forms like DAGs and trees, can
         | also be abstract data types, implemented by a variety of
         | concrete representations.
         | 
         | One of life's little ironies is that implementing a general
         | abstract graph using a general concrete graph whose records and
         | pointers correspond (roughly) 1:1 with the nodes and edges in
         | the abstract graph is often a poor choice.
         | 
         | Moreover, it's not unusual to have an abstract graph
         | implemented using a non-graph data structure (for example, a
         | dense adjacency matrix) or to use a graph-like data structure
         | to implement an abstract data type whose interface doesn't look
         | particularly graph-like (for example, a piece table).
        
       | gue5t wrote:
       | One interesting perspective is to view the sequence of lists ->
       | trees -> DAGs -> general graphs as a loosening of restrictions:
       | 
       | - list nodes may have one child
       | 
       | - tree nodes may have multiple
       | 
       | - DAG nodes may have multiple parents though restricted by
       | topological ordering
       | 
       | - graph nodes may have multiple parents from anywhere in the
       | collection
       | 
       | Lists and trees can be fully captured by sum and product types,
       | but extending this representation style to DAGs and graphs
       | doesn't work--you either get inefficiency (for DAGs) and then
       | infinite regress (for cyclic graphs) attempting to continue the
       | "syntactic" style of representation, or you need to adopt an
       | "indirect" representation based on identifiers or indices or hash
       | consing.
        
         | nickpsecurity wrote:
         | That's a neat idea. The other side of it might be
         | expressiveness.
         | 
         | The more expressive constructs are usually more productive and
         | concise. The less expressive constructs are usually easier to
         | optimize or analyze with a machine. The old rule, like in
         | LANGSEC, is to pick the least-expressive option that works.
         | Some people also develop transforming code (eg netaprogramming)
         | to let you write highly-expressive code that generates correct,
         | low-expressiveness code.
        
         | schindlabua wrote:
         | I like thinking of this concept via free constructions.
         | 
         | As is somewhat commonly known the free Monoid is the List type;
         | monoids are not commutative so we get a sense of "direction",
         | like a list has a start and an end.
         | 
         | If we add commutativity and look at free groups, we find they
         | are equivalent to multisets.
         | 
         | If we take associativity away from monoids and look at free
         | semigroups, we get binary finger trees, I think?
         | 
         | In some sense removing constraints from the binary operator
         | results in more general free types. Would be interesting to
         | find what free construction makes digraphs but I have to
         | bounce.
        
       | taeric wrote:
       | A graph, as presented in this article, is a model of something
       | else. That we have more than one way to implement this model is
       | rather natural, all told. The hope that we can have a singular
       | syntax and data structure that represents a graph in code is
       | almost exactly why the author of Java's Linked List posted this
       | gem: https://twitter.com/joshbloch/status/583813919019573248
       | 
       | My favorite on the idea of having a linked list where the node is
       | first class in your code, is almost precisely the problem. You
       | rarely want/need to work at that level. In a very real sense,
       | objects that have other objects are already trees of data. Many
       | can back reference, such that then you have a graph.
       | 
       | And then there is the joy of trying to use matrix operations to
       | work with graphs. You can do some powerful things, but at that
       | point, you almost certainly want the matrix to be the
       | abstraction.
       | 
       | Excited to see someone come up with good things in this idea. I
       | retain very serious doubts that I want a singular model for my
       | data.
        
       | dustingetz wrote:
       | Electric Clojure uses Clojure itself (s-expressions) as a graph
       | authoring syntax, using a macro to reify dataflow through a
       | reactive client/server system (here the use case is full stack
       | user interfaces but the idea generalizes)
       | https://github.com/hyperfiddle/electric (I'm the founder).
       | 
       | IMO, the answer to the question "Where are all the graph types?"
       | is: the graph authoring DSL needs to express scope, control flow
       | and abstraction, which essentially makes it isomorphic to a
       | programming language, freed of its evaluation model. In Python
       | and Typescript, embedding a complete programming language is
       | something that's rather hard to do!
       | 
       | Also see my blog post "Four problems preventing visual flowchart
       | programming from expressing web applications"
       | https://www.dustingetz.com/#/page/four%20problems%20preventi...
        
       | michelpp wrote:
       | I think one of the elements that author is missing here is that
       | graphs are sparse matrices, and thus can be expressed with Linear
       | Algebra. They mention adjacency matrices, but not sparse
       | adjacency matrices, or incidence matrices (which can express muti
       | and hypergraphs).
       | 
       | Linear Algebra is how almost all academic graph theory is
       | expressed, and large chunks of machine learning and AI research
       | are expressed in this language as well. There was recent thread
       | here about PageRank and how it's really an eigenvector problem
       | over a matrix, and the reality is, all graphs are matrices,
       | they're typically sparse ones.
       | 
       | One question you might ask is, why would I do this? Why not just
       | write my graph algorithms as a function that traverses nodes and
       | edges? And one of the big answers is, parallelism. How are you
       | going to do it? Fork a thread at each edge? Use a thread pool?
       | What if you want to do it on CUDA too? Now you have many
       | problems. How do you know how to efficiently schedule work? By
       | treating graph traversal as a matrix multiplication, you just say
       | Ax = b, and let the library figure it out on the specific
       | hardware you want to target.
       | 
       | Here for example is a recent question on the NetworkX repo for
       | how to find the boundary of a triangular mesh, it's one single
       | line of GraphBLAS if you consider the graph as a matrix:
       | 
       | https://github.com/networkx/networkx/discussions/7326
       | 
       | This brings a very powerful language to the table, Linear
       | Algebra. A language spoken by every scientist, engineer,
       | mathematician and researcher on the planet. By treating graphs
       | like matrices graph algorithms become expressible as mathematical
       | formulas. For example, neural networks are graphs of adjacent
       | layers, and the operation used to traverse from layer to layer is
       | matrix multiplication. This generalizes to all matrices.
       | 
       | There is a lot of very new and powerful research and development
       | going on around sparse graphs with linear algebra in the
       | GraphBLAS API standard, and it's best reference implementation,
       | SuiteSparse:GraphBLAS:
       | 
       | https://github.com/DrTimothyAldenDavis/GraphBLAS
       | 
       | SuiteSparse provides a highly optimized, parallel and CPU/GPU
       | supported sparse Matrix Multiplication. This is relevant because
       | traversing graph edges IS matrix multiplication when you realize
       | that graphs are matrices.
       | 
       | Recently NetworkX has grown the ability to have different "graph
       | engine" backends, and one of the first to be developed uses the
       | python-graphblas library that binds to SuiteSparse. I'm not a
       | directly contributor to that particular work but as I understand
       | it there has been great results.
        
         | jcgrillo wrote:
         | I was really excited about RedisGraph, and sad to see it was
         | cancelled. In my (limited) experience with graph databases they
         | proved very frustrating because it seemed like they tried to do
         | too much. Ultimately the way I thought of a graph was an
         | indexing strategy into some underlying data. So I needed the
         | graph to be very quick, but I didn't have any requirement to
         | store actual data in it--just references. This made triples
         | based graph storage seem very heavy-handed.
         | 
         | The idea of composing sparse linear transformations to optimize
         | queries is really cool. You can get a lot of work done in one
         | shot that way, in a manner that's just quite a lot easier on
         | the machine than chasing pointers around.
        
       | adamnemecek wrote:
       | > Mathematica, MATLAB, Maple, etc all have graph libraries of
       | some form or another. I am not paying the thousands of dollars in
       | licensing needed to learn more.
       | 
       | Wolfram provides a free Mathematica called Wolfram Engine
       | https://www.wolfram.com/engine/. It's Mathematica without the UI.
       | I hear you can combine it with Jupyter Notebook to get a similar
       | experience to Mathematica.
        
         | dekhn wrote:
         | Ha! I wrote the predecessor for the Python integration some ~25
         | years ago
         | https://library.wolfram.com/infocenter/MathSource/585/ and I
         | think it uses a similar approach. The mathematica engine had a
         | protocol for sending and receiving expressions and evaluating
         | them. I was pretty proud of my implementation as it it was my
         | first real attempt at doing anything remotely complicated with
         | recursion.
        
         | rhodin wrote:
         | You can also use wolframcloud.com for free. Which runs the
         | latest Mathematica online.
        
       | hobofan wrote:
       | I think the post mostly answers the questions "why are graph
       | _algorithms_ not better supported in programming languages", with
       | a focus that is much more on "big data" graph processing than
       | graph support in general.
       | 
       | I think if you look at graph support in general you are also
       | looking at wider questions, like "why are OGMs (Object Graph
       | Mappers) not as popular as ORMs" and "why is JSON so prevalent
       | while RDF (or another low-level graph serialization) isn't"?
       | 
       | And I think in the end it comes down to historic reasons (RDF
       | emerged a bit too early and never evolved and accrued an
       | ecosystem of horrible academic standards and implementations),
       | and a graphs having just a smidge more of inherent complexity in
       | implementation and learning curve that just doesn't scale well
       | across many developers.
       | 
       | ------
       | 
       | I also wouldn't put too much weight on the "Graph Querying
       | Language" part of the article. It sadly reads like exactly the
       | marketing copy you would read from Neo4J or SPARQL enthusiasts
       | that haven't tried building a product on top of it.
       | 
       | > The main difference between all GQLs and SQL is that the
       | "joins" (relationships) are first-class entities.
       | 
       | Joins are first-class entities in SQL. They even have their own
       | keyword (hint: it starts with J and ends with OIN) ;)
       | 
       | If you also go to the lower levels of any graph query language
       | and look at it's query plans you'll notice that there isn't any
       | meaningful difference to that of one you'll find in an SQL based
       | query. The standardization of GQL[0] as an SQL extension is
       | evidence for that.
       | 
       | > In SPARQL relationships are just edges, making the same query
       | easy.
       | 
       | SPARQL is easy if you need to do exact path traversals. If you
       | try to do anything sophisticated with it (like you would in the
       | backend of a webapp), you'll quickly run into footguns like joins
       | with unbound values and you accidently join your whole result set
       | away.
       | 
       | [0]: https://en.wikipedia.org/wiki/Graph_Query_Language
        
         | lmm wrote:
         | > Joins are first-class entities in SQL. They even have their
         | own keyword (hint: it starts with J and ends with OIN) ;)
         | 
         | Having its own keyword is pretty strong evidence that something
         | _isn 't_ first-class (e.g. typeclasses are not first-class in
         | Haskell; control flow is not first-class in most programming
         | languages).
        
       | mcphage wrote:
       | This is a good write up, but for me seems to miss the mark. I
       | agree with the author that different algorithms require
       | differently organized data structures. But what if your problem
       | requires 2 different algorithms, which each work best with a
       | different data structure? Either you pick one and run both (which
       | is not ideal) or you run the first algorithm with one structure,
       | convert it, and the run the second algorithm in the second
       | structure.
       | 
       | And once you can do that... why not have every algorithm ensure
       | it runs in its best structure, and convert where necessary (and
       | possible) on the way in? Yes, there's absolutely a performance or
       | storage cost... but if the algorithm is that much faster, it
       | should be worth it.
       | 
       | Basically a beefier version of sorting your data before searching
       | in it. If an algorithm works best with a specific model of a
       | graph, then let that be part of the algorithm.
        
         | riwsky wrote:
         | > why not have every algorithm ensure it runs in its best
         | structure, and convert where necessary (and possible) on the
         | way in? Yes, there's absolutely a performance or storage
         | cost... but if the algorithm is that much faster, it should be
         | worth it.
         | 
         | Because it's not that much faster, so it's not worth it. You're
         | severely underestimating the amount of thought that went into
         | the article, or the work of the experts interviewed.
        
       | bevekspldnw wrote:
       | Was seriously hoping at the start that article would reveal some
       | secret easy solution to all my problems...only to learn there are
       | none.
       | 
       | :'(
        
         | boothby wrote:
         | I take solace in such findings. I learned that my quest was
         | ill-conceived, and I abandoned it; I should be glad that I'm no
         | longer searching for what cannot be found.
        
           | bevekspldnw wrote:
           | Yes that's true, the "smarter people than I failed at this"
           | factor is definitely a form of solace.
        
       | at_a_remove wrote:
       | Hrm. This reminds me of a thought I had about, for the lack of a
       | better term, "groups" in Python. Each data type has a property
       | which is paramount and more or less defines the type.
       | 
       |  _Lists_ are ordered. The _tuple_ is immutable. The _dict_ is
       | keyed. The _set_ is unique. (But there are some overlaps: dicts
       | are both keyed and their keys are unique.)
       | 
       | And I thought, what if you had a group where you could make it
       | immutable or mutable, ordered or not-ordered, where the value was
       | a key to something else (or not), and so on? But then I saw the
       | weird edge cases and the explosions of complexity. Some
       | combinations of these attributes are less appealing than others.
        
       | JonChesterfield wrote:
       | I needed a directed graph yesterday. I gave the nodes integer ids
       | by appending them to an arena, then used a hashtable from integer
       | to vector of integer. Iterating over it involves a set of
       | integers to track which nodes have already been visited.
       | 
       | Deduplicating nodes on insert into the area was more hassle than
       | cobbling together the graph structure out of a hashtable and a
       | vector.
       | 
       | Maybe one reason against putting graphs in the standard library
       | is they're easily put together from more common structures for
       | whatever special case you have in mind.
        
         | jkaptur wrote:
         | I agree with this take - graphs are in an interesting
         | superposition where implementing a large set of algorithms
         | _correctly_ and in their full generality is hard (as the
         | article points out), but getting started with an implementation
         | that solves _a particular problem well enough for a particular
         | case_ is easy and fun.
        
         | PeeMcGee wrote:
         | > Maybe one reason against putting graphs in the standard
         | library is they're easily put together from more common
         | structures for whatever special case you have in mind.
         | 
         | This is a fair argument (how implementations tend to combine
         | existing structures in bespoke ways). But any time I've needed
         | to use a graph explicitly, it hasn't really mattered what
         | underlying structures were involved. What has mattered each
         | time is having to invent my own little API to expose well-
         | known/primitive graph operations, then go and implement them
         | which is unnecessarily error prone.
         | 
         | Your example of de-duplicating nodes on insert sounds like it
         | describes a property of your particular graph that may be
         | better expressed through a type, which would also afford the
         | necessary API. I'm approaching this from an OOP-ish perspective
         | so do with that what you will.
         | 
         | > I gave the nodes integer ids by appending them to an arena,
         | then used a hashtable from integer to vector of integer.
         | Iterating over it involves a set of integers to track which
         | nodes have already been visited.
         | 
         | This is what sucks about using graphs IMO. I don't want to
         | think about all that stuff, I just want think about graphs. In
         | practice I spend most of the time toiling around with noisy
         | boilerplate that dominates my mental model and allows graph
         | concerns to leak into business concerns.
        
       | qazxcvbnm wrote:
       | This reminds me of my quest to find the proper method to model
       | what I've termed 'large types' in an ideal language.
       | 
       | As is well known, algebraic data types as commonly found, consist
       | of sums of products, yet a great deal of useful types are larger
       | than that; some hopefully illustrative examples include:
       | 
       | 1) the type of subsets of another type would be 2^X (hopefully
       | demonstrating what I mean by 'large'ness);
       | 
       | 2) in practical languages like TypeScript, the 'Partial' of a
       | product type A x B x C would be (1 + A) x (1 + B) x (1 + C);
       | 
       | 3) data structures in general, as a term amenable to some certain
       | set of operations, when needing to be represented for performance
       | reasons e.g. a) union-find structures (quotients?); b) a list of
       | words and their inverted indexes for searching; c) a sorted list
       | 
       | Reading more about type modelling, and learning of the
       | disagreements in how even basic things like quotients ought to be
       | represented as types, I've since resigned to an understanding of
       | this as an unsolved problem, and relegated the modelling the
       | kitchen sinks of types with _the_ kitchen sink of types - i.e.
       | the function type (curbed with suitable type constraints upon the
       | signature - from an index type to a suitable base type) - after
       | all, its power and province being the irreducible kernel of type
       | polymorphism, shadow over Church 's types, original sin against
       | type decidability.
        
         | andoando wrote:
         | I've been thinking about something like this for the last 3
         | years. However, I can't find a practical reason for it, even
         | though I am sure there are.
        
         | gue5t wrote:
         | These types don't seem to escape the scope of what can be
         | described with algebraic types, but the relationships between
         | them seem like you're looking for a notion of type-level
         | functions: subset [?] X => 2^X, partial [?] AxB =>
         | (A+1)xpartial(B)
        
           | qazxcvbnm wrote:
           | Consider the case of Partials - we might like to restrict the
           | Partials to different subsets of the fields for different
           | purposes; consider the modelling of an inverted index.
           | 
           | Certainly it is possible to represent each specific case as
           | some algebraic type; but beyond trivial cases, I find that
           | when I need such of these types, quickly I discover that
           | there are myriad ways to express them, none of them uniquely
           | natural, unlike the way a sum of products type (and its
           | terms) can be pretty much unambiguously drawn from a
           | specification.
           | 
           | This matters especially when e.g. I need to evolve my types
           | in a data migration.
        
       | criloz2 wrote:
       | Graph drawing tools are also very underwhelming, they work pretty
       | good for small graphs until you have something like 500 nodes or
       | more, then eventually their output becomes complete
       | incompressible or very difficult to look at it, they miss the
       | ability to automatically organize those graph in hierarchical
       | structures and provide a nice interface to explore them, we are
       | used that everything around us have some kind of hierarchy, I
       | think that is the same kind of problem that will need to be
       | solved in order to have a generic graph data type, also this kind
       | of thing will need to be implemented at the compiler level where
       | those graph generic algos will be adapted to the generated
       | hierarchy of structures, and if you add a theorem prover that can
       | check that certain subgraph will always have certain structures
       | you can statically generated those procedures and for the other
       | super graphs those methods will be generated dynamically at
       | runtime.
       | 
       | So whoever solve the problem for generic graph drawing will have
       | the ability or the insight to implement this too.
        
         | hobofan wrote:
         | > we are used that everything around us have some kind of
         | hierarchy
         | 
         | I think the problem is more that we are used to the
         | illusion/delusion that everything is hierarchical. The problem
         | that we then encouter is with graph drawing is that it has to
         | try and reconcile the fact that things in practice are rarely
         | really hierarchical, and it's hard to draw those lines of where
         | the hierarchies are with mathematical rigor. And that problem
         | gets worse and worse the less properties you are allowed to
         | assume about the underlying graph structure (connectedness,
         | cyclic/acyclic, sparse/dense).
         | 
         | In practice when you want build a UI that interacts with graphs
         | it's often feasible to determine/impose one or two levels of
         | meta-hierarchy with which you can do clustering (allows for
         | reducing layout destroying impact of hairball nodes + improves
         | rendering performance by reducing node count) and layout with
         | fCOSE (Cytoscape.js has an implementation of that).
        
         | nine_k wrote:
         | Ideal things are often tree-like. Real-world structures are
         | usually DAGs if they are nice and well-behaved.
         | 
         | Making things planar, or almost planar with few crossings and
         | nice clustering of related nodes, is usually hard past a couple
         | dozen nodes :(
        
         | swagasaurus-rex wrote:
         | The illustrations I've seen of neural networks really
         | highlights the sheet incomprehensibility of visualizing large
         | graphs
        
           | kjqgqkejbfefn wrote:
           | https://www.microsoft.com/en-us/research/wp-
           | content/uploads/...
        
         | kjqgqkejbfefn wrote:
         | >Graph drawing tools
         | 
         | It's hard
         | 
         | Graphviz-like generic graph-drawing library. More options, more
         | control.
         | 
         | https://eclipse.dev/elk/
         | 
         | Experiments by the same team responsible for the development of
         | ELK, at Kiel University
         | 
         | https://github.com/kieler/KLighD
         | 
         | Kieler project wiki
         | 
         | https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...
         | 
         | Constraint-based graph drawing libraries
         | 
         | https://www.adaptagrams.org/
         | 
         | JS implementation
         | 
         | https://ialab.it.monash.edu/webcola/
         | 
         | Some cool stuff:
         | 
         | HOLA: Human-like Orthogonal Network Layout
         | 
         | https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf
         | 
         | Confluent Graphs demos: makes edges more readable.
         | 
         | https://www.aviz.fr/~bbach/confluentgraphs/
         | 
         | Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with
         | Ports
         | 
         | https://arxiv.org/pdf/1408.4626.pdf
         | 
         | Improved Optimal and Approximate Power Graph Compression for
         | Clearer Visualisation of Dense Graphs
         | 
         | https://arxiv.org/pdf/1311.6996v1.pdf
        
       | rhelz wrote:
       | Ya, the central obstacle is that:
       | 
       | 1. for simple and small graph problems, a simple vector-of-
       | vectors adjacency list is easy enough to code up.
       | 
       | 2. For complex and huge graph problems, the only way to get
       | performant solutions is to tailor the graph implementation to the
       | specific details of the problem to be solved.
       | 
       | And its hard to see what kind of language support would help,
       | other than just having a super-smart compiler which could analyze
       | the code and determine whether an adjacency list, matrix, 3d
       | array, etc was the best way to implement it. That's the kind of
       | optimization which we won't see in compilers for a while.
       | 
       | It's another instance of the phenomenon which Strousroup noticed:
       | we are really good at code sharing of small things like vectors,
       | and of large things like operating systems. Its the middle-sized
       | problems we are bad at.
        
         | twoodfin wrote:
         | _And its hard to see what kind of language support would help,
         | other than just having a super-smart compiler which could
         | analyze the code and determine whether an adjacency list,
         | matrix, 3d array, etc was the best way to implement it. That 's
         | the kind of optimization which we won't see in compilers for a
         | while._
         | 
         | I'm not so sure? Looking at an algorithm against an abstract
         | graph type, then filling in the implementation to optimize for
         | the particular algorithm seems right in the wheelhouse of code-
         | specialized LLM's.
        
           | rhelz wrote:
           | Good point. The game has really changed in terms of what
           | kinds of programs we can write now. Perhaps it's too
           | pessimistic to not expect these sorts of optimizing compilers
           | soon.
           | 
           | Sounds like a good research opportunity to me.
        
           | rocqua wrote:
           | My experience with cipher is that the query optimizer doesn't
           | know enough about the graph to pick up on trivial
           | optimizations. This can't be fixed without a way to tell the
           | optimizer about those properties, and even just dreiging a
           | language to tell the optimizer those things is difficult.
           | 
           | Just an LLM looking at your query isn't going to cut it. It
           | will need to take your actual data into account.
        
         | js8 wrote:
         | > we are really good at code sharing of small things like
         | vectors, and of large things like operating systems. Its the
         | middle-sized problems we are bad at.
         | 
         | Interesting. But I am not sure we are good at sharing small
         | things - every programming language has its own implementation
         | of vectors. Within one language ecosystem, the API of a vector
         | is small, and that's probably what makes it easy to share.
         | 
         | For operating systems, the API is relatively small compared to
         | the internal complexity of the OS. This is also true for
         | libraries for numerical problems, which are also easily shared.
         | But the more you want to customize things (e.g. share a
         | complicated data structure), this complicates the API and
         | inhibits sharing.
         | 
         | So it seems to this is determined by the surface area (relative
         | size of the API) of the thing being shared.
        
       | avgcorrection wrote:
       | Maybe there could be some utility for a decently general Graph
       | type that can be used for high-level testing and verification.
       | Maybe you could implement your efficient graph per-problem and
       | then validate it with a more high-level and declarative type. You
       | would have to implement some isomorphism between the types
       | though.
        
       | graphviz wrote:
       | Graphviz has its own foundation graph library, that's not used by
       | any other project. It has its good and bad points.
       | 
       | Based on that experience, we had our very own second-system-
       | syndrome experience.
       | 
       | We decided our graph library should be modular, type safe, and
       | efficient. (These properties came up in the comments here, too.)
       | This is probably just a variation of "good, fast, cheap - pick
       | any two."
       | 
       | By modular, want to write collections of graph algorithm
       | libraries that are developed and even compiled independently.
       | 
       | By type safe, we mean we want to detect programming errors during
       | compilation, or link time at the latest. We don't want programs
       | to throw runtime errors like "your node does not have a color
       | attribute".
       | 
       | By efficient, we mean that accessing an attribute of a graph is
       | as cheap as a field in a C struct. (So, we don't want to carry
       | around external hash table, or do a lot of string conversions,
       | for instance.)
       | 
       | You can argue about whether these things are worth the price or
       | even make sense, but that's what we wanted. We had some famous
       | C++ creators in our lab, so we figured we could get help, and we
       | were willing to give C++ another chance.
       | 
       | Gordon Woodhull, who had been an intern and kept working for us,
       | is a brilliant programmer, and wrote an implementation of this
       | kind of graph library working in templated C++. It's even
       | published at https://www.dynagraph.org/ via sourceforge. The rest
       | of us were not really sure we could ever understand how the code
       | worked, so we had a code review with said famous C++ inventors.
       | There were a lot of screens of code, and chinstroking, and
       | greybeards pronounced "That would probably work." We knew we
       | might have gone over the complexity cliff. (Let's not even talk
       | about compile-time template errors, where one error fills an
       | entire screen with details that only a... C++ inventor could
       | love.) It was our fault, not anyone else's, and Gordon kept
       | plugging away and even made all the dynamic graph layout stuff
       | work, in Microsoft OLE, too. In hindsight it was probably our own
       | Project Xanadu. While we got lost in this, a lot of things like
       | Gephi (Java) and NetworkX and NetworKit (python) happened.
       | 
       | Also, John Ellson, a very talented software engineer who had
       | written parts of Graphviz, revitalized the main effort.
        
         | transitionnel wrote:
         | That _all_ sounds fantastic!
        
       | bionhoward wrote:
       | Could dataframes be a superior backend for graphs to maps/lists?
        
       | skrebbel wrote:
       | I think most languages support representing many kinds of graphs
       | very well, particularly directed graphs without data on the
       | edges: with objects, lists, and object references (or pointers).
       | 
       | A tree is a graph. A typical Java-style object composing other
       | objects composing other objects again, etc etc, often with cycles
       | and parent backreferences and whatnot, is a graph. The html DOM
       | is a graph.
       | 
       | I recognize that these are often very tree-like, which feels like
       | cheating in the same way as saying "well a list is also a graph!"
       | is. But given that cycles are common enough that serializers (eg
       | JSON.stringify) need to special-case those, I think maybe this is
       | simply not true, and they're really just graphs. Very few tree-
       | like class structures tend to remain pure trees.
       | 
       | The only thing missing from references/pointers to be able to
       | represent what the author is looking for, is having data on the
       | edges. I think this is trivially solvable by putting nodes
       | halfway the edge (= add a level of indirection, an operation so
       | common that we don't even think of it as "adding data to the
       | edges").
       | 
       | So I think the answer is that there's no explicit data structure
       | named "graph" because the basic building block of composition in
       | nearly every language (reference/pointer) is an edge, and the
       | basic building block of data representation
       | (objects/structs/records) is a node. So for most graphs, trying
       | to pour it all into some fancy Graph<V, E> datastructure feels
       | like needless complexity.
        
         | e_y_ wrote:
         | Back in the C days, it was common to not use generic data
         | structures like lists; instead you'd have a next_item pointer
         | as a field in the struct. For linked lists, this would save you
         | from needing another memory allocation or wrapper struct, and
         | since C doesn't have templates you'd either have to blindly
         | cast the type or use macros or write a type-specific iterator
         | anyways.
         | 
         | Lists eventually became a standard language feature in C++ and
         | other languages, but it's trickier for trees and graphs. Taking
         | the DOM example, you might be searching through child elements
         | (div, span, etc) or nodes (text nodes, comment nodes) and
         | different operations might only work with a specific subset of
         | the "edges". There might be pointers to other objects, like
         | from a DOM node to accessibility tree node. You might even have
         | multiple parent node pointers, such as a pointer that takes you
         | to the nearest shadow root or something.
         | 
         | Since there are multiple ways to traverse the same data
         | structure, generic functions don't work on it. You could create
         | a separate tree/graph for each thing you want to use it for,
         | but that takes additional memory and has to be updated when the
         | original struct changes. Or you could create some kind of
         | adapter that has a get_edges() function, but this might not be
         | very well optimized or might be clunky for many other reasons.
         | So it usually just ends up being simpler rolling your own
         | functions instead of using a library.
        
           | actionfromafar wrote:
           | Bonus points for not allocating space for the next pointer if
           | you didn't plan to use it...
        
       | ogogmad wrote:
       | I suspect a lot of graph theory can be reduced to abstract
       | algebra. You consider matrices over lots of different scalar
       | types. The scalar types should be:
       | 
       | - Rigs (rings without negation),
       | 
       | - idempotent (that is, where x + x = x for all x),
       | 
       | - equipped with involution (so that undirected graphs can be made
       | the default by restricting the matrices which represent graphs to
       | only self-adjoint matrices),
       | 
       | - and the entries of the matrix can be restricted to a *-ideal.
       | Note that a *-ideal can be considered a scalar type in its own
       | right.
       | 
       | Different choices of the above scalar types can be used to
       | capture different graph types: Weighted, directed, undirected,
       | bipartite.
       | 
       | There's no clue to physical implementation, other than that
       | sparse graphs can be treated like sparse matrices. Anybody tried
       | this? How did it work out?
        
         | bigbillheck wrote:
         | You're a couple decades too late: https://graphblas.org
        
           | ogogmad wrote:
           | If you read what I wrote, you'd see I never claimed
           | originality. Anyway, that's cool. I'll need to check how much
           | abstract algebra stuff they use.
           | 
           | Using semirings (uh, rigs) alone isn't impressive. Do they
           | consider semirings with more algebraic structure attached to
           | them?
        
             | bigbillheck wrote:
             | > Using semirings (uh, rigs) alone isn't impressive. Do
             | they consider semirings with more algebraic structure
             | attached to them?
             | 
             | Wiki says they have R, the two tropical semirings, the
             | 'max-min' semiring, and GF(2). The tropical and max-min
             | have your idempotency requirement, all but the max-min have
             | involution.
        
       | lowbloodsugar wrote:
       | Couple of thoughts:
       | 
       | First, the discussion about representation highlights that the
       | issue is a lack of infinite resources. If we had an infinite
       | computer, that could execute an infinite number of operations in
       | zero time, and had infinite memory, then we wouldn't be worrying
       | about whether it's better to store the graph as a matrix, an edge
       | list, or a pointer graph.
       | 
       | Software Engineering is everywhere and always a job of
       | optimization. Sometimes that optimization is premature, and
       | sometimes it's too little too late. It's always about
       | optimization.
       | 
       | Second, when we're talking about a graph of 10 nodes, it really
       | doesn't matter what data structure we use. It can quickly matter
       | if we have 100s or 1000s of nodes and edges because now the
       | possible arrangements are huge as is the search space. But this
       | is no different than other problems like the knapsack problem
       | where the search space is huge: depending on the problem, there
       | is very likely a "trick" to make it tractable, and that trick is
       | different depending on the problem.
       | 
       | So, like the knapsack problem, there are different, specific
       | solutions for specific graph problems.
        
       | csb6 wrote:
       | This is an interesting article but one major effort it doesn't
       | mention is the Boost Graph Library [0].
       | 
       | It is kind of clunky in places because it is written in C++03 and
       | uses some weird idioms to simulate keyword arguments and provide
       | generic ways of getting attributes for nodes. Also it suffers
       | from the terrible template instantiation errors that most C++
       | template libraries do. But I still think it addresses a lot of
       | the difficulties covered in the article:
       | 
       | > There are too many design choices
       | 
       | BGL is limited to directed/undirected multigraphs, so hypergraphs
       | are not supported. However I think these cover most use cases.
       | 
       | In terms of implementation choices, BGL provides several concrete
       | data types, such as an adjacency list and an adjacency matrix. It
       | also provides adaptors for the GraphBase and LEDA graph
       | libraries. If none of these are suitable you can write adaptor
       | functions to support your custom data type. All algorithms* work
       | unmodified on these concrete implementations.
       | 
       | > So which algorithms should come with the library?
       | 
       | BGL comes with most of the common ones [1], but I do wish it came
       | with more. The implementations of them are quite hard to read
       | because they are written in highly generic (and before many of
       | the conveniences offered in C++11) C++ code.
       | 
       | > Performance is too important
       | 
       | Since BGL is generic using C++ templates instead of runtime
       | polymorphism, it should (in theory) be able to work with a
       | concrete graph implementation that is performant for a certain
       | task and so let you reuse its generic algorithms.
       | 
       | I think the article describes a lot of the difficulties that
       | Stepanov's generic programming approach tries to solve (e.g.
       | finding the most abstract but still efficient implementation of
       | an algorithm, writing algorithms that depend on a limited set of
       | type requirements, having many data types that can reuse the same
       | algorithms). While C++ supports this style of programming it is
       | not ideal for it, but I think BGL is the closest thing I have
       | seen to a generic graph library that is also performant in many
       | cases.
       | 
       | *Algorithms have varying requirements, e.g. some may need to be
       | able to remove edges while others do not. But these requirements
       | are generic and can be fulfilled by many different graph
       | implementations.
       | 
       | [0]
       | https://www.boost.org/doc/libs/1_84_0/libs/graph/doc/index.h...
       | 
       | [1] section 22,
       | https://www.boost.org/doc/libs/1_84_0/libs/graph/doc/table_o...
        
         | csb6 wrote:
         | As a sidenote, Stepanov's introduction [0] to the Boost Graph
         | Library book explains a lot about his approach to programming.
         | He didn't write the BGL, but his work on the STL was a major
         | inspiration for it.
         | 
         | [0] http://stepanovpapers.com/siekforeword.html
        
       | rdtsc wrote:
       | > There's a gap between how often software engineers could use
       | graphs and how little our programming ecosystems support them.
       | Where are all the graph types?
       | 
       | They've been there for quite a while :-)
       | https://www.erlang.org/doc/man/digraph.html
       | https://www.erlang.org/doc/man/digraph_utils
       | 
       | And if you want to do some set theoretical stuff you're covered
       | as well: https://www.erlang.org/doc/man/sofs.html
        
       | Borg3 wrote:
       | Ahh, Graphs.. One of my favorite subject. I love doing graphs but
       | im kinda bad at implementation. But, I still managed to slap D3 +
       | Cola to make my own little interactive visualizer I use for
       | various networking visualizations (L1,L2,L3).
       | 
       | Here is example of IRCnet network:
       | 
       | http://ds-1.ovh.uu3.net/~borg/d3/ircnet.htm
        
       | jkaptur wrote:
       | I've often wondered about a missing application: "Excel for
       | graphs".
       | 
       | Just like Excel for tabular data, it would support RAM-sized data
       | (enough to require a computer, but not so much that you need a
       | data center), implement lots of algorithms and visualizations
       | "well enough", and require no programming skill to operate.
       | 
       | As the article says, a lot of our real-world problems are _graph
       | problems_ - why are programmers the only ones who should have the
       | tools to solve them?
        
         | empiko wrote:
         | Yeah, I feel like the article is too quick with its
         | conclusions. Many other problems can be made arbitrary complex
         | and difficult with additional requirements. But there are still
         | data structure and standard libraries to provide good enough
         | experience that fits most use-cases. And if you need something
         | extra spicy you need to build a custom solution.
         | 
         | The article claims that graphs are often just too big, but
         | yeah, if you ask people who are actively working on graph
         | algorithms they might have that sort of experience. But most
         | programmers and users probably only work with really small
         | graphs.
        
       | caditinpiscinam wrote:
       | > Relational databases are graphs where the nodes are records and
       | the edges are foreign keys
       | 
       | I disagree with the premise of the article: programming languages
       | _do_ have strong and mature support for graphs in the form of
       | relational database interfaces, which cover most of the real-
       | world use-cases for linked data.
        
       | tunesmith wrote:
       | Given the wide variety of implementations, what I'd like is a
       | questionnaire that asks you what sort of graph you are intending
       | to work with, and then recommends what sort of algorithm
       | implementations or software packages would be best... I'm hoping
       | that the language models will get better at this sort of question
       | over time.
        
       | ylow wrote:
       | I think this is because a graph is not a data-structure nor a
       | data-type. It is really an abstraction.
       | 
       | Fundamentally, _all_ I need to define a graph is a set of
       | vertices v \in V and function Neighbors(v). And that really is
       | all is needed for the most foundational set of graph algorithms.
       | 
       | Everything else are case-by-case constraints. Does A->B imply
       | B->A? is the node set partitionable with certain constraints? Are
       | there colors? labels?
       | 
       | To make things _even_ more general I can go up one level and
       | consider the hypergraph. In which case I just have a set of
       | vertices, and a set of sets of vertices. This can be represented
       | in a myriad of different ways depending on what you are
       | interested in. Of which (non-hyper) graph is simply a special
       | case.
       | 
       | An alternative way to think about it perhaps from the database
       | perspective, is that its a query optimization and indexing
       | problem. Depending on what questions you want to ask about the
       | graph, there will be different ways to represent the graph to
       | answer the question better. Just like there is not one way to
       | represent the abstraction called "Table", there is not one way to
       | do "Graph" either. It really depends on the questions you care
       | about.
        
         | gloftus wrote:
         | Yes, graphs are ubiquitous because they are so abstract. They
         | live on the same level of abstraction as pure numbers. There
         | are useful "numerical" libraries that exist, and by analogy I
         | think you could say there are also useful "graphical" libraries
         | that exist. But we don't really have "number" libraries, and we
         | don't really have "graph" libraries, because those concepts are
         | a bit too abstract to write APIs against.
        
           | kragen wrote:
           | it's true that numbers are very abstract, which is what makes
           | it so easy to design apis for them
           | 
           | the python runtime includes four built-in number types (small
           | integer, arbitrary-precision integer, float, and complex) and
           | the python standard library includes two more number types
           | (decimal and fractions), and one of the most popular non-
           | standard libraries for python is numpy, which provides some
           | other kinds of numbers such as single-precision floats,
           | vectors, and matrices. other systems like pari/gp have number
           | libraries that provide other kinds of numbers, such as p-adic
           | numbers and galois field elements
           | 
           | the only programming languages i've ever used that didn't
           | have 'number' libraries were esoteric languages like
           | brainfuck and the lambda calculus
        
       | fatherzine wrote:
       | What an awesome article. Kudos to the author!
       | 
       | On the core observation "there are too many implementation
       | choices", that is not quite right. True, the author mentions 4,
       | and there are further variations. In practice, a library can:
       | 
       | 1. Implement _all_ suitable graph representations.
       | 
       | 2. Implement algorithms tailored to the representation(s) that
       | offer the highest performance.
       | 
       | 3. Provide _transformations_ from one representation to another.
       | This is O(#representations), trivial to implement and trivial to
       | use. Quite fair workload for both maintainers and users.
       | 
       | 4. Bonus, provide import / export _transformations_ from  / to
       | common standard library datatypes and idioms.
       | 
       | Memory and transformations are cheap, 99% of use-cases would
       | likely find the overhead of transforming data, both in RAM and
       | CPU, negligible.
       | 
       | Edit: "the harsh truth of working at Google is that in the end
       | you are moving protobufs from one place to another." --
       | https://news.ycombinator.com/item?id=20132880
        
         | josephg wrote:
         | Sounds like the makings of a huge library that I'm not sure I'd
         | even use in my work. I use graphs _heavily_ in my work, and my
         | experience matches the people the author interviewed.
         | 
         | We always end up reimplementing graphs because:
         | 
         | - Performance matters, and no off the shelf graph library I've
         | seen can take advantage of many of the regularities in our
         | particular data set. (We have an append-only DAG which we can
         | internally run-length encode because almost all nodes just have
         | an edge pointing to the last added item).
         | 
         | - I haven't seen any generic graph library which supports the
         | specific queries I need to make on my graphs. The big one is a
         | subgraph diffing function.
         | 
         | - Writing something custom just isn't much work anyway! Graphs
         | are way simpler to reimplement than btrees. You can have a
         | simple graph implementation in tens of lines. Our highly
         | optimised library - with all the supporting algorithms - is
         | still only a few hundred lines of code.
         | 
         | I think it would be handy to have ways to export the data into
         | some standard format. But eh. I think pulling a library in for
         | our use case would add more problems than it would solve.
        
       | fiddlerwoaroof wrote:
       | FWIW, despite its name being an abbreviation for LISt Processing,
       | the fundamental datatype of lisp (the cons cell) is actually a
       | vertice on a directed graph with the restriction that each
       | vertice can only have two outgoing edges.
        
       | AtlasBarfed wrote:
       | My own take: a document database, which devs seem to love,
       | combined with relations aka edges,is basically a property graph
       | and that's what basically you need. Directionality is a property
       | on the edge.
        
       | ChicagoDave wrote:
       | I've been building an interactive fiction platform built in C#
       | with a world model contained within a bidirectional graph data
       | structure.
       | 
       | Since IF stories are relatively small in graph terms, it's a
       | reasonable solution.
       | 
       | Locations and objects are nodes and movement and location of
       | objects are edges. Nodes and edges both can have dynamic
       | properties.
       | 
       | I've also noticed the lack of graph structures in programming
       | languages, so this article was very enlightening.
        
       | zacify wrote:
       | I feel like this is common knowledge....
        
       | js8 wrote:
       | Another data type that would be quite useful is a table (like in
       | a database). For the same reasons, too many design choices.
       | 
       | Anyway, that being said, I have felt that progress will be made
       | in programming languages if the compiler gets to choose an
       | implementation of a data structure, kinda like when a database
       | chooses an execution plan. So you just use an abstract structure
       | (like sequence, map, set, table, graph) and based on the program
       | profile, the compiler will pick the specific implementation. It
       | will also transform the structure into another isomorphic one as
       | needed. (Some programming languages already do something like
       | this, for example, array of structs to struct of arrays
       | conversion.)
        
       | khanguy wrote:
       | I'm surprised I haven't seen anyone mention Stanford's SNAP [0]
       | yet. In addition to their C++ library, they also have a Python
       | interface too.
       | 
       | [0] https://snap.stanford.edu/
        
       ___________________________________________________________________
       (page generated 2024-03-04 23:00 UTC)