[HN Gopher] The hunt for the missing data type
___________________________________________________________________
The hunt for the missing data type
Author : todsacerdoti
Score : 406 points
Date : 2024-03-04 16:42 UTC (6 hours ago)
(HTM) web link (www.hillelwayne.com)
(TXT) w3m dump (www.hillelwayne.com)
| pizlonator wrote:
| I would claim that object oriented languages are a syntax,
| semantics and type system for graphs.
|
| Objects are nodes.
|
| Fields are edges.
|
| The object graph is the heap.
|
| So your whole program state is a graph.
| gwbas1c wrote:
| I was about to post "Data structures are graphs," but your
| explanation says it better.
| reaperman wrote:
| I think this is a useful model for me personally, and don't
| want to diminish its potential value to others; I often think
| of programs as graphs.
|
| I think it's interesting to add to the discussion that I'm wary
| to reduce anything to any particular "Turing-complete concept".
| Because anything can be represented by anything.
| qsort wrote:
| Graphs are such a general concept that if you squint everything
| is a graph. Your example works just as well with structs and
| member fields, we don't even need the OO hypothesis.
| pizlonator wrote:
| C and structs let you write code using whatever paradigm you
| want. So you can do object graphs. And you can also do
| closures. And you can do many other things.
|
| Lots of things are possible when you just treat memory as
| bytes and pointers are just integers.
| chubot wrote:
| Right, I think that's just what I said the first time this came
| up: all languages with GC have graphs builtin.
|
| (Though C has graphs too. If every node shares the same
| lifetime, then it's pretty easy to manage. Otherwise it can be
| pretty painful)
|
| And the good news is that you simply use the TYPE SYSTEM to
| categorize your nodes and edges.
|
| Your edges are references to other objects, which are typed.
| Node can be typed as well.
|
| ---
|
| Although the original article does get at this -- there are
| many types of graphs, and some of them can be encoded in a
| typed object graph.
|
| Some of them can't -- you need the equivalent of void* for the
| edges.
|
| Others would need a List[T] for the edges, if the out degree is
| not fixed.
|
| And that only covers directed graphs, etc.
|
| Also, it's true that allocating all these tiny objects as GC
| objects can be very slow, so then you use other representations
| of graphs, like a list of pairs of node IDs.
|
| I don't really think of it as a "missing" data structure, but
| yeah now I do see how that framing can be useful.
| tlb wrote:
| A limitation is that you can only have one such graph in the
| program. So any graph algorithm that returns a graph, or uses
| another graph during the computation, doesn't fit.
| boothby wrote:
| Don't fall for the abstraction! Mathematically speaking, if
| you have graphs G1 and G2; you can make another graph H with
| nodes {G1, G2} and edges {(G1, G2)} and nothing goes wrong.
| You can definitely view your whole operating system as a
| graph; it doesn't invalidate having a program running which
| processes graphs.
| PaulHoule wrote:
| For that matter it's true about a certain kind of C program. In
| a civilized language you have run-time typing and introspection
| and have all the type information so that it is straightforward
| to look at the program's data structures as a graph.
|
| If you are looking at the debug symbols and a copy of the
| program you can usually figure this graph out but you might
| need to think about it sometimes.
| dustingetz wrote:
| let x = 1; // edge named x let y = f(x); // node named
| f with input edge x and output edge y
| obi1kenobi wrote:
| As someone who did a lot of work with graphs, "why don't
| programming languages have a built-in graph data type?" is a
| question I've been asked a million times.
|
| I'm thrilled I'll be able to point folks to a much more in-depth
| analysis like the one here, instead of saying some variation of
| "it's really hard to do it well" and having them just take my
| word for it.
| jjice wrote:
| I used to think that since graphs are such a broad data
| structure that can be represented in different ways depending
| on requirements that it just made more sense to implement them
| at a domain-ish level (the article mentions this in the "There
| are too many implementation choices" section).
|
| Then I saw Petgraph [0] which is the first time I had really
| looked at a generic graph library. It's very interesting, but I
| still have implemented graphs at a domain level.
|
| [0] https://github.com/petgraph/petgraph
| nickpsecurity wrote:
| You might say graphs are an abstract concept, solving our
| problems requires specialized graphs, and so we just use or
| build the kind we need.
| sanderjd wrote:
| Yeah this is exactly how I think of it. I think "graphs"
| are just at a different level of the abstraction hierarchy
| than the data structures they're often grouped together
| with.
|
| This is even further up the abstraction hierarchy, but to
| illustrate the point, nobody really wonders why languages
| don't ship with a built-in database implementation. And
| it's the same basic reason as with graphs; one size doesn't
| fit most.
| TachyonicBytes wrote:
| Why wouldn't an abstract `Graph` type and specific
| implementations of that work? Like Java has for `Set` and
| various implementations?
| gryn wrote:
| even at that level of abstraction there are many flavors of
| "graphs". what some people call graph are actually
| hypergraphs, or attributed graphs.
| skybrian wrote:
| For importing and exporting data, it often makes more sense
| to use something like a table or a tree rather than a
| graph. (Like a CSV file or a JSON file.)
|
| So it's not clear what the interface would do. What methods
| should there be? Again, there are too many choices, and a
| Graph interface often isn't the best way to represent a
| view of some subset of a graph.
| somat wrote:
| This is a super naive take. but I would consider the pointer
| the be the native graph type. What is wanted is not a graph
| type but the tooling to traverse graphs.
| alanbernstein wrote:
| Maybe a pointer is better described as equivalent to a half-
| edge?
| andyferris wrote:
| I think this might be kinda right.
|
| Take rust. Tree edges use Box, and DAG edges use Arc or Rc.
| Rust doesn't really help with building non-DAG graphs as
| native data types - you can drop back to using "handles"
| for more generic graphs.
|
| And I actually think that is kinda fair. Languages and
| standard libraries should support trees and DAGs out of the
| box. Perhaps Hillel is right that more complex graphs
| should be in a user-space library or something.
| nine_k wrote:
| It's a bit like saying that a byte is a native number type,
| providing comparison to zero and addition as the only
| operations, and asking to figure the rest yourself.
|
| I mean, it's _technically_ sufficient, but I 'd not call it
| proper support on a language level.
| cwillu wrote:
| Period should probably be a comma; I read this comment a
| couple times before it made sense, rather than calling the
| grand parent super naive.
| boothby wrote:
| I actually brought this hot take up in my conversation with
| Hillel -- something along the lines of "void * * is the
| canonical native graph representation in C."
| computerdork wrote:
| Hmm, think you're taking the title of the article a little
| too literally, and focusing on just the "data type" aspect.
| Yeah, the article itself indirectly agrees with you that
| pointers are often how graphs are implemented: "Graphs are a
| generalization of linked lists, binary trees, and hash
| tables," which are data-structures often implemented by
| pointers.
|
| Yeah, the article is really saying what you're saying, that
| building a graph library ("tooling") is very complicated.
|
| But, maybe I'm misunderstanding what you're saying??
| paulddraper wrote:
| > "it's really hard to do it well"
|
| More importantly, there are a lot of tradeoffs.
|
| Virtually every language offers a hash map. You can roll your
| own to outperform in an individual circumstance, the default
| works pretty well.
|
| You can't really do that with a graph. Maybe if you offered a
| bunch of graph types.
|
| ---
|
| PS. Bit of trivia: Java's HashMap is a bit different from
| almost every other language in that it lets you tune the load
| factor.
| sltkr wrote:
| > Java's HashMap is a bit different from almost every other
| language in that it lets you tune the load factor.
|
| I'm not sure that's particularly unusual. For example, C++
| supports this too:
|
| https://en.cppreference.com/w/cpp/container/unordered_map/ma.
| ..
| adastra22 wrote:
| > You can't really do that with a graph. Maybe if you offered
| a bunch of graph types.
|
| And so why isn't this the solution?
|
| Most languages support both hash map (fast lookup) and
| balanced tree (ordered entries) primitives, even though they
| both implement the "associative map" container type.
|
| Can't we have 2, 3, 5, or even 8 different graph types?
| masklinn wrote:
| > Can't we have 2, 3, 5, or even 8 different graph types?
|
| That's what graph libraries do, look at petgraph. You've
| got a bunch of graph implementations, and a bunch of
| algorithms over them.
| karmakaze wrote:
| It's also different in that each hash bucket can use a Red-
| Black tree when there's many entries.
| sayrer wrote:
| I think it's because you have to think it through when they get
| too big for one machine. A lot of those so-called "NoSQL"
| databases are really meant to represent graph databases (think
| DynamoDB) in an adjacency list. I've had success with Gremlin
| and JanusGraph, but it's a really messy space. It's not a
| programming language problem in my opinion, but a distributed
| systems one.
| gilleain wrote:
| > And then for each of those types we have hypergraphs, where an
| edge can connect three or more nodes, and ubergraphs, where edges
| can point to other edges.
|
| Huh, I've heard of hypergraphs (although never actually really
| used them) but never an 'ubergraph'. Sounds tricky!
|
| In practice, how often are there situations you definitely need
| hypergraphs? I had a particular situation where I needed graphs
| that were both vertex coloured (labelled) _and_ edge coloured
| (labelled) - even then it was outside the normal situation for
| what I was doing (graph canonicalization).
| rb-2 wrote:
| I wonder if it would be possible to mathematically define (in a
| theorem proving language like Coq) a bunch of accessor methods as
| well as a bunch of implementation primitives and then "compile" a
| custom graph implementation with whatever properties you need for
| your application. Some accessor methods will be very efficient
| for some implementations and very inefficient for others, but
| every method will still be available for every implementation.
| Profiling your application performance can help adjust the
| implementation "compiler" settings.
|
| Ironically, this is a graph problem.
| kevindamm wrote:
| This sounds like Partial Evaluation and the Futamura
| Projection. The research around that shows that your
| interpreter determines the shape of the compiled output, so a
| formal proof of its application isn't necessary, if the
| $mix$-equivalent has the appropriate syntax and semantics for
| graph processes in its design.
|
| I know this has been done for procedural languages and for
| declarative logical languages but I'm not aware of something
| like this specifically for graph processing and highly
| specialized code generation of graph processing. I wouldn't be
| surprised if Mix has been extended for this already, even if it
| has I'm sure there is still value in it.
| JonChesterfield wrote:
| I think this is a worthwhile direction.
|
| For example, I'd like to program against a sequence
| abstraction. When sort is applied to it, I hope it's a vector.
| When slice or splice, I hope it's some sort of linked
| structure. Size is as cheap as empty for the vector but much
| more expensive for a linked list.
|
| It should be possible to determine a reasonable data
| representation statically based on the operations and control
| flow graph, inserting conversions where the optimal choice is
| different.
|
| The drawback of course is that people write different programs
| for different data structures. Knowing what things are cheap
| and what aren't guides the design. There's also a relinquishing
| of control implied by letting the compiler choose for you that
| people may dislike.
|
| As an anecdote for the latter, clojure uses vectors for lambda
| arguments. I thought that was silly since it's a lisp that
| mostly works in terms of seq abstractions, why not have the
| compiler choose based on what you do with the sequence? The
| professional clojure devs I was talking to really didn't like
| that idea.
| andoando wrote:
| Ive been thinking about something like this. A mathematical
| definition of a function such that we can search it. Imagine we
| had something like "Find a function that fits this signature ->
| Input arr[numbers] out-> for every x in arr, x2>x1.
| lupire wrote:
| That's https://hoogle.haskell.org/ plus dependent types (data
| constraints).
|
| Without human provided dependent typing, the search engine
| would be almost as hard to write as a system to directly
| generate the code you need.
| jes5199 wrote:
| I think this is a cop-out. Someone in the 1990s could have
| written the same thing about collections, or dictionaries, but we
| eventually came up with good-enough compromises that Python,
| Ruby, and Javascript all basically do the same thing. They don't
| implement literally every case, but they are good enough for
| "small" data, where the definition of "small" has grown to be
| quite large by human standards.
|
| I think the real problem is syntax. Someone needs to come up with
| a textual graph literal that fits in source code, and it'll be
| good enough for the small type. Switch to a custom ubergraph when
| you exceed, idk, ten million entries.
| gilleain wrote:
| What do you mean by "textual graph literal"?
| jes5199 wrote:
| Textual array literal: [1,2,3]
|
| Textual dict literal: {"a": 1}
|
| Textual graph literal: ???
| gilleain wrote:
| Heh, one possibility is:
|
| AB 0:1(C),0:2(D)
|
| For a three node graph with edges between vertex 0 and 1,
| and vertex 0 and 2, vertex labels 'A' and 'B' and edge
| labels 'C', and 'D'. Not great to parse (as I sadly found),
| but possible to read.
| vonwoodson wrote:
| Textual graph literal: {"a->b", "b->c", "c->a"}
|
| Thinking about the programming language DOT https://en.wiki
| pedia.org/wiki/DOT_(graph_description_languag...
|
| This may be an effective way to express these graphs.
| jes5199 wrote:
| I like DOT a lot. Way back in the day we had a semi-
| serious proposal to use it in the Puppet configuration
| language to describe dependencies
| Groxx wrote:
| Same - DOT is a great way to go from zero to functional
| in near minimal time. It's also pretty trivial to
| generate since it's not order dependent, which is great
| for lightweight automation.
|
| Fine-tuning layout can be a real hassle though, sadly. I
| haven't found any quick tools for that yet.
| kevindamm wrote:
| The way I approach fine-tuning DOT layout is to add
| subgroups where it seems appropriate, add a 1px border
| for the subgroup and see where the layout engine is
| situating it next to other nearby vertices/edges.
| Sometimes I may have to put a border around a few
| subgroups, then attempt to adjust size of vertices and
| entire area to nudge it to a local minima. Note: I don't
| attempt to adjust the size of the subgroups, I'm not sure
| that even works anyway, but maybe it depends on the
| choice of layout algorithm, too. Padding and other style
| on the vertex-box may help, too. It's been a few years
| for me, tbh.
|
| Deciding where the appropriate subgroups are is a bit of
| an art. Sometimes it's obvious, as in bipartite graphs
| that are intentionally bipartite. Or, if there is a
| staged layout like for pipeline architectures. Sometimes
| it's not obvious even when it seems it should be, like
| when graphviz really wants to make a certain edge really
| short. Be ready to backtrack sometimes. Then I usually
| remove the subgroup border after I'm done, but a few
| times they have been useful to leave there.
|
| One thing I really like about DOT is that adding
| hyperlinks to the vertices and edges that translate
| decently into the compiled output is really nice. I had
| an oncall dashboard that made liberal use of this feature
| that I still think back on fondly sometimes.
| whitten wrote:
| Can you point to where I can learn more about this ? E.G.
| an example and explanation exists for hyper link
| embedding ?
|
| I am especially interested in syntax suitable to be used
| in creating something to input into https://www.viz-
| js.com and creation of SVGs with embedded hyperlinks.
| Izkata wrote:
| If you're using subgraphs, with an edge across subgraph
| boundaries, it is slightly order dependent - a node will
| appear in the area it was first mentioned in. If you
| define all the nodes/subgraphs first and all the edges at
| the bottom you'd never notice this though.
| WorldMaker wrote:
| We've got a couple common graph literals today: Graphviz's
| "dot" language [0] and the Mermaid language (in part
| inspired from "dot") [1]. Mermaid is increasingly common in
| documentation today, given Github supports it almost
| everywhere Markdown is supported. Parsing dot or Mermaid as
| embedded literals in another language doesn't seem that
| complicated (Mermaid's parser is written in JS and already
| runs anywhere you can run a JS parser).
|
| Though one interesting thing to note here is that both
| languages are edge list languages and are optimized for
| algorithms that are useful for edge lists, particularly
| display/diagramming. That gets back to the article's point
| that there are other useful representations and those can
| matter for efficiency of algorithms.
|
| They also can matter for efficiency of a graph literal in a
| source document. Edge lists are great for sparse graphs,
| but if you have a dense graph it might be easier to write
| as a literal with a massive spreadsheet of an adjacency
| graph. Especially if it can just be a spreadsheet with
| modern affordances like scrolling and sticky rows/columns
| and such. There's no perfect answer for "every" graph.
|
| [0] https://graphviz.org/doc/info/lang.html
|
| [1] https://mermaid.js.org/intro/#flowchart
| gre wrote:
| array{ 1 2 3 } dict{ { "a" 1 } } graph{ { "a"
| "b" } { "b" "c" } { "c" "a" } } AVL{ { "a" 1 } { "b"
| 2 } }
|
| Factor has some syntax like this. All you have to do is
| pass the `{...}` to the prefix (array/dict/graph/AVL) and
| it knows how to construct one.
|
| https://github.com/factor/factor/blob/master/extra/trees/av
| l...
| abeppu wrote:
| And some languages (e.g. java, scala) have standard libraries
| with interfaces that describe ordered collections,
| dictionaries, etc, but offer multiple implementations so the
| programmer can pick based on their specific considerations, but
| still benefit from library code written around the interface.
| Hackbraten wrote:
| It also uses hint/marker interfaces (eg. `RandomAccess`) so
| the library code can choose among algorithms internally while
| still presenting an uniform API to the client.
| PaulHoule wrote:
| I've been involved with RDF and sometimes it seems the gap
| between "too simple to use RDF" and "too hard to use RDF" is
| tiny.
|
| For instance I tried to pitch a data processing library a bit
| like
|
| https://www.knime.com/
|
| but where RDF graphs (roughly like a JSON document) get passed
| over the "lines" but found that the heavy hitters in this space
| believed this sort of product has to use columnar execution to
| be "fast enough". You can certainly build something that can do
| operations on a dynamic RDF graph (really a set of triple) but
| in principle you could compile code that treats native data
| structures as if they were in RDF... You might get pretty good
| in speed but it won't be as fast as native and hard to make it
| easier to code for than native.
| jes5199 wrote:
| RDF is an XML standard, isn't it? there was a XML era for
| collections and dicts, too. That is now largely considered to
| have been a mistake
| zozbot234 wrote:
| RDF is not dependent on XML. There is a XML representation
| of RDF, but alternatives include JSON (using JSON-LD) and
| simple textual formats such as N3 and Turtle.
| PaulHoule wrote:
| One trouble w/ RDF is that there are two official ways to
| do ordered collections, the RDF list which is basically a
| LISP list, and then RDF collections where the order is
| encoded in predicates. Neither is well-supported in most
| tools, for instance SPARQL lacks the list handling
| capabilities that you'd see in
|
| https://docs.arangodb.com/3.12/aql/
|
| or
|
| https://www.couchbase.com/products/n1ql/
|
| The XMP spec, for instance, hacks Dublin Core by adding
| ordering information because... It matters what order the
| authors are in. Dublin Core on the other hand seems to be
| developed for cataloging elementary school libraries and
| they were huge fans of doing the easy stuff and leaving
| out anything moderately hard, so Dublin Core looks like
| it has a 1968 level of sophistication and MARC is so much
| more 2000s. People come to RDF, see all these problems
| that are ignored, and come to the conclusion RDF is not
| for them.
| hobofan wrote:
| There is a good repo that collects many of the issues
| with RDF[0], which I would have loved to have known about
| earlier in my journey with RDF.
|
| [0]: https://github.com/w3c/EasierRDF
| hobofan wrote:
| While there are (commonly used) alternative serialization
| formats, RDF is married to XML data model, as all core
| datatypes of RDF are defined in terms of XSD (XML Schema
| Definition) datatypes and very closely mimics it's data
| model.
|
| If you want to have an RDF that is independent of that
| (e.g. based on Apache Arrow so that it's compatible with
| modern big data tooling) you might as well start from
| scratch.
| obi1kenobi wrote:
| Two problems I see here, based on the research I've done in
| high-performance graph algorithms:
|
| - It's hard to find a "good-enough" graph implementation. The
| best hashtable is only a handful of percent better than the
| built-in ones. The best graph impl is 1000x or more better than
| any generic built-in one could be, so there's much more
| incentive to specialize (and people already specialize
| hashtables for just a handful of percent speedup!)
|
| - The baseline complexity level of implementing a reasonable
| hashtable is fairly high, even if for a small dataset. The
| baseline complexity of implementing a graph algorithm for a
| small dataset is pretty low, and the real problems come in
| later / at larger scale. So in graphs there's less incentive to
| learn a complex library's API when "I could just hack it
| myself," unlike for hashtables where the API is simple and
| doing it myself is much harder.
| bruce343434 wrote:
| More than just a handful of percent[1], but ok
|
| [1] https://probablydance.com/2017/02/26/i-wrote-the-fastest-
| has...
| obi1kenobi wrote:
| The work you cited is very impressive and very welcome.
|
| But you seem to be implying that `std::unordered_map` is
| the default choice one would use, which in my experience is
| not accurate -- it is well-known to have serious perf
| shortcomings, and everyone I know uses some other
| implementation by default. Even so, the delta from
| `std::unordered_map` to the improved hashtable in the blog
| post is impressive, and just shy of 10x.
|
| Graph algorithms frequently have 10x improvements from one
| state-of-the-art approach to the next -- for example,
| here's one from my own research[1]. The delta between
| state-of-the-art and "good default" in graph algorithms
| would often be around 100-1000x. And comparing state-of-
| the-art to the equivalent of an `std::unordered_map` would
| be another 10-100x on top of that, so 1000-100000x total.
|
| [1]: https://dl.acm.org/doi/10.1145/3210377.3210395
| brazzy wrote:
| > The baseline complexity level of implementing a reasonable
| hashtable is fairly high, even if for a small dataset.
|
| Have you tried doing it? My experience was that it was
| surprisingly simple. We may have different expectations for
| what is "reasonable", of course.
| DSMan195276 wrote:
| > The baseline complexity level of implementing a reasonable
| hashtable is fairly high, even if for a small dataset.
|
| I would disagree with this, it's actually really easy to make
| one if you're willing to do away with many features (which
| aren't essential, but provide performance benefits).
| Implementing one is just something you never have to do in
| most modern languages.
| 082349872349872 wrote:
| https://lobste.rs/s/uhmhum/hunt_for_missing_data_type#c_1na1...
|
| > _matrix multiplication and permanents are known to be non-
| cheap to compute, requiring worse-than-quadratic time! So any
| sort of good graph algorithm must dig deeper and be more
| specialized for the task at hand_
| boothby wrote:
| As somebody who's actually done the work, I don't think you've
| really spent time with the question. The problem is that
| graphs, in general, aren't special. They're a structural
| dumping ground.
|
| Data structures are specialized graph representations. The
| source code you write is in a specialized graph representation.
| For the overwhelming majority of programmers, the fact that
| something can be viewed as a graph is much like saying "oh well
| that file is just a really long sequence of bits, so it's
| effectively an integer." It's not wrong, but is it useful?
| pfdietz wrote:
| This is looking like a programming language problem. There is no
| way to make graph algorithms available in a way where the details
| are abstracted away.
|
| What would a programming language look like that could address
| all those issues?
| tlb wrote:
| Some C++ libraries do pretty well. In the Eigen math library
| you can call most functions on dense or sparse matrices, and
| some template magic makes it work efficiently for all
| combinations.
|
| It's a shame it's so hard to write that kind of generic
| template code.
| andsens wrote:
| _devils advocate_ : Is this maybe a case of discarding an 80%
| solution because you can't do the last 20%?
|
| I understand the constraints, but imagine how legible you could
| make code by replacing some key parts with a graph type that
| everybody knows. I honestly think that having a type that
| supports a small subset of possibilities and only has the
| simplest algorithms implemented would go a long way.
| boothby wrote:
| It isn't just an 80/20 problem. Imagine that you replace every
| linked list, binary tree, trie, etc., with a generic directed
| graph datatype. The resulting performance would be _abysmal_.
| The resulting notation would be horrid, too.
|
| Here's our nice linked list: def
| last_element(ll): last = ll while ll is not
| None: last = ll ll = ll.next
| return last
|
| And here's an implementation with generic graph notation:
| def last_element(g): for v, deg in g.out_degree:
| if deg == 0: return v return None
|
| There are several problems with this; most importantly, there
| can be silent failures when g is not a linked list. But it also
| throws out a useful abstraction where a list is equivalent to a
| node, so I wrote a horrid implementation that takes O(n)
| regardless of the position in the list. And then comes all the
| baggage of representation, because you can't just represent a
| node with a pointer anymore.
|
| When your data structure better reflects the, well, structure
| of your data, you can go faster and safer. There's a reason we
| teach undergrads about these specific datatypes and don't just
| sweep it all under a rug with "it's a graph!"
| bigbillheck wrote:
| I think it's more like discarding a 20% solution that can't do
| the last 80%.
| 4ad wrote:
| Ok, trees are not graphs but they are _very_ related, and
| algebraic data types are trees, so they are ubiquitous in
| functional programming.
|
| Why don't we have graphs in FP (or in Rust)? Because graphs
| require mutation (respectively break linearity).
|
| Why don't we have graphs in imperative languages? Perhaps because
| very few imperative languages have ADTs? Just a thought.
| boothby wrote:
| I disagree: trees are connected graphs with n nodes and n-1
| edges. They're graphs with a special structure that is highly
| exploitable for performance gains, but they're graphs.
| zokier wrote:
| In what sense graphs require mutation?
| joshlemer wrote:
| I think maybe they're talking about how, in order to create a
| data structure where two nodes contain a reference (edge) to
| each other, you have to resort to mutation. i.e. you have to
| create A, then B with reference to A, then mutate A to refer
| to B, or you have to create B, then A with reference to B,
| then mutate B to refer to A.
|
| Though this ignores that there are other ways to represent
| graphs, such as adjacency matrices, etc.
| harporoeder wrote:
| Even with direct mutual references between some node type this
| can be represented in a lazy functional language such as
| Haskell pretty easily without mutation.
| bigbillheck wrote:
| > Why don't we have graphs ... in Rust?
|
| You might have missed this from the article but:
| https://docs.rs/petgraph/latest/petgraph/index.html
|
| > Because graphs require mutation (respectively break
| linearity).
|
| I don't think this is actually the case. Graph nodes go in one
| container (`Vec` or `HashMap` or `BTreeMap`), and the edges go
| in another container (`HashMap` or `BTreeMap`). The object in
| which you store the node only needs to know what its name is,
| you can let something else know what its neighbors are.
| jerf wrote:
| This is a good article and I endorse it.
|
| I would supplement it with the observation that when I was a
| younger programmer, like many people, I considered "generic" or
| "flexible" a positive when describing a library or framework. I
| have come to see it as generally negative, especially when the
| developer's summary puts these adjectives or something similar
| front and center.
|
| Let me show you the most flexible possible Javascript framework.
| This will look like a joke, but it's not. It fits perfectly into
| an HN post. The most flexible possible JS framework is simply:
| eval
|
| Similarly flexible frameworks exist for dynamic scripting
| languages. For static languages one must invoke the entire
| compiler as the framework. Of course, if you think about it hard
| enough, I'm doing that for dynamic languages here too, it just
| has a snappier representation for dynamic languages.
|
| Frameworks and libraries provide their value precisely through
| limiting things, and then building on those limitations. Of
| course, the limitations must be well-chosen, to make what can be
| built on them interesting enough to pay for the limitations the
| framework chooses. But the essence of them are in their
| limitations. I start out from the get-go with the maximally
| flexible framework my language allows, which is itself the
| language, and the additional framework needs to make limitations
| on my code in order to do anything useful.
|
| (A problem when framework designers don't understand this is that
| they make a series of little incorrect design decisions that can
| often add up to a real pain. For instance, if I were to design a
| web framework that took over some amount of routing from the
| user, I would still leave you the ability to claim some bit of
| the URL space and route it entirely out of my framework, because
| I understand that my framework is based around limitations and
| you may need to expose a URL to something that can't work under
| those limitations. But someone who doesn't realize that
| frameworks intrinsically involve limitations might fail to give
| that callout because they can't imagine that someone might have a
| problem that their framework is not "flexible" and "generic"
| enough to handle. Imagine a CRUD framework, even a very good one,
| but I need to offer an endpoint based on streaming server events
| in the same URL space, which is intrinsically foreign to the CRUD
| framework's concept of page loads. This is just one example; real
| frameworks designed without this understanding will make dozens
| or hundreds of such little mistakes.)
|
| Graphs have the same problem. It seems like they're so flexible
| and generic that they ought to be more used and more useful. But
| that's precisely what kills them. Even if you nail down the
| problem to exactly one representation, they still don't fit. For
| instance I have a great need for data structures that don't admit
| cycles, but if all I have is a graph, imposing that limitation
| from a coding perspective is a real challenge. Mathematically
| it's trivial, I just say "and this graph has no cycles" _et
| voila_ [1], there are no cycles, but in code I need to enforce
| that somehow and there 's no trivial solution to that.
|
| Another way of viewing graphs is that we _do_ work in graphs all
| the time, precisely because everything in RAM can be seen as a
| graph. GC algorithms even generally work by viewing everything in
| very raw graphy terms. It just turns out the API you 'd expect to
| work over a graph just isn't useful in the general sense when
| applied to everything in a programming language's memory space.
| It seems like it ought to be, but it just isn't. It may seem like
| it would be great to have a set of employees and extract their
| names and then look that up into another database etc. etc. with
| a general graph query language or something, but it turns out the
| special considerations at each layer make it so that what the
| general purpose programming language is already doing is actually
| generally better. The details at each layer matter.
|
| I like the metaphor of architecture astronautics and have often
| discussed "the 30,000 foot view" versus the view on the ground
| here on HN, and to my mind the key to the metaphor isn't the
| nerdery of being an astronaut or the difficulty. The key is that
| when you get high up, everything looks the same. It feels like
| graphs ought to be awesome when you're looking down at the entire
| computing landscape from metaphorical low Earth orbit. But down
| in the trenches, the local concerns overwhelm that viewpoint...
| and this is real. This is not just because we all suck or we
| don't try hard enough or we just Don't Get It. It's real. The
| architecture astronaut is just wrong in this case. It's not even
| a beautiful vision this world isn't good enough to manifest or
| any such conciliatory thing... it's just wrong. It is good to
| write good code and reduce the amount of bespoke details to be
| considered. The programming community has made great progress
| there and there is still great opportunity to do more. But there
| are an awful, awful lot of details in the world, and the world
| being detailed is fundamental.
|
| [1]: Or if you are, like me, kinda a fan of surprise stringed
| instrument attacks, _et viola_.
| Terr_ wrote:
| > when I was a younger programmer, like many people, I
| considered "generic" or "flexible" a positive when describing a
| library or framework [...]
|
| I've come to prefer what I call "design for deletion": Most of
| those long-term "flexibility someday" needs are best-met by
| making sure the inflexible modules or flows can be clearly
| identified and ripped out for replacement. This leads to a
| certain kind of decoupling, although with a higher tolerance
| for coupling that can kept in check by static analysis.
|
| This is a contrast to my days of youthful exuberance where I
| thought I could solve the problem by making my work
| _extensible_ or _customizable_. No, I cannot make the immortal
| program, so I should focus on making a mortal one which can
| pass gracefully.
| montmorency88 wrote:
| I'd highly recommend Erwigs FGL library in Haskell as a really
| nice example of a generally performant graph data structure that
| is easy to work with. The API feels like working with lists
| because you are essentially consing contexts(node, neighbours)
| into a list of contexts that form your graph. Many standard graph
| algorithms are then built up from depth or breadth first search
| and you can compose really succinct programs to manipulate the
| graph. Graphs are labelled with any Haskell data structure and
| there is a graphviz library complementary to FGL to generate dot
| files which can be rendered according to the data carried in the
| node labels. Often in an application you want to both perform
| computations on your graph and render a visualization
| simultaneously to the end user or for debugging and in FGL you
| minimise duplication of application and rendering logic.
| tikhonj wrote:
| FGL is a great example of how to make a "nice" high-level graph
| interface suited for functional programming. I'm a big fan. But
| it's orders of magnitude too slow and memory-inefficient for
| performance-sensitive graph computations--if you have even
| moderately sized graphs and graph algorithms are a bottleneck,
| you'll need to use something else, and probably something
| domain-specific. Given the way the interface works, I don't
| think you _could_ have a high-performance version that would
| scale to large graphs.
|
| In my experience this leaves FGL in an awkward spot: on the one
| hand, it isn't sufficient for heavy-duty graph processing; on
| the other, if you _don 't_ need fancy high-performance graph
| algorithms, chance are that encoding your problem as a graph is
| going to be more awkward than defining some domain-specific
| types for what you're doing. Graphs are such a general
| structure that they're usually the wrong level of abstraction
| for higher-level domain-specific logic.
|
| Of course, sometimes you're writing graph code specifically and
| you need a nice way to express your graph algorithms without
| worrying about performance. In that case, FGL is great. I wrote
| a tutorial about using it to [generate mazes][1] and it helped
| me express the algorithms better than I would have been able to
| do without it. But that still leaves it as too narrow for
| something to be "the" graph representation in a language's
| standard library.
|
| [1]: https://jelv.is/blog/Generating-Mazes-with-Inductive-
| Graphs/
| montmorency88 wrote:
| Interesting. Under the hood FGL is mapping the graph to
| relatively efficient data structures like Patricia Trees as
| implemented in Data.IntMap so I would expect it to scale
| reasonably for inserting edges and mapping over nodes. I
| agree the memory inefficiency is definitely a limiting factor
| of the library. As you say I think it is best suited for
| expressing graph algorithms and if those calculations become
| the bottle neck a custom solution can be developed with the
| proof of concept already in place. Out of curiosity what
| number of nodes/edges would you consider a "moderately sized
| graph"? My user cases are typically on the order of 500-1000
| nodes with a similar number of edges that require bfs and
| topological sorting.
| tikhonj wrote:
| I don't have an exact number in mind--I imagine it's pretty
| context-specific--but 500-1000 nodes seems like it would
| qualify.
|
| I've played around with IntMap before and it's not a
| _great_ data structure. It 's a _binary_ Patricia trie,
| which means that you quickly get a relatively deep tree
| with lots of pointer traversals. Unless I 've managed to
| confuse myself on how it works, you'd end up with, what, at
| least 10 traversals to look up a value from 1000 keys?
| Chris_Newton wrote:
| _But it 's orders of magnitude too slow and memory-
| inefficient for performance-sensitive graph computations--if
| you have even moderately sized graphs and graph algorithms
| are a bottleneck, you'll need to use something else, and
| probably something domain-specific._
|
| This seems a little pessimistic to me. There are plenty of
| application domains that can be conveniently represented
| using graphs where you might have thousands of nodes and
| edges -- which is what I'd characterise as "moderately sized"
| -- and your needs might only extend to relatively simple and
| efficient graph algorithms. FGL is excellent in this kind of
| scenario.
|
| If you do need the kind of algorithms that explode in
| complexity then even a representation a couple of orders of
| magnitude more efficient won't help you much either. Big-O is
| the thing that is going to spoil your day in this story, not
| the constant factor. Some problems simply don't have
| convenient fast solutions and ideally with those you find a
| way to change the representation so the original problem
| doesn't arise in the first place.
|
| It's true that there's also a zone where you have
| significantly larger graphs but still only need
| computationally tractable algorithms, and in that case the
| overheads of a library like FGL become a factor in what is
| viable. I also don't disagree with you (and Hillel in the
| original piece) that it would be difficult to define
| comprehensive graph functionality to include in a standard
| library when there are so many different trade-offs involved.
|
| A good -- and not entirely unconnected -- analogy might be
| calculating with matrices. It's convenient to have support
| for simple but widely useful cases like 3x3 and 4x4 built
| into your language or standard library. However, once you're
| solving systems with hundreds or thousands of rows, you
| probably want more specialised tools like BLAS/LAPACK, and
| the structure of your matrix and how you can decompose it
| start to matter a lot more.
| michelpp wrote:
| > you probably want more specialised tools like BLAS/LAPACK
|
| The GraphBLAS and LAGraph are sparse matrix optimized
| libraries for this exact purpose:
|
| https://github.com/DrTimothyAldenDavis/GraphBLAS
|
| https://github.com/GraphBLAS/LAGraph/
| mizzlr_ wrote:
| graph is a data structure, not a data type. if you squint enough
| pointers are the building blocks (the data type is you please)
| for building graph data structure. a->b is pointer access, looks
| like an edge in the graph.
|
| graph data structure is parent of tree, code execution/ function
| call stacks work like a tree, think flame graphs.
|
| stacks and pointers are baked in assembly and cpu architecture.
| your claims can't be farther from the truth.
| renewiltord wrote:
| Pointer-based graph structure will make matrix algos painful to
| implement. A graph is a concept. Article is quite meaningful
| about how it follows the various subtypes. Would recommend
| reading.
| Chris_Newton wrote:
| _graph is a data structure, not a data type._
|
| It can refer to either.
|
| Any concrete data structure that uses indirection -- which
| means pretty much anything more complicated than dense arrays
| and records -- is indeed a form of graph.
|
| But graphs, and more constrained forms like DAGs and trees, can
| also be abstract data types, implemented by a variety of
| concrete representations.
|
| One of life's little ironies is that implementing a general
| abstract graph using a general concrete graph whose records and
| pointers correspond (roughly) 1:1 with the nodes and edges in
| the abstract graph is often a poor choice.
|
| Moreover, it's not unusual to have an abstract graph
| implemented using a non-graph data structure (for example, a
| dense adjacency matrix) or to use a graph-like data structure
| to implement an abstract data type whose interface doesn't look
| particularly graph-like (for example, a piece table).
| gue5t wrote:
| One interesting perspective is to view the sequence of lists ->
| trees -> DAGs -> general graphs as a loosening of restrictions:
|
| - list nodes may have one child
|
| - tree nodes may have multiple
|
| - DAG nodes may have multiple parents though restricted by
| topological ordering
|
| - graph nodes may have multiple parents from anywhere in the
| collection
|
| Lists and trees can be fully captured by sum and product types,
| but extending this representation style to DAGs and graphs
| doesn't work--you either get inefficiency (for DAGs) and then
| infinite regress (for cyclic graphs) attempting to continue the
| "syntactic" style of representation, or you need to adopt an
| "indirect" representation based on identifiers or indices or hash
| consing.
| nickpsecurity wrote:
| That's a neat idea. The other side of it might be
| expressiveness.
|
| The more expressive constructs are usually more productive and
| concise. The less expressive constructs are usually easier to
| optimize or analyze with a machine. The old rule, like in
| LANGSEC, is to pick the least-expressive option that works.
| Some people also develop transforming code (eg netaprogramming)
| to let you write highly-expressive code that generates correct,
| low-expressiveness code.
| schindlabua wrote:
| I like thinking of this concept via free constructions.
|
| As is somewhat commonly known the free Monoid is the List type;
| monoids are not commutative so we get a sense of "direction",
| like a list has a start and an end.
|
| If we add commutativity and look at free groups, we find they
| are equivalent to multisets.
|
| If we take associativity away from monoids and look at free
| semigroups, we get binary finger trees, I think?
|
| In some sense removing constraints from the binary operator
| results in more general free types. Would be interesting to
| find what free construction makes digraphs but I have to
| bounce.
| taeric wrote:
| A graph, as presented in this article, is a model of something
| else. That we have more than one way to implement this model is
| rather natural, all told. The hope that we can have a singular
| syntax and data structure that represents a graph in code is
| almost exactly why the author of Java's Linked List posted this
| gem: https://twitter.com/joshbloch/status/583813919019573248
|
| My favorite on the idea of having a linked list where the node is
| first class in your code, is almost precisely the problem. You
| rarely want/need to work at that level. In a very real sense,
| objects that have other objects are already trees of data. Many
| can back reference, such that then you have a graph.
|
| And then there is the joy of trying to use matrix operations to
| work with graphs. You can do some powerful things, but at that
| point, you almost certainly want the matrix to be the
| abstraction.
|
| Excited to see someone come up with good things in this idea. I
| retain very serious doubts that I want a singular model for my
| data.
| dustingetz wrote:
| Electric Clojure uses Clojure itself (s-expressions) as a graph
| authoring syntax, using a macro to reify dataflow through a
| reactive client/server system (here the use case is full stack
| user interfaces but the idea generalizes)
| https://github.com/hyperfiddle/electric (I'm the founder).
|
| IMO, the answer to the question "Where are all the graph types?"
| is: the graph authoring DSL needs to express scope, control flow
| and abstraction, which essentially makes it isomorphic to a
| programming language, freed of its evaluation model. In Python
| and Typescript, embedding a complete programming language is
| something that's rather hard to do!
|
| Also see my blog post "Four problems preventing visual flowchart
| programming from expressing web applications"
| https://www.dustingetz.com/#/page/four%20problems%20preventi...
| michelpp wrote:
| I think one of the elements that author is missing here is that
| graphs are sparse matrices, and thus can be expressed with Linear
| Algebra. They mention adjacency matrices, but not sparse
| adjacency matrices, or incidence matrices (which can express muti
| and hypergraphs).
|
| Linear Algebra is how almost all academic graph theory is
| expressed, and large chunks of machine learning and AI research
| are expressed in this language as well. There was recent thread
| here about PageRank and how it's really an eigenvector problem
| over a matrix, and the reality is, all graphs are matrices,
| they're typically sparse ones.
|
| One question you might ask is, why would I do this? Why not just
| write my graph algorithms as a function that traverses nodes and
| edges? And one of the big answers is, parallelism. How are you
| going to do it? Fork a thread at each edge? Use a thread pool?
| What if you want to do it on CUDA too? Now you have many
| problems. How do you know how to efficiently schedule work? By
| treating graph traversal as a matrix multiplication, you just say
| Ax = b, and let the library figure it out on the specific
| hardware you want to target.
|
| Here for example is a recent question on the NetworkX repo for
| how to find the boundary of a triangular mesh, it's one single
| line of GraphBLAS if you consider the graph as a matrix:
|
| https://github.com/networkx/networkx/discussions/7326
|
| This brings a very powerful language to the table, Linear
| Algebra. A language spoken by every scientist, engineer,
| mathematician and researcher on the planet. By treating graphs
| like matrices graph algorithms become expressible as mathematical
| formulas. For example, neural networks are graphs of adjacent
| layers, and the operation used to traverse from layer to layer is
| matrix multiplication. This generalizes to all matrices.
|
| There is a lot of very new and powerful research and development
| going on around sparse graphs with linear algebra in the
| GraphBLAS API standard, and it's best reference implementation,
| SuiteSparse:GraphBLAS:
|
| https://github.com/DrTimothyAldenDavis/GraphBLAS
|
| SuiteSparse provides a highly optimized, parallel and CPU/GPU
| supported sparse Matrix Multiplication. This is relevant because
| traversing graph edges IS matrix multiplication when you realize
| that graphs are matrices.
|
| Recently NetworkX has grown the ability to have different "graph
| engine" backends, and one of the first to be developed uses the
| python-graphblas library that binds to SuiteSparse. I'm not a
| directly contributor to that particular work but as I understand
| it there has been great results.
| jcgrillo wrote:
| I was really excited about RedisGraph, and sad to see it was
| cancelled. In my (limited) experience with graph databases they
| proved very frustrating because it seemed like they tried to do
| too much. Ultimately the way I thought of a graph was an
| indexing strategy into some underlying data. So I needed the
| graph to be very quick, but I didn't have any requirement to
| store actual data in it--just references. This made triples
| based graph storage seem very heavy-handed.
|
| The idea of composing sparse linear transformations to optimize
| queries is really cool. You can get a lot of work done in one
| shot that way, in a manner that's just quite a lot easier on
| the machine than chasing pointers around.
| adamnemecek wrote:
| > Mathematica, MATLAB, Maple, etc all have graph libraries of
| some form or another. I am not paying the thousands of dollars in
| licensing needed to learn more.
|
| Wolfram provides a free Mathematica called Wolfram Engine
| https://www.wolfram.com/engine/. It's Mathematica without the UI.
| I hear you can combine it with Jupyter Notebook to get a similar
| experience to Mathematica.
| dekhn wrote:
| Ha! I wrote the predecessor for the Python integration some ~25
| years ago
| https://library.wolfram.com/infocenter/MathSource/585/ and I
| think it uses a similar approach. The mathematica engine had a
| protocol for sending and receiving expressions and evaluating
| them. I was pretty proud of my implementation as it it was my
| first real attempt at doing anything remotely complicated with
| recursion.
| rhodin wrote:
| You can also use wolframcloud.com for free. Which runs the
| latest Mathematica online.
| hobofan wrote:
| I think the post mostly answers the questions "why are graph
| _algorithms_ not better supported in programming languages", with
| a focus that is much more on "big data" graph processing than
| graph support in general.
|
| I think if you look at graph support in general you are also
| looking at wider questions, like "why are OGMs (Object Graph
| Mappers) not as popular as ORMs" and "why is JSON so prevalent
| while RDF (or another low-level graph serialization) isn't"?
|
| And I think in the end it comes down to historic reasons (RDF
| emerged a bit too early and never evolved and accrued an
| ecosystem of horrible academic standards and implementations),
| and a graphs having just a smidge more of inherent complexity in
| implementation and learning curve that just doesn't scale well
| across many developers.
|
| ------
|
| I also wouldn't put too much weight on the "Graph Querying
| Language" part of the article. It sadly reads like exactly the
| marketing copy you would read from Neo4J or SPARQL enthusiasts
| that haven't tried building a product on top of it.
|
| > The main difference between all GQLs and SQL is that the
| "joins" (relationships) are first-class entities.
|
| Joins are first-class entities in SQL. They even have their own
| keyword (hint: it starts with J and ends with OIN) ;)
|
| If you also go to the lower levels of any graph query language
| and look at it's query plans you'll notice that there isn't any
| meaningful difference to that of one you'll find in an SQL based
| query. The standardization of GQL[0] as an SQL extension is
| evidence for that.
|
| > In SPARQL relationships are just edges, making the same query
| easy.
|
| SPARQL is easy if you need to do exact path traversals. If you
| try to do anything sophisticated with it (like you would in the
| backend of a webapp), you'll quickly run into footguns like joins
| with unbound values and you accidently join your whole result set
| away.
|
| [0]: https://en.wikipedia.org/wiki/Graph_Query_Language
| lmm wrote:
| > Joins are first-class entities in SQL. They even have their
| own keyword (hint: it starts with J and ends with OIN) ;)
|
| Having its own keyword is pretty strong evidence that something
| _isn 't_ first-class (e.g. typeclasses are not first-class in
| Haskell; control flow is not first-class in most programming
| languages).
| mcphage wrote:
| This is a good write up, but for me seems to miss the mark. I
| agree with the author that different algorithms require
| differently organized data structures. But what if your problem
| requires 2 different algorithms, which each work best with a
| different data structure? Either you pick one and run both (which
| is not ideal) or you run the first algorithm with one structure,
| convert it, and the run the second algorithm in the second
| structure.
|
| And once you can do that... why not have every algorithm ensure
| it runs in its best structure, and convert where necessary (and
| possible) on the way in? Yes, there's absolutely a performance or
| storage cost... but if the algorithm is that much faster, it
| should be worth it.
|
| Basically a beefier version of sorting your data before searching
| in it. If an algorithm works best with a specific model of a
| graph, then let that be part of the algorithm.
| riwsky wrote:
| > why not have every algorithm ensure it runs in its best
| structure, and convert where necessary (and possible) on the
| way in? Yes, there's absolutely a performance or storage
| cost... but if the algorithm is that much faster, it should be
| worth it.
|
| Because it's not that much faster, so it's not worth it. You're
| severely underestimating the amount of thought that went into
| the article, or the work of the experts interviewed.
| bevekspldnw wrote:
| Was seriously hoping at the start that article would reveal some
| secret easy solution to all my problems...only to learn there are
| none.
|
| :'(
| boothby wrote:
| I take solace in such findings. I learned that my quest was
| ill-conceived, and I abandoned it; I should be glad that I'm no
| longer searching for what cannot be found.
| bevekspldnw wrote:
| Yes that's true, the "smarter people than I failed at this"
| factor is definitely a form of solace.
| at_a_remove wrote:
| Hrm. This reminds me of a thought I had about, for the lack of a
| better term, "groups" in Python. Each data type has a property
| which is paramount and more or less defines the type.
|
| _Lists_ are ordered. The _tuple_ is immutable. The _dict_ is
| keyed. The _set_ is unique. (But there are some overlaps: dicts
| are both keyed and their keys are unique.)
|
| And I thought, what if you had a group where you could make it
| immutable or mutable, ordered or not-ordered, where the value was
| a key to something else (or not), and so on? But then I saw the
| weird edge cases and the explosions of complexity. Some
| combinations of these attributes are less appealing than others.
| JonChesterfield wrote:
| I needed a directed graph yesterday. I gave the nodes integer ids
| by appending them to an arena, then used a hashtable from integer
| to vector of integer. Iterating over it involves a set of
| integers to track which nodes have already been visited.
|
| Deduplicating nodes on insert into the area was more hassle than
| cobbling together the graph structure out of a hashtable and a
| vector.
|
| Maybe one reason against putting graphs in the standard library
| is they're easily put together from more common structures for
| whatever special case you have in mind.
| jkaptur wrote:
| I agree with this take - graphs are in an interesting
| superposition where implementing a large set of algorithms
| _correctly_ and in their full generality is hard (as the
| article points out), but getting started with an implementation
| that solves _a particular problem well enough for a particular
| case_ is easy and fun.
| PeeMcGee wrote:
| > Maybe one reason against putting graphs in the standard
| library is they're easily put together from more common
| structures for whatever special case you have in mind.
|
| This is a fair argument (how implementations tend to combine
| existing structures in bespoke ways). But any time I've needed
| to use a graph explicitly, it hasn't really mattered what
| underlying structures were involved. What has mattered each
| time is having to invent my own little API to expose well-
| known/primitive graph operations, then go and implement them
| which is unnecessarily error prone.
|
| Your example of de-duplicating nodes on insert sounds like it
| describes a property of your particular graph that may be
| better expressed through a type, which would also afford the
| necessary API. I'm approaching this from an OOP-ish perspective
| so do with that what you will.
|
| > I gave the nodes integer ids by appending them to an arena,
| then used a hashtable from integer to vector of integer.
| Iterating over it involves a set of integers to track which
| nodes have already been visited.
|
| This is what sucks about using graphs IMO. I don't want to
| think about all that stuff, I just want think about graphs. In
| practice I spend most of the time toiling around with noisy
| boilerplate that dominates my mental model and allows graph
| concerns to leak into business concerns.
| qazxcvbnm wrote:
| This reminds me of my quest to find the proper method to model
| what I've termed 'large types' in an ideal language.
|
| As is well known, algebraic data types as commonly found, consist
| of sums of products, yet a great deal of useful types are larger
| than that; some hopefully illustrative examples include:
|
| 1) the type of subsets of another type would be 2^X (hopefully
| demonstrating what I mean by 'large'ness);
|
| 2) in practical languages like TypeScript, the 'Partial' of a
| product type A x B x C would be (1 + A) x (1 + B) x (1 + C);
|
| 3) data structures in general, as a term amenable to some certain
| set of operations, when needing to be represented for performance
| reasons e.g. a) union-find structures (quotients?); b) a list of
| words and their inverted indexes for searching; c) a sorted list
|
| Reading more about type modelling, and learning of the
| disagreements in how even basic things like quotients ought to be
| represented as types, I've since resigned to an understanding of
| this as an unsolved problem, and relegated the modelling the
| kitchen sinks of types with _the_ kitchen sink of types - i.e.
| the function type (curbed with suitable type constraints upon the
| signature - from an index type to a suitable base type) - after
| all, its power and province being the irreducible kernel of type
| polymorphism, shadow over Church 's types, original sin against
| type decidability.
| andoando wrote:
| I've been thinking about something like this for the last 3
| years. However, I can't find a practical reason for it, even
| though I am sure there are.
| gue5t wrote:
| These types don't seem to escape the scope of what can be
| described with algebraic types, but the relationships between
| them seem like you're looking for a notion of type-level
| functions: subset [?] X => 2^X, partial [?] AxB =>
| (A+1)xpartial(B)
| qazxcvbnm wrote:
| Consider the case of Partials - we might like to restrict the
| Partials to different subsets of the fields for different
| purposes; consider the modelling of an inverted index.
|
| Certainly it is possible to represent each specific case as
| some algebraic type; but beyond trivial cases, I find that
| when I need such of these types, quickly I discover that
| there are myriad ways to express them, none of them uniquely
| natural, unlike the way a sum of products type (and its
| terms) can be pretty much unambiguously drawn from a
| specification.
|
| This matters especially when e.g. I need to evolve my types
| in a data migration.
| criloz2 wrote:
| Graph drawing tools are also very underwhelming, they work pretty
| good for small graphs until you have something like 500 nodes or
| more, then eventually their output becomes complete
| incompressible or very difficult to look at it, they miss the
| ability to automatically organize those graph in hierarchical
| structures and provide a nice interface to explore them, we are
| used that everything around us have some kind of hierarchy, I
| think that is the same kind of problem that will need to be
| solved in order to have a generic graph data type, also this kind
| of thing will need to be implemented at the compiler level where
| those graph generic algos will be adapted to the generated
| hierarchy of structures, and if you add a theorem prover that can
| check that certain subgraph will always have certain structures
| you can statically generated those procedures and for the other
| super graphs those methods will be generated dynamically at
| runtime.
|
| So whoever solve the problem for generic graph drawing will have
| the ability or the insight to implement this too.
| hobofan wrote:
| > we are used that everything around us have some kind of
| hierarchy
|
| I think the problem is more that we are used to the
| illusion/delusion that everything is hierarchical. The problem
| that we then encouter is with graph drawing is that it has to
| try and reconcile the fact that things in practice are rarely
| really hierarchical, and it's hard to draw those lines of where
| the hierarchies are with mathematical rigor. And that problem
| gets worse and worse the less properties you are allowed to
| assume about the underlying graph structure (connectedness,
| cyclic/acyclic, sparse/dense).
|
| In practice when you want build a UI that interacts with graphs
| it's often feasible to determine/impose one or two levels of
| meta-hierarchy with which you can do clustering (allows for
| reducing layout destroying impact of hairball nodes + improves
| rendering performance by reducing node count) and layout with
| fCOSE (Cytoscape.js has an implementation of that).
| nine_k wrote:
| Ideal things are often tree-like. Real-world structures are
| usually DAGs if they are nice and well-behaved.
|
| Making things planar, or almost planar with few crossings and
| nice clustering of related nodes, is usually hard past a couple
| dozen nodes :(
| swagasaurus-rex wrote:
| The illustrations I've seen of neural networks really
| highlights the sheet incomprehensibility of visualizing large
| graphs
| kjqgqkejbfefn wrote:
| https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
| kjqgqkejbfefn wrote:
| >Graph drawing tools
|
| It's hard
|
| Graphviz-like generic graph-drawing library. More options, more
| control.
|
| https://eclipse.dev/elk/
|
| Experiments by the same team responsible for the development of
| ELK, at Kiel University
|
| https://github.com/kieler/KLighD
|
| Kieler project wiki
|
| https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...
|
| Constraint-based graph drawing libraries
|
| https://www.adaptagrams.org/
|
| JS implementation
|
| https://ialab.it.monash.edu/webcola/
|
| Some cool stuff:
|
| HOLA: Human-like Orthogonal Network Layout
|
| https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf
|
| Confluent Graphs demos: makes edges more readable.
|
| https://www.aviz.fr/~bbach/confluentgraphs/
|
| Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with
| Ports
|
| https://arxiv.org/pdf/1408.4626.pdf
|
| Improved Optimal and Approximate Power Graph Compression for
| Clearer Visualisation of Dense Graphs
|
| https://arxiv.org/pdf/1311.6996v1.pdf
| rhelz wrote:
| Ya, the central obstacle is that:
|
| 1. for simple and small graph problems, a simple vector-of-
| vectors adjacency list is easy enough to code up.
|
| 2. For complex and huge graph problems, the only way to get
| performant solutions is to tailor the graph implementation to the
| specific details of the problem to be solved.
|
| And its hard to see what kind of language support would help,
| other than just having a super-smart compiler which could analyze
| the code and determine whether an adjacency list, matrix, 3d
| array, etc was the best way to implement it. That's the kind of
| optimization which we won't see in compilers for a while.
|
| It's another instance of the phenomenon which Strousroup noticed:
| we are really good at code sharing of small things like vectors,
| and of large things like operating systems. Its the middle-sized
| problems we are bad at.
| twoodfin wrote:
| _And its hard to see what kind of language support would help,
| other than just having a super-smart compiler which could
| analyze the code and determine whether an adjacency list,
| matrix, 3d array, etc was the best way to implement it. That 's
| the kind of optimization which we won't see in compilers for a
| while._
|
| I'm not so sure? Looking at an algorithm against an abstract
| graph type, then filling in the implementation to optimize for
| the particular algorithm seems right in the wheelhouse of code-
| specialized LLM's.
| rhelz wrote:
| Good point. The game has really changed in terms of what
| kinds of programs we can write now. Perhaps it's too
| pessimistic to not expect these sorts of optimizing compilers
| soon.
|
| Sounds like a good research opportunity to me.
| rocqua wrote:
| My experience with cipher is that the query optimizer doesn't
| know enough about the graph to pick up on trivial
| optimizations. This can't be fixed without a way to tell the
| optimizer about those properties, and even just dreiging a
| language to tell the optimizer those things is difficult.
|
| Just an LLM looking at your query isn't going to cut it. It
| will need to take your actual data into account.
| js8 wrote:
| > we are really good at code sharing of small things like
| vectors, and of large things like operating systems. Its the
| middle-sized problems we are bad at.
|
| Interesting. But I am not sure we are good at sharing small
| things - every programming language has its own implementation
| of vectors. Within one language ecosystem, the API of a vector
| is small, and that's probably what makes it easy to share.
|
| For operating systems, the API is relatively small compared to
| the internal complexity of the OS. This is also true for
| libraries for numerical problems, which are also easily shared.
| But the more you want to customize things (e.g. share a
| complicated data structure), this complicates the API and
| inhibits sharing.
|
| So it seems to this is determined by the surface area (relative
| size of the API) of the thing being shared.
| avgcorrection wrote:
| Maybe there could be some utility for a decently general Graph
| type that can be used for high-level testing and verification.
| Maybe you could implement your efficient graph per-problem and
| then validate it with a more high-level and declarative type. You
| would have to implement some isomorphism between the types
| though.
| graphviz wrote:
| Graphviz has its own foundation graph library, that's not used by
| any other project. It has its good and bad points.
|
| Based on that experience, we had our very own second-system-
| syndrome experience.
|
| We decided our graph library should be modular, type safe, and
| efficient. (These properties came up in the comments here, too.)
| This is probably just a variation of "good, fast, cheap - pick
| any two."
|
| By modular, want to write collections of graph algorithm
| libraries that are developed and even compiled independently.
|
| By type safe, we mean we want to detect programming errors during
| compilation, or link time at the latest. We don't want programs
| to throw runtime errors like "your node does not have a color
| attribute".
|
| By efficient, we mean that accessing an attribute of a graph is
| as cheap as a field in a C struct. (So, we don't want to carry
| around external hash table, or do a lot of string conversions,
| for instance.)
|
| You can argue about whether these things are worth the price or
| even make sense, but that's what we wanted. We had some famous
| C++ creators in our lab, so we figured we could get help, and we
| were willing to give C++ another chance.
|
| Gordon Woodhull, who had been an intern and kept working for us,
| is a brilliant programmer, and wrote an implementation of this
| kind of graph library working in templated C++. It's even
| published at https://www.dynagraph.org/ via sourceforge. The rest
| of us were not really sure we could ever understand how the code
| worked, so we had a code review with said famous C++ inventors.
| There were a lot of screens of code, and chinstroking, and
| greybeards pronounced "That would probably work." We knew we
| might have gone over the complexity cliff. (Let's not even talk
| about compile-time template errors, where one error fills an
| entire screen with details that only a... C++ inventor could
| love.) It was our fault, not anyone else's, and Gordon kept
| plugging away and even made all the dynamic graph layout stuff
| work, in Microsoft OLE, too. In hindsight it was probably our own
| Project Xanadu. While we got lost in this, a lot of things like
| Gephi (Java) and NetworkX and NetworKit (python) happened.
|
| Also, John Ellson, a very talented software engineer who had
| written parts of Graphviz, revitalized the main effort.
| transitionnel wrote:
| That _all_ sounds fantastic!
| bionhoward wrote:
| Could dataframes be a superior backend for graphs to maps/lists?
| skrebbel wrote:
| I think most languages support representing many kinds of graphs
| very well, particularly directed graphs without data on the
| edges: with objects, lists, and object references (or pointers).
|
| A tree is a graph. A typical Java-style object composing other
| objects composing other objects again, etc etc, often with cycles
| and parent backreferences and whatnot, is a graph. The html DOM
| is a graph.
|
| I recognize that these are often very tree-like, which feels like
| cheating in the same way as saying "well a list is also a graph!"
| is. But given that cycles are common enough that serializers (eg
| JSON.stringify) need to special-case those, I think maybe this is
| simply not true, and they're really just graphs. Very few tree-
| like class structures tend to remain pure trees.
|
| The only thing missing from references/pointers to be able to
| represent what the author is looking for, is having data on the
| edges. I think this is trivially solvable by putting nodes
| halfway the edge (= add a level of indirection, an operation so
| common that we don't even think of it as "adding data to the
| edges").
|
| So I think the answer is that there's no explicit data structure
| named "graph" because the basic building block of composition in
| nearly every language (reference/pointer) is an edge, and the
| basic building block of data representation
| (objects/structs/records) is a node. So for most graphs, trying
| to pour it all into some fancy Graph<V, E> datastructure feels
| like needless complexity.
| e_y_ wrote:
| Back in the C days, it was common to not use generic data
| structures like lists; instead you'd have a next_item pointer
| as a field in the struct. For linked lists, this would save you
| from needing another memory allocation or wrapper struct, and
| since C doesn't have templates you'd either have to blindly
| cast the type or use macros or write a type-specific iterator
| anyways.
|
| Lists eventually became a standard language feature in C++ and
| other languages, but it's trickier for trees and graphs. Taking
| the DOM example, you might be searching through child elements
| (div, span, etc) or nodes (text nodes, comment nodes) and
| different operations might only work with a specific subset of
| the "edges". There might be pointers to other objects, like
| from a DOM node to accessibility tree node. You might even have
| multiple parent node pointers, such as a pointer that takes you
| to the nearest shadow root or something.
|
| Since there are multiple ways to traverse the same data
| structure, generic functions don't work on it. You could create
| a separate tree/graph for each thing you want to use it for,
| but that takes additional memory and has to be updated when the
| original struct changes. Or you could create some kind of
| adapter that has a get_edges() function, but this might not be
| very well optimized or might be clunky for many other reasons.
| So it usually just ends up being simpler rolling your own
| functions instead of using a library.
| actionfromafar wrote:
| Bonus points for not allocating space for the next pointer if
| you didn't plan to use it...
| ogogmad wrote:
| I suspect a lot of graph theory can be reduced to abstract
| algebra. You consider matrices over lots of different scalar
| types. The scalar types should be:
|
| - Rigs (rings without negation),
|
| - idempotent (that is, where x + x = x for all x),
|
| - equipped with involution (so that undirected graphs can be made
| the default by restricting the matrices which represent graphs to
| only self-adjoint matrices),
|
| - and the entries of the matrix can be restricted to a *-ideal.
| Note that a *-ideal can be considered a scalar type in its own
| right.
|
| Different choices of the above scalar types can be used to
| capture different graph types: Weighted, directed, undirected,
| bipartite.
|
| There's no clue to physical implementation, other than that
| sparse graphs can be treated like sparse matrices. Anybody tried
| this? How did it work out?
| bigbillheck wrote:
| You're a couple decades too late: https://graphblas.org
| ogogmad wrote:
| If you read what I wrote, you'd see I never claimed
| originality. Anyway, that's cool. I'll need to check how much
| abstract algebra stuff they use.
|
| Using semirings (uh, rigs) alone isn't impressive. Do they
| consider semirings with more algebraic structure attached to
| them?
| bigbillheck wrote:
| > Using semirings (uh, rigs) alone isn't impressive. Do
| they consider semirings with more algebraic structure
| attached to them?
|
| Wiki says they have R, the two tropical semirings, the
| 'max-min' semiring, and GF(2). The tropical and max-min
| have your idempotency requirement, all but the max-min have
| involution.
| lowbloodsugar wrote:
| Couple of thoughts:
|
| First, the discussion about representation highlights that the
| issue is a lack of infinite resources. If we had an infinite
| computer, that could execute an infinite number of operations in
| zero time, and had infinite memory, then we wouldn't be worrying
| about whether it's better to store the graph as a matrix, an edge
| list, or a pointer graph.
|
| Software Engineering is everywhere and always a job of
| optimization. Sometimes that optimization is premature, and
| sometimes it's too little too late. It's always about
| optimization.
|
| Second, when we're talking about a graph of 10 nodes, it really
| doesn't matter what data structure we use. It can quickly matter
| if we have 100s or 1000s of nodes and edges because now the
| possible arrangements are huge as is the search space. But this
| is no different than other problems like the knapsack problem
| where the search space is huge: depending on the problem, there
| is very likely a "trick" to make it tractable, and that trick is
| different depending on the problem.
|
| So, like the knapsack problem, there are different, specific
| solutions for specific graph problems.
| csb6 wrote:
| This is an interesting article but one major effort it doesn't
| mention is the Boost Graph Library [0].
|
| It is kind of clunky in places because it is written in C++03 and
| uses some weird idioms to simulate keyword arguments and provide
| generic ways of getting attributes for nodes. Also it suffers
| from the terrible template instantiation errors that most C++
| template libraries do. But I still think it addresses a lot of
| the difficulties covered in the article:
|
| > There are too many design choices
|
| BGL is limited to directed/undirected multigraphs, so hypergraphs
| are not supported. However I think these cover most use cases.
|
| In terms of implementation choices, BGL provides several concrete
| data types, such as an adjacency list and an adjacency matrix. It
| also provides adaptors for the GraphBase and LEDA graph
| libraries. If none of these are suitable you can write adaptor
| functions to support your custom data type. All algorithms* work
| unmodified on these concrete implementations.
|
| > So which algorithms should come with the library?
|
| BGL comes with most of the common ones [1], but I do wish it came
| with more. The implementations of them are quite hard to read
| because they are written in highly generic (and before many of
| the conveniences offered in C++11) C++ code.
|
| > Performance is too important
|
| Since BGL is generic using C++ templates instead of runtime
| polymorphism, it should (in theory) be able to work with a
| concrete graph implementation that is performant for a certain
| task and so let you reuse its generic algorithms.
|
| I think the article describes a lot of the difficulties that
| Stepanov's generic programming approach tries to solve (e.g.
| finding the most abstract but still efficient implementation of
| an algorithm, writing algorithms that depend on a limited set of
| type requirements, having many data types that can reuse the same
| algorithms). While C++ supports this style of programming it is
| not ideal for it, but I think BGL is the closest thing I have
| seen to a generic graph library that is also performant in many
| cases.
|
| *Algorithms have varying requirements, e.g. some may need to be
| able to remove edges while others do not. But these requirements
| are generic and can be fulfilled by many different graph
| implementations.
|
| [0]
| https://www.boost.org/doc/libs/1_84_0/libs/graph/doc/index.h...
|
| [1] section 22,
| https://www.boost.org/doc/libs/1_84_0/libs/graph/doc/table_o...
| csb6 wrote:
| As a sidenote, Stepanov's introduction [0] to the Boost Graph
| Library book explains a lot about his approach to programming.
| He didn't write the BGL, but his work on the STL was a major
| inspiration for it.
|
| [0] http://stepanovpapers.com/siekforeword.html
| rdtsc wrote:
| > There's a gap between how often software engineers could use
| graphs and how little our programming ecosystems support them.
| Where are all the graph types?
|
| They've been there for quite a while :-)
| https://www.erlang.org/doc/man/digraph.html
| https://www.erlang.org/doc/man/digraph_utils
|
| And if you want to do some set theoretical stuff you're covered
| as well: https://www.erlang.org/doc/man/sofs.html
| Borg3 wrote:
| Ahh, Graphs.. One of my favorite subject. I love doing graphs but
| im kinda bad at implementation. But, I still managed to slap D3 +
| Cola to make my own little interactive visualizer I use for
| various networking visualizations (L1,L2,L3).
|
| Here is example of IRCnet network:
|
| http://ds-1.ovh.uu3.net/~borg/d3/ircnet.htm
| jkaptur wrote:
| I've often wondered about a missing application: "Excel for
| graphs".
|
| Just like Excel for tabular data, it would support RAM-sized data
| (enough to require a computer, but not so much that you need a
| data center), implement lots of algorithms and visualizations
| "well enough", and require no programming skill to operate.
|
| As the article says, a lot of our real-world problems are _graph
| problems_ - why are programmers the only ones who should have the
| tools to solve them?
| empiko wrote:
| Yeah, I feel like the article is too quick with its
| conclusions. Many other problems can be made arbitrary complex
| and difficult with additional requirements. But there are still
| data structure and standard libraries to provide good enough
| experience that fits most use-cases. And if you need something
| extra spicy you need to build a custom solution.
|
| The article claims that graphs are often just too big, but
| yeah, if you ask people who are actively working on graph
| algorithms they might have that sort of experience. But most
| programmers and users probably only work with really small
| graphs.
| caditinpiscinam wrote:
| > Relational databases are graphs where the nodes are records and
| the edges are foreign keys
|
| I disagree with the premise of the article: programming languages
| _do_ have strong and mature support for graphs in the form of
| relational database interfaces, which cover most of the real-
| world use-cases for linked data.
| tunesmith wrote:
| Given the wide variety of implementations, what I'd like is a
| questionnaire that asks you what sort of graph you are intending
| to work with, and then recommends what sort of algorithm
| implementations or software packages would be best... I'm hoping
| that the language models will get better at this sort of question
| over time.
| ylow wrote:
| I think this is because a graph is not a data-structure nor a
| data-type. It is really an abstraction.
|
| Fundamentally, _all_ I need to define a graph is a set of
| vertices v \in V and function Neighbors(v). And that really is
| all is needed for the most foundational set of graph algorithms.
|
| Everything else are case-by-case constraints. Does A->B imply
| B->A? is the node set partitionable with certain constraints? Are
| there colors? labels?
|
| To make things _even_ more general I can go up one level and
| consider the hypergraph. In which case I just have a set of
| vertices, and a set of sets of vertices. This can be represented
| in a myriad of different ways depending on what you are
| interested in. Of which (non-hyper) graph is simply a special
| case.
|
| An alternative way to think about it perhaps from the database
| perspective, is that its a query optimization and indexing
| problem. Depending on what questions you want to ask about the
| graph, there will be different ways to represent the graph to
| answer the question better. Just like there is not one way to
| represent the abstraction called "Table", there is not one way to
| do "Graph" either. It really depends on the questions you care
| about.
| gloftus wrote:
| Yes, graphs are ubiquitous because they are so abstract. They
| live on the same level of abstraction as pure numbers. There
| are useful "numerical" libraries that exist, and by analogy I
| think you could say there are also useful "graphical" libraries
| that exist. But we don't really have "number" libraries, and we
| don't really have "graph" libraries, because those concepts are
| a bit too abstract to write APIs against.
| kragen wrote:
| it's true that numbers are very abstract, which is what makes
| it so easy to design apis for them
|
| the python runtime includes four built-in number types (small
| integer, arbitrary-precision integer, float, and complex) and
| the python standard library includes two more number types
| (decimal and fractions), and one of the most popular non-
| standard libraries for python is numpy, which provides some
| other kinds of numbers such as single-precision floats,
| vectors, and matrices. other systems like pari/gp have number
| libraries that provide other kinds of numbers, such as p-adic
| numbers and galois field elements
|
| the only programming languages i've ever used that didn't
| have 'number' libraries were esoteric languages like
| brainfuck and the lambda calculus
| fatherzine wrote:
| What an awesome article. Kudos to the author!
|
| On the core observation "there are too many implementation
| choices", that is not quite right. True, the author mentions 4,
| and there are further variations. In practice, a library can:
|
| 1. Implement _all_ suitable graph representations.
|
| 2. Implement algorithms tailored to the representation(s) that
| offer the highest performance.
|
| 3. Provide _transformations_ from one representation to another.
| This is O(#representations), trivial to implement and trivial to
| use. Quite fair workload for both maintainers and users.
|
| 4. Bonus, provide import / export _transformations_ from / to
| common standard library datatypes and idioms.
|
| Memory and transformations are cheap, 99% of use-cases would
| likely find the overhead of transforming data, both in RAM and
| CPU, negligible.
|
| Edit: "the harsh truth of working at Google is that in the end
| you are moving protobufs from one place to another." --
| https://news.ycombinator.com/item?id=20132880
| josephg wrote:
| Sounds like the makings of a huge library that I'm not sure I'd
| even use in my work. I use graphs _heavily_ in my work, and my
| experience matches the people the author interviewed.
|
| We always end up reimplementing graphs because:
|
| - Performance matters, and no off the shelf graph library I've
| seen can take advantage of many of the regularities in our
| particular data set. (We have an append-only DAG which we can
| internally run-length encode because almost all nodes just have
| an edge pointing to the last added item).
|
| - I haven't seen any generic graph library which supports the
| specific queries I need to make on my graphs. The big one is a
| subgraph diffing function.
|
| - Writing something custom just isn't much work anyway! Graphs
| are way simpler to reimplement than btrees. You can have a
| simple graph implementation in tens of lines. Our highly
| optimised library - with all the supporting algorithms - is
| still only a few hundred lines of code.
|
| I think it would be handy to have ways to export the data into
| some standard format. But eh. I think pulling a library in for
| our use case would add more problems than it would solve.
| fiddlerwoaroof wrote:
| FWIW, despite its name being an abbreviation for LISt Processing,
| the fundamental datatype of lisp (the cons cell) is actually a
| vertice on a directed graph with the restriction that each
| vertice can only have two outgoing edges.
| AtlasBarfed wrote:
| My own take: a document database, which devs seem to love,
| combined with relations aka edges,is basically a property graph
| and that's what basically you need. Directionality is a property
| on the edge.
| ChicagoDave wrote:
| I've been building an interactive fiction platform built in C#
| with a world model contained within a bidirectional graph data
| structure.
|
| Since IF stories are relatively small in graph terms, it's a
| reasonable solution.
|
| Locations and objects are nodes and movement and location of
| objects are edges. Nodes and edges both can have dynamic
| properties.
|
| I've also noticed the lack of graph structures in programming
| languages, so this article was very enlightening.
| zacify wrote:
| I feel like this is common knowledge....
| js8 wrote:
| Another data type that would be quite useful is a table (like in
| a database). For the same reasons, too many design choices.
|
| Anyway, that being said, I have felt that progress will be made
| in programming languages if the compiler gets to choose an
| implementation of a data structure, kinda like when a database
| chooses an execution plan. So you just use an abstract structure
| (like sequence, map, set, table, graph) and based on the program
| profile, the compiler will pick the specific implementation. It
| will also transform the structure into another isomorphic one as
| needed. (Some programming languages already do something like
| this, for example, array of structs to struct of arrays
| conversion.)
| khanguy wrote:
| I'm surprised I haven't seen anyone mention Stanford's SNAP [0]
| yet. In addition to their C++ library, they also have a Python
| interface too.
|
| [0] https://snap.stanford.edu/
___________________________________________________________________
(page generated 2024-03-04 23:00 UTC)