[HN Gopher] A Gentle Introduction to Graph Neural Networks
       ___________________________________________________________________
        
       A Gentle Introduction to Graph Neural Networks
        
       Author : misonic
       Score  : 290 points
       Date   : 2024-12-20 04:10 UTC (18 hours ago)
        
 (HTM) web link (distill.pub)
 (TXT) w3m dump (distill.pub)
        
       | samsartor wrote:
       | GNNs have been a bit of a disappointment to me. I've tried to
       | apply them a couple times to my research but it has never worked
       | out.
       | 
       | For a long time GNNs were pitched as a generalization of CNNs.
       | But CNNs are more powerful because the "adjacency weights" (so to
       | speak) are more meaningful: they learn relative positional
       | relationships. GNNs usually resort to pooling, like described
       | here. And you can output an image with a CNN. Good luck getting a
       | GNN to output a graph. Topology still has to be decided up front,
       | sometimes even during training. And the nail in the coffin is
       | performance. It is incredible how slow GNNs are compared to CNNs.
       | 
       | These days I feel like attention has kinda eclipsed GNNs for a
       | lot of those reasons. You can make GNNs that use attention
       | instead of pooling, but there isn't much point. The graph is
       | usually only traversed in order to create the mask matrix (ie
       | attend between nth neighbors) and otherwise you are using a
       | regular old transformer. Often you don't even need the graph
       | adjacencies because some kind of distance metric is already
       | available.
       | 
       | I'm sure GNNs are extremely useful to someone somewhere but my
       | experience has been a hammer looking for a nail.
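        | 
        | To make that mask point concrete, here's a minimal NumPy sketch
        | (names and the toy graph are just illustrative): build "within
        | n hops" by accumulating powers of the adjacency matrix, and
        | hand the result to an ordinary transformer as its attention
        | mask.
        | 
        |     import numpy as np
        |     
        |     def n_hop_mask(adj, n):
        |         """True where node j is within n hops of node i."""
        |         reach = np.eye(adj.shape[0], dtype=int)
        |         power = np.eye(adj.shape[0], dtype=int)
        |         for _ in range(n):
        |             power = power @ adj    # extend walks by one hop
        |             reach = reach + power  # walks of length <= n
        |         return reach > 0           # boolean attention mask
        |     
        |     # toy 4-node path graph a-b-c-d
        |     A = np.array([[0, 1, 0, 0],
        |                   [1, 0, 1, 0],
        |                   [0, 1, 0, 1],
        |                   [0, 0, 1, 0]])
        |     mask = n_hop_mask(A, 2)  # feed to a regular old transformer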
        
         | stephantul wrote:
         | Same! I've seen many proposals to use a GNN for some problem
         | for which we used a "flat" model, e.g., taking into account
         | HTML structure when predicting labels for pages. Even when it
         | seemingly made a lot of sense to use them, it didn't work.
        
         | lmeyerov wrote:
          | Are you doing something regular like vision?
         | 
         | For the reasons you're saying, I don't think it's an accident
         | that GNNs are popular mostly in domains like recommendations
         | that feel graph-y for their domain model so getting to a useful
         | topology isn't as big a leap.
         | 
          | A frustration for me has been more that many of these graph-y
          | domains are about behavioral machine/people data like logs that
          | contain a large number of categorical dimensions. The graph
          | part can help, but it is just as important to key on the
          | categorical dimensions, and doing well there often ends up
          | outside of the model - random forest etc. It's easier to start
          | with those, and then it's a lot of work for the GNN part
          | (though we & others have been trying to simplify) for "a bit
          | more lift".
         | 
         | Of course, if this is your core business and this means many
         | millions of dollars, it can be justified... but still, it's
         | hard for most production teams. In practice, we often just do
         | something with pygraphistry users like xgboost + umap and move
          | on. Even getting an RGCN to perform well takes work.
        
         | igorkraw wrote:
          | GNNs are useful in at least one case: when your data is a set
          | of atoms that define your datum through their interactions,
          | specifically a set that is high cardinality (so you can't YOLO
          | it with attention) with some notion of neighbourhood (i.e.
          | geometry) within your set (defined by the interactions) which,
          | if maintained, makes the data permutation equivariant, BUT you
          | can't find a meaningful way to represent that geometry
          | implicitly (for example because it changes between samples) =>
          | you YOLO it by just passing the neighbourhood/interaction
          | structure in as an input.
         | 
          | In almost every other case, you can exploit additional
          | structure to be more efficient (can you define an order?
          | sequence model. is it Euclidean/Riemannian? CNN or
          | manifold-aware models. no need to have global state?
          | pointcloud networks. you have an explicit hierarchy? U-Net
          | version of your underlying modality. etc)
         | 
         | The reason why I find GNNs cool is that 1) they encode the very
         | notion of _relations_ and 2) they have a very nice relationship
         | to completely general discretized differential equations, which
          | as a complex systems/dynamical systems guy I find cool (but
          | if you can specialize, there are again easier ways)
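          | 
          | As a minimal sketch of the "pass the interaction structure in
          | as an input" case (plain NumPy, all names illustrative): one
          | message-passing layer that sums neighbour messages over
          | whatever per-sample adjacency you provide, so the update is
          | permutation equivariant by construction and the topology can
          | change between samples.
          | 
          |     import numpy as np
          |     
          |     def message_passing_layer(x, adj, w_self, w_nbr):
          |         """x: (num_nodes, d) features; adj: per-sample
          |         neighbourhood/interaction matrix passed in as data."""
          |         messages = adj @ x  # aggregate neighbour features
          |         return np.tanh(x @ w_self + messages @ w_nbr)
          |     
          |     rng = np.random.default_rng(0)
          |     x = rng.normal(size=(5, 8))  # 5 "atoms", 8 features each
          |     adj = (rng.random((5, 5)) < 0.3).astype(float)
          |     w_self = rng.normal(size=(8, 8))
          |     w_nbr = rng.normal(size=(8, 8))
          |     h = message_passing_layer(x, adj, w_self, w_nbr)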
        
         | energy123 wrote:
         | Google's GraphCast is a GNN:
         | https://deepmind.google/discover/blog/graphcast-ai-model-for...
        
           | eperfa wrote:
           | Google DeepMind's GenCast is based on diffusion:
           | https://deepmind.google/discover/blog/gencast-predicts-
           | weath...
           | 
           | (Partially) Google Research's/DeepMind's NeuralGCM is based
           | on hybrid models using ODEs and learnt physics:
           | https://www.nature.com/articles/s41586-024-07744-y
           | 
           | Microsoft Research's Aurora on vision transformers:
           | https://www.microsoft.com/en-us/research/blog/introducing-
           | au...
           | 
           | Huawei's Pangu Weather is also a 3D transformer I believe
           | https://www.nature.com/articles/s41586-024-07744-y
           | 
           | I just wanted to highlight that there are multiple approaches
           | in use for the same problem / in the same domain, and GNN
           | does not seem to be the most widely used one.
        
         | biomcgary wrote:
         | I've followed GNNs in biology and applied them in a couple
         | domains, but have been disappointed by the results so far. I'm
         | a bit surprised because I have successfully used other graph-
         | based approaches in biology.
        
       | helltone wrote:
       | It seems GNNs operate on a fixed topology. What if I want to
       | approximate some transformation of the topology of the graph? For
        | example learning how to lay out a graph, or converting program
       | abstract syntax trees to data flow graphs.
        
         | igorkraw wrote:
         | The whole point of GNNs is that they generalize to arbitrary
         | topologies by explicitly conditioning the idea of "neighbours"
         | on the graph specifying the topology. Graph layout has been
         | tried here https://github.com/limbo018/DREAMPlace to great
         | fanfare although there is recent drama about it
         | https://www.semanticscholar.org/paper/The-False-Dawn%3A-Reev...
         | . Graph transformations are a thing as well
         | https://arxiv.org/abs/2012.01470 but it's a tricky problem
          | because you implicitly need to solve the graph matching
          | problem.
        
           | Xmd5a wrote:
           | Graph layout is extremely interesting for infographics, since
           | a handmade graph will almost always beat what tools such as
           | graphviz can come up with (and I'm not even mentioning
           | algorithms that go beyond Sugiyama's for which there is only
           | a paper).
           | 
            | Any progress on this front?
        
         | FjordWarden wrote:
          | Maybe homology can help; it is a sort of calculus for discrete
          | structures where you count how many N-dimensional holes there
          | are over time. Dunno about NNs, but that is what they can do
          | with fMRI.
        
       | leah_sun wrote:
       | good share
        
       | cherryteastain wrote:
       | There are a lot of papers using GNNs for physics simulations
       | (e.g. computational fluid dynamics) because the unstructured
       | meshes used to discretize the problem domain for such
       | applications map very neatly to a graph structure.
       | 
       | In practice, every such mesh/graph is used once to solve a
       | particular problem. Hence it makes little sense to train a GNN
       | for a specific graph. However, that's exactly what most papers
       | did because no one found a way to make a GNN that can adjust well
       | to a different mesh/graph and different simulation parameters. I
       | wonder if there's a breakthrough waiting just around the corner
       | to make such a generalization possible.
        
         | magicalhippo wrote:
         | Naive question:
         | 
          | Words in sentences kinda form graphs, referencing other words
          | or being leaves that are referenced, both inside sentences and
          | between sentences.
         | 
         | Given the success of the attention mechanism in modern LLMs,
          | how well would they do if you trained an LLM to process an
         | actual graph?
         | 
         | I guess you'd need some alternate tokenizer for optimal
         | performance.
        
           | cherryteastain wrote:
           | For physics sims, I'd say it's useless.
           | 
           | Imagine you discretize a cube into 1000 gridpoints in each
           | direction, that's 1000^3 = 1 billion nodes/"tokens". Plus you
            | typically time-march some sort of equation, so you need the
            | solutions from the previous 3-5 timesteps as well; that's
            | 3-5 billion tokens. If you are gonna do that in the first
            | place, you may as well just use the traditional solver.
            | Traditional
           | solvers usually set up and solve a matrix equation like Ax=b
           | with an iterative method like multigrid which is O(n) as
           | opposed to transformer's O(n^2). It'll give you a much more
           | accurate answer much quicker than it'll take a transformer to
           | do attention on a sequence of length 3 billion.
           | 
           | The entire point of using GNNs/CNNs in this field is that
           | people rely on their ability to make inference using local
           | information. That means the value at each gridpoint/node can
           | be inferred from neighbouring nodes only, which is O(n) like
            | multigrid. The idea in most papers is that the GNN can do
            | this faster than multigrid. Results so far are mixed,
            | however [1].
           | 
           | [1] https://arxiv.org/abs/2407.07218
        
             | magicalhippo wrote:
             | Ah yes, for dense problems like that I wouldn't expect it
             | to work well. The example graphs in the submission were
             | mostly quite sparse, hence why I thought of LLMs. But
             | perhaps that was just for illustrative purposes.
        
           | disattention wrote:
           | This is actually a good insight. It turns out that
           | transformers are indeed a form of graph network, precisely
           | because of the attention mechanism. Graph attention networks
           | are actually a very popular GNN architecture. Generally, the
            | issue with using an LLM-style architecture for generic
            | graphs is modeling the sparsity, but it is possible by using
            | the graph adjacency matrix to mask the attention matrix.
            | There are a
           | number of papers and articles which address this connection,
           | and plenty of research into mechanisms for sparsifying
           | attention in transformers.
           | 
           | There are also graph tokenizers for using more standard
           | transformers on graphs for doing things like classification,
           | generation, and community detection.
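            | 
            | A rough sketch of that masking trick (plain NumPy, single
            | head, illustrative names): compute ordinary scaled
            | dot-product attention, but set the scores of non-adjacent
            | pairs to -inf before the softmax so each node only attends
            | to its neighbours.
            | 
            |     import numpy as np
            |     
            |     def graph_masked_attention(q, k, v, adj):
            |         """i attends to j only where adj[i, j] > 0."""
            |         scores = q @ k.T / np.sqrt(q.shape[-1])
            |         scores = np.where(adj > 0, scores, -np.inf)
            |         # masked softmax over neighbours only
            |         m = scores.max(axis=-1, keepdims=True)
            |         w = np.exp(scores - m)
            |         w = w / w.sum(axis=-1, keepdims=True)
            |         return w @ v
            |     
            |     rng = np.random.default_rng(1)
            |     n, d = 6, 16
            |     q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
            |     adj = (rng.random((n, n)) < 0.3).astype(float)
            |     adj = np.maximum(adj, np.eye(n))  # add self-loops
            |     out = graph_masked_attention(q, k, v, adj)
            | 
            | A dense mask like this doesn't buy you the sparsity savings,
            | of course; in practice you'd want sparse attention kernels,
            | which is exactly the research thread mentioned above.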
        
             | algo_trader wrote:
             | Any canonical papers on GNN for code graphs?
        
           | whimsicalism wrote:
           | yep you're now pretty much at state-of-the-art
        
           | 6gvONxR4sf7o wrote:
           | That's the kind of thing that I could imagine could
           | dramatically speed up certain tasks, but not enable
           | particularly new abilities. A ton of the challenge is
           | converting a sequence into a graph. So if you need a huge
           | clever model to turn a sequence into a graph, then your
           | potential gain downstream is either a) in making certain
           | computationally hard queries easier, or b) answering tons and
           | tons of followup queries dramatically faster (enough to make
           | the initial graphification overhead okay).
           | 
           | For (a), any imperfections in the graphification make the
           | problem super hard and researchy.
        
         | zmgsabst wrote:
         | A general graph solver has to be a general intelligence, since
         | it would be able to successfully model category theory.
        
       | openrisk wrote:
        | Very high quality work; it's a pity that distill.pub did not
        | find a sustainable way forward [1].
        | 
        | On GNNs, the lack of datasets [2] might be a reason they are not
        | as talked about. This is something that has also affected the
        | semantic web domain.
       | 
       | [1] https://distill.pub/2021/distill-hiatus/
       | 
       | [2]
       | https://huggingface.co/datasets?task_categories=task_categor...
        
         | siver_john wrote:
          | Honestly, seeing this on the front page got my hopes up that
          | they had found a way forward, but alas.
        
         | wenc wrote:
         | My personal hack is that before jumping into papers in
         | unfamiliar areas like these, I go on YouTube and search for
         | "<topic> explained".
         | 
         | The power of YouTube is that for any hot area, there are a
         | bunch of people incentivized to make a short video that
         | maximally engages. The quality can be quite high even at higher
         | levels of math abstractions. The visuals are really helpful to
         | get a feel for the abstractions (3 blue 1 brown proved this
         | years ago).
         | 
          | There are some excellent videos of GNNs that in less than 10
         | mins gave me a launching point into the literature.
        
       | esafak wrote:
       | (2021)
        
       | memhole wrote:
       | Would love to see distill come back
        
       | ziofill wrote:
       | It's very sad distill.pub doesn't accept new submissions... :/
        
       | eachro wrote:
        | Is there consensus about whether GNN architectures are better
        | than transformer-based ones at this point? I am aware that
        | transformers can be viewed as a GNN too.
        
         | Legend2440 wrote:
         | GNNs let you build some of the structure of your problem into
         | the network architecture. This can be useful if your problem
         | has a well-understood structure, like physical simulations.
         | 
         | However in many cases we do not know the structure of our
         | problem (that's why we want to use ML in the first place) and
         | in these cases GNNs do not beat transformers.
        
       | hi_hi wrote:
       | I feel very dumb. There is an example on that page with 4 nodes
       | (a,b,c,d) and it shows a total of 24 possible combinations.
       | 
        | What is the generalised formula for calculating this, given that
        | not just the number of nodes but also the edges need to be
        | considered?
       | 
       | It doesn't appear to be explained in the article. I think it may
       | be a factorial?
        
       ___________________________________________________________________
       (page generated 2024-12-20 23:00 UTC)