[HN Gopher] A Gentle Introduction to Graph Neural Networks
___________________________________________________________________
A Gentle Introduction to Graph Neural Networks
Author : misonic
Score : 290 points
Date : 2024-12-20 04:10 UTC (18 hours ago)
(HTM) web link (distill.pub)
(TXT) w3m dump (distill.pub)
| samsartor wrote:
| GNNs have been a bit of a disappointment to me. I've tried to
| apply them a couple times to my research but it has never worked
| out.
|
| For a long time GNNs were pitched as a generalization of CNNs.
| But CNNs are more powerful because the "adjacency weights" (so to
| speak) are more meaningful: they learn relative positional
| relationships. GNNs usually resort to pooling, as described
| here. And you can output an image with a CNN. Good luck getting a
| GNN to output a graph. Topology still has to be decided up front,
| sometimes even during training. And the nail in the coffin is
| performance. It is incredible how slow GNNs are compared to CNNs.
|
| These days I feel like attention has kinda eclipsed GNNs for a
| lot of those reasons. You can make GNNs that use attention
| instead of pooling, but there isn't much point. The graph is
| usually only traversed in order to create the mask matrix (ie
| attend between nth neighbors) and otherwise you are using a
| regular old transformer. Often you don't even need the graph
| adjacencies because some kind of distance metric is already
| available.
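|
| Concretely, the "mask from adjacency" step is only a few lines.
| A rough numpy sketch (my own toy code, attend within k hops;
| names are illustrative):
|
|     import numpy as np
|
|     def khop_mask(adj: np.ndarray, k: int) -> np.ndarray:
|         """True where node j is within k hops of node i."""
|         n = adj.shape[0]
|         reach = np.eye(n, dtype=bool)   # each node sees itself
|         frontier = reach.copy()
|         hop = adj.astype(int)
|         for _ in range(k):
|             frontier = (frontier.astype(int) @ hop) > 0
|             reach |= frontier           # grow by one hop
|         return reach                    # use as attention mask
|
| Everything after that is a regular old transformer; the graph
| never appears again.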
|
| I'm sure GNNs are extremely useful to someone somewhere but my
| experience has been a hammer looking for a nail.
| stephantul wrote:
| Same! I've seen many proposals to use a GNN for some problem
| for which we used a "flat" model, e.g., taking into account
| HTML structure when predicting labels for pages. Even when it
| seemingly made a lot of sense to use them, it didn't work.
| lmeyerov wrote:
| Are you doing something regular, like vision?
|
| For the reasons you're saying, I don't think it's an accident
| that GNNs are popular mostly in domains like recommendations,
| which already feel graph-y in their domain model, so getting to
| a useful topology isn't as big a leap.
|
| A frustration for me has been more that many of these graph-y
| domains are about behavioral machine/people data like logs that
| contain a large number of categorical dimensions. The graph
| part can help, but it is just as important to key on the
| categorical dimensions, and doing well there often ends up
| outside of the model - random forest etc. It's easier to start
| with those, and then it is a lot of work for the GNN part
| (though we & others have been trying to simplify) for "a bit
| more lift".
|
| Of course, if this is your core business and this means many
| millions of dollars, it can be justified... but still, it's
| hard for most production teams. In practice, we often just do
| something with pygraphistry users like xgboost + umap and move
| on. Even getting an RGCN to perform well takes work..
| igorkraw wrote:
| GNNs are useful in at least one case: when your data is a set
| of atoms that define your datum through their interactions,
| specifically a set that is high cardinality (so you can't YOLO
| it with attention) with some notion of neighbourhood (i.e.
| geometry) within your set (defined by the interactions), which
| if maintained makes the data permutation equivariant, BUT you
| can't find a meaningful way to represent that geometry
| implicitly (for example because it changes between samples) =>
| you YOLO it by just passing the neighbourhood/interaction
| structure in as an input.
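|
| In code, the "just pass the structure in" part really is that
| literal. A toy message-passing step (my own sketch, not any
| particular library):
|
|     import numpy as np
|
|     def mp_step(h: np.ndarray, edges: np.ndarray) -> np.ndarray:
|         """h: (nodes, dim) features, edges: (E, 2) src->dst pairs.
|         Each node sums messages from its neighbours, then updates."""
|         agg = np.zeros_like(h)
|         np.add.at(agg, edges[:, 1], h[edges[:, 0]])
|         return np.tanh(h + agg)   # toy update rule
|
| The same weights work for any edge list, which is exactly the
| permutation-equivariant, "geometry changes between samples"
| case.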
|
| In almost every other case, you can exploit additional
| structure to be more efficient (can you define an order?
| sequence model. is it Euclidean/Riemannian? CNN or
| manifold-aware models. no need to have global state? point
| cloud networks. you have an explicit hierarchy? UNet version
| of your underlying modality. etc)
|
| The reason I find GNNs cool is that 1) they encode the very
| notion of _relations_ and 2) they have a very nice relationship
| to completely general discretized differential equations,
| which, as a complex systems/dynamical systems guy, I find cool
| (but if you can specialize, there are again easier ways)
| energy123 wrote:
| Google's GraphCast is a GNN:
| https://deepmind.google/discover/blog/graphcast-ai-model-for...
| eperfa wrote:
| Google DeepMind's GenCast is based on diffusion:
| https://deepmind.google/discover/blog/gencast-predicts-
| weath...
|
| Google Research's/DeepMind's NeuralGCM is (partially) based on
| a hybrid model combining ODEs with learnt physics:
| https://www.nature.com/articles/s41586-024-07744-y
|
| Microsoft Research's Aurora is based on vision transformers:
| https://www.microsoft.com/en-us/research/blog/introducing-
| au...
|
| Huawei's Pangu-Weather is also a 3D transformer, I believe:
| https://www.nature.com/articles/s41586-024-07744-y
|
| I just wanted to highlight that there are multiple approaches
| in use for the same problem / in the same domain, and GNNs do
| not seem to be the most widely used one.
| biomcgary wrote:
| I've followed GNNs in biology and applied them in a couple
| domains, but have been disappointed by the results so far. I'm
| a bit surprised because I have successfully used other graph-
| based approaches in biology.
| helltone wrote:
| It seems GNNs operate on a fixed topology. What if I want to
| approximate some transformation of the topology of the graph? For
| example, learning how to lay out a graph, or converting program
| abstract syntax trees to data flow graphs.
| igorkraw wrote:
| The whole point of GNNs is that they generalize to arbitrary
| topologies by explicitly conditioning the idea of "neighbours"
| on the graph specifying the topology. Graph layout has been
| tried here https://github.com/limbo018/DREAMPlace to great
| fanfare although there is recent drama about it
| https://www.semanticscholar.org/paper/The-False-Dawn%3A-Reev...
| . Graph transformations are a thing as well
| https://arxiv.org/abs/2012.01470 but it's a tricky problem
| because you implicitly need to solve the graph matching problem
| Xmd5a wrote:
| Graph layout is extremely interesting for infographics, since
| a handmade layout will almost always beat what tools such as
| graphviz can come up with (and I'm not even mentioning
| algorithms that go beyond Sugiyama's, for which there is only
| a paper).
|
| Any progress on this front?
| FjordWarden wrote:
| Maybe homology can help; it is a sort of calculus for discrete
| structures where you count how many N-dimensional holes there
| are over time. Dunno about NNs, but that is what they can do
| with fMRI.
| leah_sun wrote:
| good share
| cherryteastain wrote:
| There are a lot of papers using GNNs for physics simulations
| (e.g. computational fluid dynamics) because the unstructured
| meshes used to discretize the problem domain for such
| applications map very neatly to a graph structure.
|
| In practice, every such mesh/graph is used once to solve a
| particular problem. Hence it makes little sense to train a GNN
| for a specific graph. However, that's exactly what most papers
| did because no one found a way to make a GNN that can adjust well
| to a different mesh/graph and different simulation parameters. I
| wonder if there's a breakthrough waiting just around the corner
| to make such a generalization possible.
| magicalhippo wrote:
| Naive question:
|
| Words in sentences kinda form graphs: they reference other
| words or are leaves being referenced, both inside sentences
| and between sentences.
|
| Given the success of the attention mechanism in modern LLMs,
| how well would they do if you trained an LLM to process an
| actual graph?
|
| I guess you'd need some alternate tokenizer for optimal
| performance.
| cherryteastain wrote:
| For physics sims, I'd say it's useless.
|
| Imagine you discretize a cube into 1000 gridpoints in each
| direction; that's 1000^3 = 1 billion nodes/"tokens". Plus you
| typically time-march some sort of equation, so you need the
| solutions from the previous 3-5 timesteps as well, which makes
| 3-5 billion tokens. If you are gonna do that in the first
| place, you may as well just use the traditional solver.
| Traditional solvers usually set up and solve a matrix equation
| like Ax=b with an iterative method like multigrid, which is
| O(n) as opposed to the transformer's O(n^2). It'll give you a
| much more accurate answer far quicker than it'll take a
| transformer to do attention on a sequence of length 3 billion.
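|
| Back-of-the-envelope, just to make that scaling concrete (my
| numbers, single layer, constants ignored):
|
|     n = 1000 ** 3            # ~1e9 gridpoints/"tokens"
|     attention_ops = n ** 2   # pairwise scores: ~1e18
|     multigrid_ops = n        # O(n) per solve:  ~1e9
|     print(attention_ops // multigrid_ops)  # ~1e9x more work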
|
| The entire point of using GNNs/CNNs in this field is that
| people rely on their ability to make inferences using local
| information. That means the value at each gridpoint/node can
| be inferred from neighbouring nodes only, which is O(n) like
| multigrid. The idea in most papers is that the GNN can do this
| faster than multigrid. Results so far are mixed, however [1].
|
| [1] https://arxiv.org/abs/2407.07218
| magicalhippo wrote:
| Ah yes, for dense problems like that I wouldn't expect it
| to work well. The example graphs in the submission were
| mostly quite sparse, which is why I thought of LLMs. But
| perhaps that was just for illustrative purposes.
| disattention wrote:
| This is actually a good insight. It turns out that
| transformers are indeed a form of graph network, precisely
| because of the attention mechanism. Graph attention networks
| are actually a very popular GNN architecture. Generally, the
| issue with using an LLM-style architecture for generic graphs
| is modeling the sparsity, but it is possible by using the graph
| adjacency matrix to mask the attention matrix. There are a
| number of papers and articles which address this connection,
| and plenty of research into mechanisms for sparsifying
| attention in transformers.
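|
| A rough sketch of that masking trick (my own toy PyTorch, not
| from any particular paper):
|
|     import torch
|     import torch.nn.functional as F
|
|     def graph_masked_attention(q, k, v, adj):
|         """q, k, v: (n, d); adj: (n, n) bool adjacency with
|         self-loops. Each node attends only to its neighbours."""
|         scores = (q @ k.T) / k.shape[-1] ** 0.5
|         scores = scores.masked_fill(~adj, float("-inf"))
|         return F.softmax(scores, dim=-1) @ v
|
| With adj set to all ones you recover plain dense attention;
| sparsify adj and it behaves like a (single-head) graph
| attention layer.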
|
| There are also graph tokenizers for using more standard
| transformers on graphs for doing things like classification,
| generation, and community detection.
| algo_trader wrote:
| Any canonical papers on GNN for code graphs?
| whimsicalism wrote:
| yep you're now pretty much at state-of-the-art
| 6gvONxR4sf7o wrote:
| That's the kind of thing I could imagine dramatically speeding
| up certain tasks, but not enabling particularly new abilities.
| A ton of the challenge is
| converting a sequence into a graph. So if you need a huge
| clever model to turn a sequence into a graph, then your
| potential gain downstream is either a) in making certain
| computationally hard queries easier, or b) answering tons and
| tons of followup queries dramatically faster (enough to make
| the initial graphification overhead okay).
|
| For (a), any imperfections in the graphification make the
| problem super hard and researchy.
| zmgsabst wrote:
| A general graph solver has to be a general intelligence, since
| it would be able to successfully model category theory.
| openrisk wrote:
| Very high quality work; it's a pity that distill.pub did not find
| a sustainable way forward [1].
|
| On GNNs, the lack of datasets [2] might be a reason they are not
| as talked about. This is something that has also affected the
| semantic web domain.
|
| [1] https://distill.pub/2021/distill-hiatus/
|
| [2]
| https://huggingface.co/datasets?task_categories=task_categor...
| siver_john wrote:
| Honestly, seeing this on the front page got my hopes up that
| they had found a way forward, but alas.
| wenc wrote:
| My personal hack is that before jumping into papers in
| unfamiliar areas like these, I go on YouTube and search for
| "<topic> explained".
|
| The power of YouTube is that for any hot area, there are a
| bunch of people incentivized to make a short video that
| maximally engages. The quality can be quite high even at higher
| levels of math abstractions. The visuals are really helpful to
| get a feel for the abstractions (3 blue 1 brown proved this
| years ago).
|
| There are some excellent videos on GNNs that in less than 10
| minutes gave me a launching point into the literature.
| esafak wrote:
| (2021)
| memhole wrote:
| Would love to see distill come back
| ziofill wrote:
| It's very sad distill.pub doesn't accept new submissions... :/
| eachro wrote:
| Is there consensus about whether GNN architectures are better
| than transformer-based ones at this point? I am aware that
| transformers can be viewed as a GNN too.
| Legend2440 wrote:
| GNNs let you build some of the structure of your problem into
| the network architecture. This can be useful if your problem
| has a well-understood structure, like physical simulations.
|
| However, in many cases we do not know the structure of our
| problem (that's why we want to use ML in the first place), and
| in these cases GNNs do not beat transformers.
| hi_hi wrote:
| I feel very dumb. There is an example on that page with 4 nodes
| (a,b,c,d) and it shows a total of 24 possible combinations.
|
| What is the generalised formula for calculating this, given that
| both the number of nodes and the edges need to be considered?
|
| It doesn't appear to be explained in the article. I think it may
| be a factorial?
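|
| (My guess: if the figure is counting orderings of the adjacency
| matrix, the 24 is just 4! = 24 ways to permute the node labels,
| i.e. n! for n nodes regardless of the edges, though symmetric
| graphs can produce duplicate matrices. Quick check of that
| reading:)
|
|     from itertools import permutations
|     print(sum(1 for _ in permutations("abcd")))  # 24 == 4!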
___________________________________________________________________
(page generated 2024-12-20 23:00 UTC)