[HN Gopher] Graph Neural Networks use graphs when they shouldn't
___________________________________________________________________
Graph Neural Networks use graphs when they shouldn't
Author : Pseudomanifold
Score : 102 points
Date : 2023-09-19 15:40 UTC (7 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ajb117 wrote:
| Besides the title being a bit exaggerated, it's been long-
| acknowledged in the GNN community that graph rewiring often helps
| learning; see [1, 2] for example. Also, at first skim, I'm
| surprised there's no discussion on oversmoothing/oversquashing in
| this paper. It seems like what they're calling "overfitting" on
| the graph structure could be interpreted as the GNN's node
| representations saturating/starving because of poor graph
| topology.
|
| [1] - https://arxiv.org/abs/2111.14522 [2] -
| https://arxiv.org/abs/2210.02997
| dpflan wrote:
| Anyone working with GNNs have an experience with this? Has this
| affected your work?
| cmavvv wrote:
| Yes, and no. I just completed a PhD on GNNs, and in the last 4
| years I have not used a single GNN without a residual connection.
| It is quite well known that residual connections help alleviate a
| lot of issues such as oversmoothing, and I expect that it
| would be the case here as well.
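|
| Roughly, the building block I mean looks like this (a minimal
| PyTorch Geometric sketch; the layer width and activation are
| just placeholders, not anything from the paper):
|
|   import torch
|   import torch.nn.functional as F
|   from torch_geometric.nn import GCNConv
|
|   class ResidualGCNBlock(torch.nn.Module):
|       # One GCN layer wrapped with a skip connection, x + f(x).
|       def __init__(self, dim):
|           super().__init__()
|           self.conv = GCNConv(dim, dim)
|
|       def forward(self, x, edge_index):
|           # The skip path lets node features bypass message
|           # passing, which in practice limits oversmoothing
|           # when you stack many layers.
|           return x + F.relu(self.conv(x, edge_index))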
| dauertewigkeit wrote:
| In my use case I care not one bit about the overall graph
| structure, but only about the inter-node
| interactions/relationships. But oftentimes you will get graphs
| where you have 1% type A interactions, 12% type B interactions,
| and 87% type C interactions, and that makes training quite
| challenging. But I am not sure you can say that that is
| over-fitting the graph structure rather than simply learning
| the most common interactions in the dataset.
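|
| For concreteness, the usual first-aid for that skew is inverse-
| frequency weighting of the interaction-type loss; something like
| this (plain PyTorch sketch, using the toy 1/12/87 split above):
|
|   import torch
|
|   # Interaction-type counts from the example split above:
|   # 1% type A, 12% type B, 87% type C.
|   counts = torch.tensor([1.0, 12.0, 87.0])
|   weights = counts.sum() / (len(counts) * counts)
|
|   loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
|
|   # logits: [num_edges, 3] predicted scores per interaction
|   # type, labels: [num_edges] ground-truth type indices.
|   logits = torch.randn(8, 3)
|   labels = torch.randint(0, 3, (8,))
|   loss = loss_fn(logits, labels)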
| lmeyerov wrote:
| the paper is on my queue, but yeah, these kinds of class
| imbalance & class dependency problems are why things like
| RGCNs and negative sampling are so popular. It's almost the
| typical case in cyber, fraud, supply chain, social, etc.
|
| As others posted here, ideas like metapaths, transformer
| equivalence, etc. are common building blocks in the GNN world
| because of these kinds of things. Important areas, so def
| curious.
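|
| To make the negative-sampling piece concrete: the usual link-
| prediction setup corrupts observed edges with random endpoints
| and trains a decoder to tell the two apart. A bare-bones sketch
| (plain PyTorch, dot-product decoder, z being whatever node
| embeddings your GNN produces):
|
|   import torch
|   import torch.nn.functional as F
|
|   def negative_edges(num_nodes, num_neg):
|       # Draw random (src, dst) pairs as negatives; in practice
|       # you would also filter out accidental positives.
|       src = torch.randint(0, num_nodes, (num_neg,))
|       dst = torch.randint(0, num_nodes, (num_neg,))
|       return torch.stack([src, dst])
|
|   def link_loss(z, pos_edge_index, neg_edge_index):
|       # z: [num_nodes, dim] node embeddings from the GNN.
|       pos = (z[pos_edge_index[0]] * z[pos_edge_index[1]]).sum(-1)
|       neg = (z[neg_edge_index[0]] * z[neg_edge_index[1]]).sum(-1)
|       labels = torch.cat([torch.ones_like(pos),
|                           torch.zeros_like(neg)])
|       scores = torch.cat([pos, neg])
|       return F.binary_cross_entropy_with_logits(scores, labels)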
| stefanpie wrote:
| I've worked with GNNs. I have not explicitly looked into this,
| but in general, the dynamics of GNNs are really interesting and
| there are many different lenses through which one can study the
| behavior of GNNs and find interesting takeaways.
|
| - Explicit Message Passing: https://arxiv.org/abs/2202.11097
|
| - Algorithmic Reasoning: https://arxiv.org/abs/2203.15544
|
| - Diffusion:
| https://proceedings.mlr.press/v139/chamberlain21a.html
|
| - Spectral: https://arxiv.org/abs/2010.02863
|
| - Sparsity: https://arxiv.org/abs/2211.15335
|
| - Training Tricks: https://arxiv.org/abs/2108.10521
|
| - Interaction Networks / Physical Models:
| https://arxiv.org/pdf/1612.00222.pdf
|
| - Expressive Power: https://arxiv.org/abs/2308.08235
|
| - "Over-Squashing": https://arxiv.org/abs/2111.14522
| pama wrote:
| This is a personal impressionistic opinion, so please take it
| as such and don't expect citations or details. It has been
| empirically clear since the inception of GNNs that the message
| passing structure and forced equivariance would make it hard to
| optimize GNNs for many practical applications. Standard message
| passing is no more expressive than the Weisfeiler-Leman
| isomorphism test, and graph isomorphism itself has no known
| polynomial-time algorithm, so few people expected magic, but it
| was still disheartening to see the old school hashed bags of
| subgraphs outperform fancy GNNs on practical problems, to see
| that pretraining didn't help the GNNs as much, and to see that
| the internal structures of these models were not quickly
| building up sensible hierarchies on their own. So people started
| hacking and exploring, and by now the practical models are
| better than the early hacks, but the theory of how best to deal
| with graphs in subdomains of extreme interest (say drug
| discovery, biology, or materials design) is still not settled.
| By comparison, autoregressive language models are super mature
| and generally applicable, as are transformers. These early
| missteps led people to spend a lot of effort reformulating
| problems to better match established architectures or to
| complement the GNN parts with hacks.
| oersted wrote:
| I haven't worked on GNNs specifically, but I've done academic
| research on other architectures, and I used MirrorThink.ai to
| perform a comprehensive literature review. Hope it gives some
| useful context.
|
| ---
|
| The paper "Graph Neural Networks Tend to Overfit the Graph
| Structure" provides a significant contribution to the field of
| Graph Neural Networks (GNNs) by highlighting the issue of
| overfitting to the graph structure. This overfitting phenomenon
| can lead to reduced performance, especially when the graph
| structure is non-informative or irrelevant to the task at hand.
| To address this issue, the authors propose a graph-editing
| method called R-COV, which aims to reduce the coefficient of
| variation (COV) of a given graph, making it more similar to
| regular graphs.
|
| This research aligns with other studies in the field that have
| also addressed the issue of overfitting in GNNs. For instance,
| the paper "Overfitting Mechanism and Avoidance in Deep Neural
| Networks" discusses the problem of overfitting in deep neural
| networks and proposes a consensus-based classification
| algorithm to address this issue. This algorithm identifies
| consistently classified samples and rejects samples that are
| classified randomly, which can help avoid overfitting even with
| a small number of training samples.
|
| Another relevant study is "Neighborhood Random Walk Graph
| Sampling for Regularized Bayesian Graph Convolutional Neural
| Networks," which proposes a Bayesian Graph Convolutional Neural
| Network (BGCN) that accounts for uncertainty in the graph
| structure and reduces overfitting. The BGCN model utilizes a
| neighborhood random walk-based graph sampling method to sample
| nodes from the observed graph, improving sampling efficiency
| and diversifying connections between nodes.
|
| The paper "Training Graph Neural Networks by Graphon
| Estimation" also proposes a method for training GNNs that
| mitigates the issues of overfitting and over-smoothing. This
| method involves resampling the adjacency matrix of the graph
| from a graphon estimate obtained from the underlying network
| data. By resampling the adjacency matrix, the method provides
| data augmentation and introduces randomness to the GNN training
| process, which helps alleviate overfitting and over-smoothing.
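|
| A rough sketch of that resampling idea (not the paper's actual
| estimator; here the graphon is crudely approximated by a block
| model over some precomputed node groups):
|
|   import numpy as np
|
|   def resample_adjacency(A, groups):
|       # A: [n, n] observed 0/1 adjacency matrix.
|       # groups: [n] integer block label per node.
|       n, k = len(groups), groups.max() + 1
|       P = np.zeros((k, k))
|       for a in range(k):
|           for b in range(k):
|               mask = np.outer(groups == a, groups == b)
|               P[a, b] = A[mask].mean()
|       # Per-pair edge probabilities from the block estimate,
|       # then redraw every edge as an independent Bernoulli.
|       probs = P[groups][:, groups]
|       A_new = (np.random.rand(n, n) < probs).astype(int)
|       upper = np.triu(A_new, 1)
|       return upper + upper.T  # keep the graph undirected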
|
| Finally, the paper "Eigen-GNN: A Graph Structure Preserving
| Plug-in for GNNs" proposes the Eigen-GNN module to enhance
| GNNs' ability to preserve graph structures by integrating the
| eigenspace of graph structures with GNNs. This approach
| significantly improves the preservation of graph structures and
| consistently outperforms existing GNNs in structure-driven
| tasks.
| tbalsam wrote:
| I'm curious how this relates to MDL-tied regularization. I don't
| see any explicit reference to the L2 norm here, and I'm not too
| smart about graph neural networks.
|
| But it seems like overfitting is almost necessary unless some
| kind of obscure variational technique is used? They did have one
| paper where they used a progressively compressed GNN to isolate
| an approximation of a cosmological equation that was smaller and
| more accurate than had previously been achieved. I'll update
| here if I find it.
|
| Edit: Ah, here we are: https://arxiv.org/abs/2006.11287 I
| believe there was some very good followup work on it this year;
| I saw some movement during the 3-5 months that I was really
| active on Twitter (brief time that it was). It seems like an
| extremely promising area, and one I should learn if I truly do
| care about the Kolmogorov complexity of a problem.
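|
| If it helps make the regularization connection concrete: the
| trick in that paper is, roughly, to penalize the norm of the
| per-edge messages so they become sparse enough to fit with a
| small symbolic expression afterwards. A bare-bones version of
| that kind of penalty (hypothetical tensor names, not the
| authors' actual code):
|
|   import torch
|   import torch.nn.functional as F
|
|   def regularized_loss(pred, target, messages, l1_coeff=1e-2):
|       # messages: [num_edges, dim] outputs of the message MLP.
|       # The L1 term pushes most message components toward zero,
|       # which is what makes them compressible / easy to fit
|       # with a short symbolic formula later.
|       task_loss = F.mse_loss(pred, target)
|       return task_loss + l1_coeff * messages.abs().mean()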
| eachro wrote:
| Isn't the natural solution to just use attention layers instead
| of graph convolution layers, then? Then the attention mechanism
| would end up learning the useful graph structure via the
| attention weights?
| solomatov wrote:
| It's more or less the same thing. You could consider a
| transformer a GNN.
| ftxbro wrote:
| in fact everything is a graph
| retbull wrote:
| Well, the whole point of this paper is that NOT everything is
| a graph.
| dekhn wrote:
| in fact everything is a graph
|
| really
| SpaceManNabs wrote:
| transformers are GNNs where all nodes are connected to all
| nodes (well, in the decoder you have masks, but you see my
| point).
|
| If the problem is that the graph is not sparse enough / not a
| graph at all, adding more connections doesn't help.
|
| edit: Why doesn't it help? They address this in the paper. 1.
| There is a computational infeasibility problem. 2. The
| transformer decoder can't be a regular graph if you include
| masking.
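|
| The correspondence is basically just a mask: attention
| restricted to the graph's edges is a (toy, single-head) graph
| attention layer, and dropping the mask gives the fully-
| connected transformer case. A sketch with no learned
| projections, assuming every node has at least one allowed
| neighbor (add self-loops otherwise):
|
|   import torch
|   import torch.nn.functional as F
|
|   def attention_layer(x, adj=None):
|       # x: [n, d] node features; adj: [n, n] 0/1 adjacency,
|       # or None to attend over the complete graph.
|       d = x.size(-1)
|       scores = x @ x.t() / d ** 0.5
|       if adj is not None:
|           scores = scores.masked_fill(adj == 0, float('-inf'))
|       return F.softmax(scores, dim=-1) @ x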
| eachro wrote:
| If they don't actually help then the attention weight for
| that connection would tend to 0, right? Then it becomes a
| problem of overfitting, which we have a large arsenal to
| combat.
| SpaceManNabs wrote:
| > If they don't actually help then the attention weight for
| that connection would tend to 0, right?
|
| Not necessarily. That is what this paper is about. From
| what I understand, they also consider graph attention
| networks.
___________________________________________________________________
(page generated 2023-09-19 23:00 UTC)