[HN Gopher] Graph Neural Networks use graphs when they shouldn't
       ___________________________________________________________________
        
       Graph Neural Networks use graphs when they shouldn't
        
       Author : Pseudomanifold
       Score  : 102 points
       Date   : 2023-09-19 15:40 UTC (7 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ajb117 wrote:
        | Besides the title being a bit exaggerated, it's been long-
       | acknowledged in the GNN community that graph rewiring often helps
       | learning; see [1, 2] for example. Also, at first skim, I'm
       | surprised there's no discussion on oversmoothing/oversquashing in
       | this paper. It seems like what they're calling "overfitting" on
       | the graph structure could be interpreted as the GNN's node
       | representations saturating/starving because of poor graph
       | topology.
       | 
        | [1] - https://arxiv.org/abs/2111.14522
        | [2] - https://arxiv.org/abs/2210.02997
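        | 
        | For readers unfamiliar with the term: "rewiring" just means
        | editing the edge set that messages flow over, decoupling it
        | from the input graph. A crude sketch of one such edit (mine,
        | not from [1, 2]), adding edges between 2-hop neighbors of a
        | dense adjacency matrix in PyTorch:
        | 
        |     import torch
        | 
        |     def add_two_hop_edges(adj):
        |         # adj: (n, n) dense 0/1 adjacency matrix.
        |         # Connect every pair of nodes that share a neighbor,
        |         # densifying bottlenecked regions of the graph.
        |         two_hop = ((adj @ adj) > 0).float()
        |         rewired = ((adj + two_hop) > 0).float()
        |         rewired.fill_diagonal_(0)  # drop self-loops
        |         return rewired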
        
       | dpflan wrote:
       | Anyone working with GNNs have an experience with this? Has this
       | affected your work?
        
         | cmavvv wrote:
          | Yes, and no. I just completed a PhD on GNNs, and in the last
          | 4 years I have not used a single GNN without residual
          | connections. It is quite well known that residual connections
          | help alleviate a lot of issues such as oversmoothing, and I
          | expect that would be the case here as well.
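          | 
          | A minimal sketch of what I mean, using plain PyTorch and a
          | dense normalized adjacency matrix (illustrative only, not any
          | particular paper's layer):
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     class ResidualGCNLayer(nn.Module):
          |         # h' = h + relu(A_hat @ Linear(h)); the skip term
          |         # lets node features survive repeated neighborhood
          |         # averaging, which is what fights oversmoothing.
          |         def __init__(self, dim):
          |             super().__init__()
          |             self.lin = nn.Linear(dim, dim)
          | 
          |         def forward(self, h, a_hat):
          |             # h: (num_nodes, dim), a_hat: (num_nodes, num_nodes)
          |             return h + torch.relu(a_hat @ self.lin(h))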
        
         | dauertewigkeit wrote:
         | In my use case I care not one bit about the overall graph
         | structure, but only about the inter-node
          | interactions/relationships. But oftentimes you will get graphs
          | where you have 1% type A interactions, 12% type B interactions
          | and 87% type C interactions, and that makes training quite
          | challenging. But I am not sure if you can say that that is
         | over-fitting the graph structure rather than simply learning
         | the most common interactions in the dataset.
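          | 
          | (One standard mitigation for that kind of 1%/12%/87% skew,
          | independent of the graph question, is an inverse-frequency
          | weighted loss on the interaction-type classifier; a rough
          | PyTorch sketch with made-up shapes:)
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     # Hypothetical interaction-type frequencies: A, B, C.
          |     freqs = torch.tensor([0.01, 0.12, 0.87])
          |     weights = 1.0 / freqs           # rarer types weigh more
          |     weights = weights / weights.sum()
          | 
          |     # logits: (num_edges, 3) scores from some GNN edge head
          |     # labels: (num_edges,) ground-truth interaction types
          |     loss_fn = nn.CrossEntropyLoss(weight=weights)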
        
           | lmeyerov wrote:
            | the paper is on my queue, but yeah, these kinds of class
           | imbalance & class dependency problems are why things like
           | RGCNs and negative sampling are so popular. It's almost the
           | typical case in cyber, fraud, supply chain, social, etc.
           | 
           | As others posted here, ideas like metapaths, transformer
            | equivalence, etc. are common building blocks in the GNN
            | world because of these kinds of things. Important areas, so
            | def curious.
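            | 
            | (On the negative sampling point, for anyone unfamiliar: in
            | link prediction the non-edges vastly outnumber the edges,
            | so you train against a random sample of them instead of all
            | of them. A rough sketch, assuming PyTorch:)
            | 
            |     import torch
            | 
            |     def sample_negative_edges(num_nodes, num_samples):
            |         # Draw random (src, dst) pairs as "negative" edges;
            |         # in practice you would also filter out pairs that
            |         # are real edges in the graph.
            |         src = torch.randint(0, num_nodes, (num_samples,))
            |         dst = torch.randint(0, num_nodes, (num_samples,))
            |         return torch.stack([src, dst])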
        
         | stefanpie wrote:
          | I've worked with GNNs. I have not explicitly looked into this,
          | but in general, the dynamics of GNNs are really interesting and
          | there are many different lenses through which one can study the
         | behavior of GNNs and find interesting takeaways.
         | 
         | - Explicit Message Passing: https://arxiv.org/abs/2202.11097
         | 
         | - Algorithmic Reasoning: https://arxiv.org/abs/2203.15544
         | 
         | - Diffusion:
         | https://proceedings.mlr.press/v139/chamberlain21a.html
         | 
         | - Spectral: https://arxiv.org/abs/2010.02863
         | 
         | - Sparsity: https://arxiv.org/abs/2211.15335
         | 
         | - Training Tricks: https://arxiv.org/abs/2108.10521
         | 
         | - Interaction Networks / Physical Models:
         | https://arxiv.org/pdf/1612.00222.pdf
         | 
         | - Expressive Power: https://arxiv.org/abs/2308.08235
         | 
         | - "Over-Squashing": https://arxiv.org/abs/2111.14522
        
         | pama wrote:
          | This is a personal, impressionistic opinion, so please take
          | it as such and don't expect citations or details. It has been
          | empirically clear since the inception of GNNs that the message
          | passing structure and forced equivariance would make it hard
          | to optimize GNNs for many practical applications. In a sense,
          | if these architectures could cheaply solve graph isomorphism,
          | that would be a major complexity-theory surprise, so few
          | people expected magic. But it was still disheartening to see
          | old-school hashed bags of subgraphs outperform fancy GNNs on
          | practical problems, to see that pretraining didn't help GNNs
          | as much, and to see that the internal structures of these
          | models do not quickly build up sensible hierarchies on their
          | own. So people started hacking and exploring, and by now the
          | practical models are better than the early hacks, but the
          | theory of how best to deal with graphs in subdomains of
          | extreme interest (say, drug discovery, biology, or materials
          | design) is not yet settled. By comparison, autoregressive
          | language models are super mature and generally applicable, as
          | are transformers. These early missteps led people to spend a
          | lot of effort reformulating problems to better match
          | established architectures or to complement the GNN parts with
          | hacks.
        
         | oersted wrote:
         | I haven't worked on GNNs specifically, but I've done academic
         | research on other architectures, and I used MirrorThink.ai to
         | perform a comprehensive literature review. Hope it gives some
         | useful context.
         | 
         | ---
         | 
         | The paper "Graph Neural Networks Tend to Overfit the Graph
         | Structure" provides a significant contribution to the field of
         | Graph Neural Networks (GNNs) by highlighting the issue of
         | overfitting to the graph structure. This overfitting phenomenon
         | can lead to reduced performance, especially when the graph
         | structure is non-informative or irrelevant to the task at hand.
         | To address this issue, the authors propose a graph-editing
         | method called R-COV, which aims to reduce the coefficient of
         | variation (COV) of a given graph, making it more similar to
         | regular graphs.
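          | 
          | (Reading "coefficient of variation" as the COV of the node
          | degree distribution, which is zero exactly for regular
          | graphs; a quick PyTorch sketch of that quantity, not of R-COV
          | itself:)
          | 
          |     import torch
          | 
          |     def degree_cov(adj):
          |         # adj: (n, n) dense 0/1 adjacency matrix.
          |         # COV = std / mean of node degrees; lower means
          |         # "closer to a regular graph".
          |         deg = adj.sum(dim=1)
          |         return deg.std() / deg.mean()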
         | 
         | This research aligns with other studies in the field that have
         | also addressed the issue of overfitting in GNNs. For instance,
         | the paper "Overfitting Mechanism and Avoidance in Deep Neural
         | Networks" discusses the problem of overfitting in deep neural
         | networks and proposes a consensus-based classification
         | algorithm to address this issue. This algorithm identifies
         | consistently classified samples and rejects samples that are
         | classified randomly, which can help avoid overfitting even with
         | a small number of training samples.
         | 
         | Another relevant study is "Neighborhood Random Walk Graph
         | Sampling for Regularized Bayesian Graph Convolutional Neural
         | Networks," which proposes a Bayesian Graph Convolutional Neural
         | Network (BGCN) that accounts for uncertainty in the graph
         | structure and reduces overfitting. The BGCN model utilizes a
         | neighborhood random walk-based graph sampling method to sample
         | nodes from the observed graph, improving sampling efficiency
         | and diversifying connections between nodes.
         | 
         | The paper "Training Graph Neural Networks by Graphon
         | Estimation" also proposes a method for training GNNs that
         | mitigates the issues of overfitting and over-smoothing. This
         | method involves resampling the adjacency matrix of the graph
         | from a graphon estimate obtained from the underlying network
         | data. By resampling the adjacency matrix, the method provides
         | data augmentation and introduces randomness to the GNN training
         | process, which helps alleviate overfitting and over-smoothing.
         | 
         | Finally, the paper "Eigen-GNN: A Graph Structure Preserving
         | Plug-in for GNNs" proposes the Eigen-GNN module to enhance
         | GNNs' ability to preserve graph structures by integrating the
         | eigenspace of graph structures with GNNs. This approach
         | significantly improves the preservation of graph structures and
         | consistently outperforms existing GNNs in structure-driven
         | tasks.
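          | 
          | (To make the last idea concrete: "integrating the eigenspace"
          | roughly means feeding a few leading eigenvectors of the graph
          | matrix to the GNN as extra node features. A sketch of that
          | general pattern, assuming PyTorch, not the exact Eigen-GNN
          | module:)
          | 
          |     import torch
          | 
          |     def add_eigen_features(x, adj, k=8):
          |         # x: (n, d) node features, adj: (n, n) symmetric
          |         # adjacency. Append the k eigenvectors with the
          |         # largest eigenvalues as structural features.
          |         _, evecs = torch.linalg.eigh(adj)
          |         return torch.cat([x, evecs[:, -k:]], dim=1)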
        
       | tbalsam wrote:
       | I'm curious how this relates to MDL-tied regularization. I don't
       | see any explicit reference to the L2 norm here, and I'm not too
       | smart about graph neural networks.
       | 
       | But it seems like overfitting is almost necessary unless some
       | kind of obscure variational technique is used? They did have one
       | paper where they used a progressively compressed GNN to isolate
        | an approximation of a cosmological equation that was smaller
        | and more accurate than previously achieved. I'll update here if I
       | find it.
       | 
        | Edit: Ah, here we are: https://arxiv.org/abs/2006.11287 I
        | believe there was some very good follow-up work posted on it
        | this year; I saw some movement on it during the 3-5 months that
        | I was really active on Twitter (brief time that it was). It
        | seems like an extremely promising area, and one I should learn
        | if I truly do care about the Kolmogorov complexity of a problem.
        
       | eachro wrote:
       | Isn't the natural solution to just use attention layers instead
       | of graph convolution layers then? Then there the attention
       | mechanism will end up learning the useful graph structure via the
       | attention weights?
        
         | solomatov wrote:
          | It's more or less the same thing. You could consider a
         | transformer a GNN.
        
           | ftxbro wrote:
           | in fact everything is a graph
        
             | retbull wrote:
             | Well the whole point of this paper is NOT everything is a
             | graph.
        
               | dekhn wrote:
               | in fact everything is a graph
               | 
               | really
        
         | SpaceManNabs wrote:
          | transformers are GNNs where all nodes are connected to all
          | nodes (well, in the decoder you have masks, but you see my
         | point).
         | 
         | If the problem is that the graph is not sparse enough / not a
         | graph at all, adding more connections doesn't help.
         | 
         | edit: Why doesn't it help? They address this in the paper. 1.
          | There is a computational infeasibility problem. 2. The
         | transformer decoder can't be a regular graph if you include
         | masking.
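          | 
          | A toy way to see the equivalence (my sketch, plain PyTorch):
          | single-head self-attention is attention-weighted message
          | passing where the "adjacency" is all-ones, and masking just
          | deletes edges from that complete graph.
          | 
          |     import torch
          |     import torch.nn.functional as F
          | 
          |     def attn_message_passing(h, w_q, w_k, w_v, mask=None):
          |         # h: (n, d) node/token features; w_q/w_k/w_v: (d, d).
          |         q, k, v = h @ w_q, h @ w_k, h @ w_v
          |         scores = (q @ k.T) / h.shape[1] ** 0.5  # (n, n)
          |         if mask is not None:
          |             # mask = all-ones  -> vanilla self-attention
          |             # mask = adjacency -> a graph attention layer
          |             scores = scores.masked_fill(mask == 0, -1e9)
          |         return F.softmax(scores, dim=-1) @ v    # aggregate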
        
           | eachro wrote:
            | If they don't actually help, then the attention weight for
            | that connection would tend to 0, right? Then it becomes a
            | problem of overfitting, which we have a large arsenal to
           | combat.
        
             | SpaceManNabs wrote:
             | > If they don't actually help then the attention weight for
             | that connection would tend to 0, right?
             | 
             | Not necessarily. That is what this paper is about. From
             | what I understand, they also consider graph attention
             | networks.
        
       ___________________________________________________________________
       (page generated 2023-09-19 23:00 UTC)