[HN Gopher] NAFNet: Nonlinear Activation Free Network for Image ...
       ___________________________________________________________________
        
       NAFNet: Nonlinear Activation Free Network for Image Restoration
        
       Author : pizza
       Score  : 41 points
       Date   : 2022-08-04 15:54 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | liuliu wrote:
        | To people who are as confused as me: "nonlinear activation
        | free" doesn't mean "linear" in the paper (otherwise this would
        | be a ground-breaking discovery). They use polynomial functions
        | in place of the traditional GELU or ReLU gating, or sigmoid
        | (for the attention module). By replacing these with essentially
        | "x^2" plus some bells and whistles, they seem to get good
        | results.
        | 
        | If I am being pedantic, "x^2" is still nonlinear though ...
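        | 
        | A minimal PyTorch-style sketch of that kind of multiplicative
        | gate (illustrative only; module and variable names are mine,
        | not the authors'):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     class SimpleMultiplicativeGate(nn.Module):
        |         # Split channels in half and multiply: an x^2-style
        |         # interaction instead of a GELU/ReLU/sigmoid gate.
        |         def forward(self, x):
        |             x1, x2 = x.chunk(2, dim=1)
        |             return x1 * x2  # element-wise product
        | 
        |     # usage: the gate halves the channel count
        |     x = torch.randn(1, 64, 32, 32)
        |     y = SimpleMultiplicativeGate()(x)  # shape (1, 32, 32, 32)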
        
         | a1369209993 wrote:
          | I _think_ the idea is that the _derivative_ of x^2 is linear
          | (namely 2x), which makes training/backpropagation very fast,
          | since evaluating 2x is much cheaper than dsigma(x)/dx,
          | dGELU(x)/dx, etc., and it also speeds up the forward pass,
          | since x*x is cheaper than sigma(x) et cetera. But if so, they
          | didn't explain it well, or possibly at all. And "nonlinear
          | activation free" is still nonsense.
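          | 
          | A rough sketch of that cost argument (assuming PyTorch
          | autograd; an illustration, not a benchmark from the paper):
          | 
          |     import torch
          | 
          |     x = torch.linspace(-3, 3, 5, requires_grad=True)
          | 
          |     # x^2: the gradient is the linear map 2x
          |     (x * x).sum().backward()
          |     print(x.grad)  # equals 2*x, one multiply per element
          | 
          |     x.grad = None
          | 
          |     # GELU: the gradient is Phi(x) + x*phi(x), which needs
          |     # erf/exp evaluations per element
          |     torch.nn.functional.gelu(x).sum().backward()
          |     print(x.grad)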
        
         | igorkraw wrote:
          | >If I am being pedantic, "x^2" is still nonlinear though
          | 
          | Yes, and this is where this paper should have stopped, with
          | no ill intention towards the authors.
          | 
          | Polynomial networks have gotten good results (my colleague
          | https://scholar.google.com/citations?user=1bU041kAAAAJ has done
          | extensive work on them), and there have been multiple papers
          | studying multiplicative interactions and the effects of feature
          | engineering, plus dozens of small tweaks on activation
          | functions, not to mention the NAS papers automating the whole
          | process. But higher numbers get a paper in, I guess.
        
         | civilized wrote:
         | So, when they say "free of nonlinear activations", what they
         | mean is "uses nonlinear activations other than the ones most
         | commonly used"?
         | 
         | Reminds me of when Salesforce replaced on-prem software with
         | cloud software and declared it "the end of software".
        
         | mkaic wrote:
          | This is fascinating because one of the first things they teach
          | you in ML class is why nonlinear activations are necessary --
          | because otherwise, your entire network is mathematically
          | equivalent to a single linear transformation, since a
          | composition of linear transformations is itself linear! I'm
          | going to have to read through this paper because I'm really
          | curious whether they posit any theories as to _why_ removing
          | nonlinear transforms in their model had such a positive effect
          | in this particular scenario.
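          | 
          | A quick sketch of that collapse argument (plain matrices, no
          | activation in between; illustrative only):
          | 
          |     import torch
          | 
          |     W1 = torch.randn(4, 8)
          |     W2 = torch.randn(3, 4)
          |     x  = torch.randn(8)
          | 
          |     # two stacked linear layers with no activation ...
          |     two_layers = W2 @ (W1 @ x)
          | 
          |     # ... equal one layer with the merged weight W2 @ W1
          |     one_layer = (W2 @ W1) @ x
          | 
          |     print(torch.allclose(two_layers, one_layer))  # True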
        
           | idontpost wrote:
           | But ... they didn't remove nonlinear transforms. They just
           | used a different one and gave it a stupid name.
        
       ___________________________________________________________________
       (page generated 2022-08-04 23:01 UTC)