[HN Gopher] NAFNet: Nonlinear Activation Free Network for Image ...
___________________________________________________________________
NAFNet: Nonlinear Activation Free Network for Image Restoration
Author : pizza
Score : 41 points
Date : 2022-08-04 15:54 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| liuliu wrote:
| To people who are as confused as me: "nonlinear activation
| free" doesn't mean "linear" in the paper (otherwise this would
| be a ground-breaking discovery). They use polynomial functions
| in place of the traditional GELU or ReLU gating, or sigmoid
| (for the attention module). By replacing those with a simple
| "x^2" plus some bells and whistles, they seem to get good
| results.
|
| If I am being pedantic, "x^2" is still nonlinear though ...
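|
| A minimal sketch of the kind of swap being described, assuming
| a PyTorch-style feature map with channels on dim 1 (the
| function names here are mine, not the paper's):
|
|     import torch
|     import torch.nn.functional as F
|
|     def gelu_gate(x):
|         # conventional nonlinearity used in many restoration nets
|         return F.gelu(x)
|
|     def product_gate(x):
|         # the "x^2 with bells and whistles" idea: split the
|         # channels in half and multiply them elementwise, so the
|         # nonlinearity comes from a product rather than GELU/ReLU
|         x1, x2 = x.chunk(2, dim=1)
|         return x1 * x2
|
|     x = torch.randn(1, 8, 4, 4)
|     print(gelu_gate(x).shape)     # torch.Size([1, 8, 4, 4])
|     print(product_gate(x).shape)  # torch.Size([1, 4, 4, 4])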
| a1369209993 wrote:
| I _think_ the idea is that the _derivative_ of x^2 is linear
| (namely 2x), which makes training/backpropagation very fast,
| since evaluating 2x is much cheaper than dsigma(x)/dx,
| dGELU(x)/dx, etc., and it also speeds up inference since x*x
| is faster than sigma(x) et cetera. But if so, they didn't
| explain it well, or possibly at all. And "nonlinear activation
| free" is still nonsense.
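|
| For what it's worth, the gradient claim is easy to check
| numerically (a small sketch, not something from the paper):
|
|     import torch
|
|     x = torch.randn(5, requires_grad=True)
|     (x * x).sum().backward()
|     # d/dx of x^2 is 2x, a linear (and cheap) expression,
|     # unlike the derivatives of sigmoid or GELU
|     print(torch.allclose(x.grad, 2 * x.detach()))  # True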
| igorkraw wrote:
| >If I am being pedantic, "x^2" is still nonlinear though
|
| yes, and this is where this paper should have been stopped,
| with no ill intention towards the authors.
|
| Polynomial networks have gotten good results (my colleague
| https://scholar.google.com/citations?user=1bU041kAAAAJ has done
| extensive work on them), and there have been multiple papers
| studying multiplicative interactions and the effects of feature
| engineering, dozens of small tweaks on activation functions,
| not to mention the NAS papers automating the whole process.
| But higher numbers get a paper in, I guess.
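|
| As a rough illustration of what a multiplicative-interaction
| layer can look like (the names and structure below are my own
| sketch, not taken from any of those papers):
|
|     import torch
|     import torch.nn as nn
|
|     class MultiplicativeLayer(nn.Module):
|         # second-order / polynomial-style layer: a linear term
|         # plus an elementwise product of two learned projections
|         def __init__(self, dim):
|             super().__init__()
|             self.lin = nn.Linear(dim, dim)
|             self.a = nn.Linear(dim, dim)
|             self.b = nn.Linear(dim, dim)
|
|         def forward(self, x):
|             return self.lin(x) + self.a(x) * self.b(x)
|
|     layer = MultiplicativeLayer(16)
|     print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])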
| civilized wrote:
| So, when they say "free of nonlinear activations", what they
| mean is "uses nonlinear activations other than the ones most
| commonly used"?
|
| Reminds me of when Salesforce replaced on-prem software with
| cloud software and declared it "the end of software".
| mkaic wrote:
| This is fascinating, because one of the first things they teach
| you in ML class is why nonlinear activations are necessary --
| because otherwise, your entire network is mathematically
| equivalent to a single linear transformation, since the
| composition of linear transformations is itself linear! I'm
| going to have to read through this paper because I'm really
| curious whether they posit any theories as to _why_ removing
| nonlinear transforms in their model had such a positive effect
| in this particular scenario.
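|
| The textbook point about stacked linear layers collapsing into
| a single one is easy to demonstrate (a toy check, not from the
| paper):
|
|     import torch
|
|     W1 = torch.randn(4, 3)
|     W2 = torch.randn(2, 4)
|     x = torch.randn(3)
|
|     # without a nonlinearity in between, W2 @ (W1 @ x) is the
|     # same map as the single matrix (W2 @ W1) applied to x
|     print(torch.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x, atol=1e-5))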
| idontpost wrote:
| But ... they didn't remove nonlinear transforms. They just
| used a different one and gave it a stupid name.
___________________________________________________________________
(page generated 2022-08-04 23:01 UTC)