[HN Gopher] Neural Network Loss Landscapes: What do we know? (2021)
___________________________________________________________________
Neural Network Loss Landscapes: What do we know? (2021)
Author : bitforger
Score : 22 points
Date : 2022-07-17 20:55 UTC (2 hours ago)
(HTM) web link (damueller.com)
(TXT) w3m dump (damueller.com)
| charleshmartin wrote:
| https://calculatedcontent.com/2015/03/25/why-does-deep-learn...
| evolvingstuff wrote:
| Here are some "animated" loss landscapes I made quite a long time
| ago:
|
| http://evolvingstuff.blogspot.com/2011/02/animated-fractal-f...
|
| These are from recurrent neural networks evolved to maximize
| fitness while wandering through a randomly generated maze and
| picking up food pellets (the advantage being that the network can
| remember not to revisit places it has already been).
| MauranKilom wrote:
| The "wedge" part under "3. Mode Connectivity" has at least one
| obvious component: Neural networks tend to be invariant to
| permuting nodes (together with their connections) within a layer.
| Simply put, it doesn't matter in what order you number the K
| nodes of e.g. a fully connected layer, but that alone already
| means there are K! different solutions with exactly the same
| behavior. Equivalently, the loss landscape is symmetric to
| certain permutations of its dimensions.
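|
| As a minimal sketch of that symmetry (assuming a tiny two-layer
| ReLU MLP in NumPy; all names here are purely illustrative):
| relabelling the hidden units means permuting the columns of W1 and
| the entries of b1, and permuting the rows of W2 the same way, and
| the outputs do not change.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     # Tiny two-layer MLP: y = relu(x @ W1 + b1) @ W2 + b2
|     W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
|     W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)
|
|     def mlp(x, W1, b1, W2, b2):
|         return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
|
|     # Relabel the 8 hidden units: permute the columns of W1 and
|     # entries of b1, and permute the rows of W2 the same way.
|     perm = rng.permutation(8)
|     W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]
|
|     x = rng.normal(size=(5, 4))
|     print(np.allclose(mlp(x, W1, b1, W2, b2),
|                       mlp(x, W1p, b1p, W2p, b2)))  # True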
|
| This means that, at the very least, there are _many_ global
| optima (well, unless all permutable weights end up with the same
| value, which is obviously not the case). The fact that different
| initializations/early training steps can end up in different but
| equivalent optima follows directly from this symmetry. But
| whether all their basins are connected, or whether there are just
| multiple equivalent basins, is much less clear. The "non-linear"
| connection stuff does seem to imply that they are all in some
| (high-dimensional, non-linear) valley.
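|
| One way to probe that empirically (just a sketch, not something
| from the article: loss_fn, theta_a and theta_b are assumed to be a
| loss function over a flattened parameter vector and two
| independently trained, flattened weight vectors) is to evaluate
| the loss along the straight line between the two optima and look
| for a barrier:
|
|     import numpy as np
|
|     def loss_along_line(loss_fn, theta_a, theta_b, n_points=21):
|         # Loss at evenly spaced points on the segment
|         # (1 - a) * theta_a + a * theta_b, for a in [0, 1].
|         alphas = np.linspace(0.0, 1.0, n_points)
|         return [loss_fn((1.0 - a) * theta_a + a * theta_b)
|                 for a in alphas]
|
| A pronounced bump along that line indicates a barrier between the
| two basins; the mode-connectivity results are about finding
| low-loss curved (non-linear) paths between optima even when the
| straight line does show such a barrier.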
|
| To be clear, this is just me looking at these results from the
| "permutation" perspective above, because it leads to a few
| obvious conclusions. But I am not qualified to judge which of
| these results are more or less profound.
| evolvingstuff wrote:
| Completely agree! Plus, less trivially, there can be a bunch of
| different link-weight settings (for an assumed distribution of
| inputs) that result in nearly equivalent behaviors, and that is
| then multiplied by the permutation results you have just
| mentioned! So, it's complicated...
___________________________________________________________________
(page generated 2022-07-17 23:00 UTC)