[HN Gopher] Do wide and deep networks learn the same things?
___________________________________________________________________
Do wide and deep networks learn the same things?
Author : MindGods
Score : 82 points
Date : 2021-06-02 16:27 UTC (6 hours ago)
(HTM) web link (ai.googleblog.com)
(TXT) w3m dump (ai.googleblog.com)
| joe_the_user wrote:
| So, it seems like the "blocks" they're talking about are
| basically redundancies, duplicated logic. It makes sense to me
| that since they provide the same functionality, how or why these
| duplicates exist doesn't matter. But I'm an amateur.
| godelski wrote:
| For more context: the universal approximation theorem for neural
| nets basically says that a network with at least one hidden layer
| can approximate any continuous function if it is made wide enough.
| So a lot of early architectures were really wide. Then VGG[0] came
| out and showed that deep networks were very effective (along with
| other papers; these things tend to happen in parallel, like Leibniz
| and Newton). Then you get ResNets[1] with skip connections, and
| from there you move forward to today. Now we've started looking
| more at what networks are doing and where
| their biases lie. This is because we're running into some
| roadblocks with CNNs vs Transformers. They have different
| inductive biases. Vision transformers still aren't defeating
| CNNs, but they are close and it is clear they learn different
| things. So we're seeing more papers doing these types of
| analyses. ML will likely never be fully interpretable, but we're
| getting better at understanding it. This is good, because a lot
| of the time picking your model and network architecture is more
| art than science (especially when choosing hyperparameters).
|
| [0] https://arxiv.org/abs/1409.1556
|
| [1] https://arxiv.org/abs/1512.03385
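|
| As a rough sketch of the distinction (layer sizes here are
| arbitrary and purely for illustration, not the architectures from
| the post), "wide" vs "deep with skip connections" looks something
| like this in PyTorch:
|
|   import torch
|   import torch.nn as nn
|
|   # "Wide": a 2-layer MLP with many hidden units, the regime the
|   # universal approximation theorem talks about.
|   wide_net = nn.Sequential(
|       nn.Linear(32, 4096), nn.ReLU(),
|       nn.Linear(4096, 10),
|   )
|
|   # "Deep": many narrow layers, each wrapped in a skip (residual)
|   # connection, which is roughly the ResNet idea.
|   class ResidualBlock(nn.Module):
|       def __init__(self, dim):
|           super().__init__()
|           self.fc1 = nn.Linear(dim, dim)
|           self.fc2 = nn.Linear(dim, dim)
|
|       def forward(self, x):
|           # Skip connection: add the input back onto the output.
|           return x + self.fc2(torch.relu(self.fc1(x)))
|
|   deep_net = nn.Sequential(
|       nn.Linear(32, 64),
|       *[ResidualBlock(64) for _ in range(8)],
|       nn.Linear(64, 10),
|   )
|
|   x = torch.randn(4, 32)
|   print(wide_net(x).shape, deep_net(x).shape)  # both (4, 10)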
| dimatura wrote:
| I would say AlexNet [1], rather than VGG, was the landmark
| paper that got the computer vision community to pay attention
| to deep learning, specifically by winning the 2012 imagenet
| competition by a large margin. Not that there weren't successes
| before (specifically, deep nets had also been getting traction
| in speech processing), and of course deep learning itself is
| much older than alexnet. But I think most people, especially in
| vision, would say the 2012 imagenet competition was the watershed
| moment for DL. By current standards alexnet is not very deep, but
| at the time it was definitely "deeper" than the popular models
| (which were mostly not neural networks).
|
| VGG is also super influential of course -- it reinforced the
| trend towards ever deeper networks, which ResNet also took to
| another level.
|
| [1] https://papers.nips.cc/paper/4824-imagenet-classification-
| wi...
| joe_the_user wrote:
| Deep networks have also been shown to be universal, just FYI.
| sova wrote:
| At first I thought this had something to do with the classic
| "breadth vs. depth" notion on learning stuff -- if you're
| preparing for the MCAT it is better to have breadth that covers
| all the topics than depth in one or two particulars for the exam,
| but this is actually just about the dimensions of the neural
| network used to create representations. Naturally, one would
| expect a "sweet spot" or series of "sweet spots."
|
| From the paper at https://arxiv.org/pdf/2010.15327.pdf
|
| > As the model gets wider or deeper, we see the emergence of a
| distinctive block structure -- a considerable range of hidden
| layers that have very high representation similarity (seen as a
| yellow square on the heatmap). This block structure mostly
| appears in the later layers (the last two stages) of the network.
|
| I wonder if we could do similar analysis on the human brain and
| find "high representational similarity" for people who do the
| same task over and over again, such as play chess.
|
| Also, I don't really know what sort of data they are analyzing or
| looking at with these NNs; maybe someone with better scansion can
| let me know?
| andersource wrote:
| Haven't read thoroughly but it seems they are investigating
| ResNet models [0] trained for image classification.
|
| > We apply CKA to a family of ResNets of varying depths and
| widths, trained on common benchmark datasets (CIFAR-10,
| CIFAR-100 and ImageNet)
|
| [0] https://arxiv.org/abs/1512.03385
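|
| For a sense of what CKA actually computes, here is a minimal numpy
| sketch of linear CKA (the paper uses a minibatch variant, if I
| recall correctly, but the idea is the same): activations of two
| layers on the same batch of examples go in, and a similarity score
| between 0 and 1 comes out.
|
|   import numpy as np
|
|   def linear_cka(X, Y):
|       """Linear CKA between activation matrices X (n x p1) and
|       Y (n x p2); rows are examples, columns are units."""
|       # Center each unit's activations across examples.
|       X = X - X.mean(axis=0, keepdims=True)
|       Y = Y - Y.mean(axis=0, keepdims=True)
|       # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
|       num = np.linalg.norm(X.T @ Y, ord='fro') ** 2
|       den = (np.linalg.norm(X.T @ X, ord='fro')
|              * np.linalg.norm(Y.T @ Y, ord='fro'))
|       return num / den
|
|   # Compare (fake) activations of two layers on the same inputs.
|   rng = np.random.default_rng(0)
|   acts_a = rng.normal(size=(512, 64))   # 512 examples, 64 units
|   acts_b = rng.normal(size=(512, 128))  # 512 examples, 128 units
|   print(linear_cka(acts_a, acts_b))     # smallish: unrelated acts
|
| The heatmaps in the post are this kind of score evaluated for
| every pair of layers, within one network or across two networks.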
| verdverm wrote:
| iirc, the human neocortex is only 6 layers deep, with some
| interesting vertical connection structures, perhaps similar to
| skip connections in NNs.
|
| It would be interesting to see where the deep vs wide analysis
| ends up when many problem types are used. Can a single network
| be trained on multiple problems at once and perform well on
| all?
| ubercore wrote:
| That sounds fascinating. When talking about something as
| complex and interconnected as a human neocortex, what does
| "only 6 layers deep" mean?
| mattkrause wrote:
| If you look at a slice of cortex under the microscope,
| there appear to be six physical layers (like a cake), owing
| to the different types, numbers, and arrangement of neurons
| in each.
|
| Canonically, the cortex is built out of columns, each of
| which repeats the same motif. Within a cortical column,
| signals enter a cortical region through layer IV, 'ascend'
| to other cortical areas via Layers II and III, and project
| elsewhere in the brain via Layer V/VI. Layer I mostly
| contains passing fibers going elsewhere. There are also
| "horizontal" or lateral connections between and within
| columns.
|
| This is sort of an abstraction, though. It's often hard to
| clearly delineate the boundary between Layers II and III.
| Layer IV of primary visual cortex has several sublayers (e.g.
| 4C alpha), but it's very small in other areas.
| Buttons840 wrote:
| I'm not sure what skip connections are, but I think I have a
| good guess as to what they are.
|
| I've wanted to try a neural network where the output of every
| layer goes into every subsequent layer. Each layer would thus
| provide a different perspective to subsequent layers.
|
| Anyone know if this has been tried?
| verdverm wrote:
| Have a look at AlphaStar; it's a pretty interesting network
| of networks that has some skip functionality.
|
| The DeepMind lecture series on YouTube is pretty great.
|
| You'd likely overdo it with skips everywhere: too many
| connections to learn, and backprop over all of them would
| likely be difficult.
| nomad225 wrote:
| Densely Connected Convolutional Networks
| (https://arxiv.org/abs/1608.06993) use the idea you're
| talking about quite effectively.
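|
| As a toy version of that wiring (DenseNet itself operates on conv
| feature maps; the fully connected layers and sizes here are made
| up just to show the connectivity):
|
|   import torch
|   import torch.nn as nn
|
|   class DenseMLPBlock(nn.Module):
|       """Every layer sees the concatenated outputs of all
|       previous layers (plus the input), DenseNet-style."""
|
|       def __init__(self, in_features, growth=32, num_layers=4):
|           super().__init__()
|           self.layers = nn.ModuleList()
|           features = in_features
|           for _ in range(num_layers):
|               self.layers.append(nn.Linear(features, growth))
|               features += growth  # later layers see this output
|
|       def forward(self, x):
|           feats = [x]
|           for layer in self.layers:
|               out = torch.relu(layer(torch.cat(feats, dim=1)))
|               feats.append(out)
|           return torch.cat(feats, dim=1)
|
|   block = DenseMLPBlock(in_features=16)
|   print(block(torch.randn(8, 16)).shape)  # (8, 144) = 16 + 4*32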
| Buttons840 wrote:
| Yeah. That seems like exactly what I was describing.
| Thanks.
| dimatura wrote:
| There are a lot of differences between mainstream CNNs and
| biological NNs, though. For one, inference in most CNNs is
| just feed-forward, whereas in the brain the information flow
| is a lot more complex, modulated by all sorts of feedback
| loops as well as a dynamic state given by memory,
| expectation, attention, etc. Biological neurons are also a
| lot more complicated (and diverse) than the artificial ones.
| So those six layers aren't really very comparable to six
| layers in a typical modern CNN.
|
| Of course, I'm talking about just the typical current day
| CNN. There's a lot of ongoing work in recurrent neural nets,
| memory for neural nets, attention (though the idea of
| "attention" that is hot right now is quite simplified
| compared to what we usually call attention), etc.
| mattkrause wrote:
| Each cortical _area_ has six layers, but most behaviors
| require interactions between many cortical areas, so "input"
| passes through many more than six layers before it produces
| an output.
|
| Felleman and Van Essen is a classic paper on the organization
| of the visual system. Figure 2 (p. 4) might give you a good
| sense of how much of the brain it occupies, and Figure 4 (p.
| 30) is the well-known "wiring diagram".
|
| In the 30 years since that paper was written, we've found a
| few more boxes and a lot more wires! We've also come to
| appreciate that there are lots of recurrent loops. V1 -> V2
| is one of the biggest connections in the brain, and V2 -> V1
| is a close runner-up.
|
| https://cogsci.ucsd.edu/~sereno/201/readings/04.03-MacaqueAr.
| ..
| verdverm wrote:
| Indeed, there are many "horizontal" and recurrent
| connections, and simply thinking of it as a 6-layer
| feed-forward network is a gross oversimplification. It's
| more like a complex network of complex networks... of the
| spiking variety.
| mattkrause wrote:
| Yup, and that's just the "classical" synaptic
| transmission.
|
| Mixed in with that, there is also slower signaling via
| neuromodulators (dopamine, norepinephrine, etc.), the
| neuroendocrine system, and God only knows what the
| astrocytes are doing. Every neuron has its own internal
| dynamics too, over scales ranging from milliseconds
| (channel inactivation) to hours or days (receptor
| internalization).
|
| There's even the possibility of "ephaptic coupling",
| wherein the electric fields produced by some neurons
| affect the activity of others, without making any sort of
| direct contact. We've collected some of the stronger data
| in favor of that possibility and yet I remain firmly in
| denial because it would make the brain so absurdly
| complicated.
| rajansaini wrote:
| Those are very interesting empirical results. This lecture
| explains the deeper vs shallow tradeoff theoretically:
| https://www.youtube.com/watch?v=qpuLxXrHQB4. He's an amazing
| lecturer; wish I didn't need subtitles!
|
| (If you're too lazy to watch: it turns out there exist functions
| that a deep network can represent compactly, but that a shallow
| network can only approximate with exponentially many units)
___________________________________________________________________
(page generated 2021-06-02 23:01 UTC)