[HN Gopher] Don't Mess with Backprop: Doubts about Biologically ...
___________________________________________________________________
Don't Mess with Backprop: Doubts about Biologically Plausible Deep
Learning
Author : ericjang
Score : 53 points
Date : 2021-02-13 22:01 UTC (2 days ago)
(HTM) web link (blog.evjang.com)
(TXT) w3m dump (blog.evjang.com)
| monocasa wrote:
| I've never understood why biological neural nets would need back
| prop.
|
| Evolutionary pressure is its own applied loss function. It's
| less efficient than backprop, but gets you to solutions all the
| same.
| visarga wrote:
| Evolution works on different time scales from day to day life.
| It's an outer loop of evolution with an inner loop of
| optimization (learning).
| monocasa wrote:
| But that day-to-day information doesn't need to be stored as
| weights in the network - in cyclic networks like you see in
| biology, it can be stored in the activity oscillating around
| the loops, with the individual weights not really changing.
| Sort of like how your CPU doesn't change the linear region of
| its transistors to perform new tasks.
| pizza wrote:
| If you are interested in deep learning with spiking neural
| networks there is also the norse framework:
| https://github.com/electronicvisions/norse
| orbifold wrote:
| That repo is slightly outdated, development now continues at
| https://github.com/norse/norse.
| fishmaster wrote:
| There's also Nengo (https://www.nengo.ai/).
| xingyzt wrote:
| I'm not very familiar with deep learning. How does this compare
| to the biomimicking Spike Time Dependent Plasticity of spiking
| neural networks?
|
| https://github.com/Shikhargupta/Spiking-Neural-Network#train...
| ericjang wrote:
| Deep Learning has nothing to do with biophysical neuron
| simulation, even though there is a confusing overloading of the
| term "neural network". A good introduction to deep learning is
| this chapter: https://mlstory.org/deep.html.
|
| STDP falls under biophysical models of neuron simulation, where
| we try to faithfully reproduce the biophysics of real brains
| (trivia: I started my undergrad in computational neuroscience
| and implemented STDP several times [1, 2, 3]). STDP is a
| learning mechanism, but it has not demonstrated the ability to
| learn models as powerful as DNNs.
|
| [1] https://github.com/ericjang/pyN
|
| [2] https://github.com/ericjang/julia-NeuralNets
|
| [3] https://github.com/ericjang/NeuralNets
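For reference, the pair-based STDP rule that implementations like those above use can be sketched in a few lines; the amplitudes and time constants below are typical illustrative values, not taken from any of the linked repos:

```python
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012,
            tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP: weight change as a function of the spike
    time difference dt = t_post - t_pre (in milliseconds).

    Pre-before-post (dt > 0) potentiates the synapse (LTP);
    post-before-pre (dt < 0) depresses it (LTD)."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau_plus)    # LTP branch
    return -a_minus * np.exp(dt / tau_minus)      # LTD branch

# Causal pairing strengthens, anti-causal weakens, and the effect
# decays exponentially with the spike-time gap.
print(stdp_dw(t_pre=10.0, t_post=15.0))
print(stdp_dw(t_pre=15.0, t_post=10.0))
```

Note the update depends only on locally available spike times, which is what makes it biologically plausible, and also what makes it hard to relate to a global loss gradient.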
| Digitalis33 wrote:
| DeepMind, Hinton, et al are still convinced that the brain must
| be doing something like backprop.
|
| See Lillicrap address all common objections to backprop in the
| brain:
| https://www.youtube.com/watch?v=vbvl0k-aUiE&ab_channel=ELSCV...
|
| Also from their paper Backpropagation in the brain:
|
| "It is not clear in detail what role feedback connections play in
| cortical computations, so we cannot say that the cortex employs
| backprop-like learning. However, if feedback connections modulate
| spiking, and spiking determines the adaptation of synapse
| strengths, the information carried by the feedback connections
| must clearly influence learning!"
| scribu wrote:
| > You can indeed use backprop to train a separate learning rule
| superior to naive backprop.
|
| I used to dismiss the idea of an impending singularity. Now I'm
| not so sure.
|
| Hopefully AGIs will reach hard physical limits to self-
| improvement, before taking over the world.
| jjk166 wrote:
| Given the obvious benefits of increased intelligence, the fact
| that hominid brain size (and presumably computational power) has
| plateaued for the past 300,000 or so years, and that no other
| species has developed superior intelligence (i.e. there are
| bigger but not more effective brains out there), does seem to
| indicate that we are at or close to a local maximum for
| biological intelligence. Presumably there's some threshold
| beyond which the gains from increased computing power lead to
| limited improvements in intelligence. Of course that's not to
| say it's a global maximum.
| semi-extrinsic wrote:
| That, or our brains have hit some biological equivalent of
| Moore's law ending for silicon computing. Maybe getting a
| doubling in brain size would require a quadrupling in brain
| energy consumption (and power dissipation) at our level.
| jjk166 wrote:
| Yeah that's what I mean by not scaling above a threshold.
| Our brains could be bigger (neanderthals had larger ones
| than us), but those bigger brains for whatever reason
| weren't "worth it."
| timlarshanson wrote:
| But, if your realistically-spiking, stateful, noisy biological
| neural network is non-differentiable (which, so far as I know, is
| true), then how are you going to propagate gradients back through
| it to update your ANN-approximated learning rule?
|
| I suspect that, given the small size of synapses, the algorithmic
| complexity of learning rules (and there are several) is small.
| Hence, you can productively use evolutionary or genetic
| algorithms to perform this search/optimization - which I think
| you'd have to anyway, due to the lack of gradients, or simply due
| to computational cost. Plenty of research going on in this field.
| (Heck, while you're at it, might as well perform a similar search
| over wiring topologies & recapitulate our own evolution without
| having to deal with signaling cascades, transport of mRNA &
| protein along dendrites, metabolic limits, etc.)
|
| Anyway, coming from a biological perspective: evolution is still
| more general than backprop, even if in some domains it's slower.
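A toy version of that search is easy to write down: parameterize a local learning rule by a few coefficients and evolve them, with no gradients flowing through the network itself. Everything below (the rule family, the teacher-student task, the ES hyperparameters) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_rule(coeffs, steps=200):
    """Train a linear 'student' against a fixed teacher using only
    the local rule dw = a*x*err + b*x + c*err, and return the mean
    loss over the final 50 steps."""
    a, b, c = coeffs
    w_teacher = np.array([1.5, -2.0, 0.5])
    w = np.zeros(3)
    data = np.random.default_rng(1)   # same data for every candidate
    losses = []
    for _ in range(steps):
        x = data.normal(size=3)
        err = w_teacher @ x - w @ x
        w += a * x * err + b * x + c * err   # purely local update
        losses.append(err ** 2)
    return float(np.mean(losses[-50:]))

def evolve(pop=30, gens=40, sigma=0.05):
    """Simple hill-climbing evolution over the 3 rule coefficients."""
    best, best_loss = np.zeros(3), run_rule(np.zeros(3))
    for _ in range(gens):
        children = best + sigma * rng.normal(size=(pop, 3))
        fits = [run_rule(c) for c in children]
        i = int(np.argmin(fits))
        if fits[i] < best_loss:
            best, best_loss = children[i], fits[i]
    return best, best_loss

coeffs, loss = evolve()
print(coeffs, loss)  # evolution discovers an error-correlation rule
```

The search only ever evaluates candidate rules by running them, so nothing in the inner loop needs to be differentiable - which is exactly the appeal for noisy, spiking, stateful substrates.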
| ericjang wrote:
| This is a good question. I think many "biologically plausible"
| neural models are willing to make some approximations for the
| benefit of computational power (e.g. rate coding instead of
| spike coding, point neurons and synapses instead of a cable
| model). As for non-differentiable operations, I think one
| strategy might be to formulate it as a multi-agent
| communication problem (e.g.
| https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFil...),
| where gradients are obtained via a differentiable relaxation or a
| score-function gradient estimator (e.g. REINFORCE).
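The score-function trick is small enough to demonstrate directly. In this sketch (a toy Bernoulli "neuron", not anything from the linked paper), the objective f is a hard, non-differentiable spike indicator, yet the estimator recovers the true gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforce_grad(theta, f, n=100_000):
    """Estimate d/dtheta E[f(b)] for b ~ Bernoulli(sigmoid(theta))
    without ever differentiating f, using the score-function
    identity: grad = E[f(b) * d/dtheta log p(b | theta)]."""
    p = sigmoid(theta)
    b = (rng.random(n) < p).astype(float)   # sampled hard "spikes"
    score = b - p       # d/dtheta log p(b|theta) for this model
    return float(np.mean(f(b) * score))

# f rewards spiking and is a step function - no useful derivative.
f = lambda b: (b > 0.5).astype(float)
g = reinforce_grad(0.0, f)
# Analytic check: here E[f] = sigmoid(theta), so the true gradient
# at theta = 0 is sigmoid'(0) = 0.25.
print(g)
```

The estimator is unbiased but high-variance, which is why these methods are usually paired with baselines or large batch sizes in practice.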
| orbifold wrote:
| You can actually calculate exact gradients for spiking
| neurons using the adjoint method:
| https://arxiv.org/abs/2009.08378 (I'm the second author). In
| my PhD thesis I show how this can be extended to larger
| problems and more complicated and biologically plausible
| neuron models. I agree with the gist of your post though:
| Retrofitting back propagation (or the adjoint method for that
| matter) is the wrong approach. One should rather use these
| methods to optimise biologically plausible learning rules.
| The group of Wolfgang Maass has done exciting work in that
| direction (e.g. https://arxiv.org/abs/1803.09574,
| https://www.frontiersin.org/articles/10.3389/fnins.2019.0048...,
| https://igi-web.tugraz.at/PDF/256.pdf).
| notthemessiah wrote:
| The author of this piece calls Dynamic Programming "one of the
| top three achievements of Computer Science". However, the name
| doesn't have much to do with computer science: Bellman coined it
| as a near-synonym for multistage mathematical optimization,
| chosen seemingly just to be "politically correct" (avoiding the
| wrath and suspicion of managers) at the RAND Corporation:
|
| > I spent the Fall quarter (of 1950) at RAND. My first task was
| to find a name for multistage decision processes. An interesting
| question is, "Where did the name, dynamic programming, come
| from?" The 1950s were not good years for mathematical research.
| We had a very interesting gentleman in Washington named Wilson.
| He was Secretary of Defense, and he actually had a pathological
| fear and hatred of the word "research". I'm not using the term
| lightly; I'm using it precisely. His face would suffuse, he would
| turn red, and he would get violent if people used the term
| research in his presence. You can imagine how he felt, then,
| about the term mathematical. The RAND Corporation was employed by
| the Air Force, and the Air Force had Wilson as its boss,
| essentially. Hence, I felt I had to do something to shield Wilson
| and the Air Force from the fact that I was really doing
| mathematics inside the RAND Corporation. What title, what name,
| could I choose? In the first place I was interested in planning,
| in decision making, in thinking. But planning, is not a good word
| for various reasons. I decided therefore to use the word
| "programming". I wanted to get across the idea that this was
| dynamic, this was multistage, this was time-varying. I thought,
| let's kill two birds with one stone. Let's take a word that has
| an absolutely precise meaning, namely dynamic, in the classical
| physical sense. It also has a very interesting property as an
| adjective, and that is it's impossible to use the word dynamic in
| a pejorative sense. Try thinking of some combination that will
| possibly give it a pejorative meaning. It's impossible. Thus, I
| thought dynamic programming was a good name. It was something not
| even a Congressman could object to. So I used it as an umbrella
| for my activities.
|
| https://en.wikipedia.org/wiki/Dynamic_programming#History
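Whatever one makes of the naming story, the "multistage decision process" Bellman had in mind is concrete: choose actions stage by stage so that each state's value satisfies the Bellman recurrence. A minimal value-iteration sketch (the tiny chain MDP here is invented purely for illustration):

```python
def value_iteration(n_states=5, gamma=0.9, iters=200):
    """Value iteration on a chain MDP: from state s you may step to
    s-1 or s+1 (clipped to the chain); reward 1.0 for arriving at
    the last state. V converges to the fixed point of the Bellman
    equation V(s) = max_a [ r(s, a) + gamma * V(s') ]."""
    goal = n_states - 1
    V = [0.0] * n_states
    for _ in range(iters):
        for s in range(n_states):
            best = float("-inf")
            for nxt in (max(s - 1, 0), min(s + 1, goal)):
                r = 1.0 if nxt == goal else 0.0
                best = max(best, r + gamma * V[nxt])
            V[s] = best   # in-place (Gauss-Seidel) update
    return V

V = value_iteration()
print(V)  # states nearer the goal are worth more
```

Each stage's decision depends only on the value of the next state, which is the multistage structure the name "dynamic" was meant to evoke.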
| ilaksh wrote:
| Predictive coding seems not only plausible but also potentially
| advantageous in some ways. Such as being inherently well-suited
| to generative perception.
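A toy linear version of predictive coding makes the "generative perception" point concrete: perception runs as inference against a generative model, driven only by local prediction errors. The matrix and latents below are invented for illustration:

```python
import numpy as np

def perceive(x, W, n_inner=100, lr=0.1):
    """Infer the latent cause z of input x by descending the local
    prediction error x - W @ z: perception as inversion of a
    generative model, using only locally available error signals."""
    z = np.zeros(W.shape[1])
    for _ in range(n_inner):
        err = x - W @ z      # top-down prediction vs. bottom-up input
        z = z + lr * (W.T @ err)
    return z, x - W @ z

# A fixed generative model: 2 latent causes produce a 4-d input.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, -0.5]])
z_true = np.array([1.0, -0.5])
x = W @ z_true
z_hat, err = perceive(x, W)
print(z_hat, np.linalg.norm(err))  # inference recovers the cause
```

The same error units that drive inference can also drive learning of W, which is why the scheme is often pitched as inherently generative.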
| neatze wrote:
| Comparisons of various neural architectures.
|
| Deep Learning in Spiking Neural Networks:
| https://arxiv.org/pdf/1804.08150.pdf
| intrasight wrote:
| I was told ~30 years ago by a leading computer scientist in the
| NN field that biology has nothing to teach us in terms of
| implementation. I switched from CS to neuroscience anyway. I've
| wrestled with his statement ever since. I'll say that nothing
| I've seen since then has shown him wrong.
| ericjang wrote:
| OP here. I hope that's not the takeaway readers glean from my
| article - the point I was making was just that it doesn't make
| sense to shoehorn a biophysical learning mechanism into a DNN;
| rather, we should use a DNN to find a biophysical learning
| mechanism.
|
| Whether biophysical learning has anything to teach us is an
| entirely different question, which I don't discuss in the post.
| xkcd-sucks wrote:
| Nobody understands the biology fully enough to drive a better
| ML implementation. Individual neurons are very complex, and
| their interactions even more so.
| taliesinb wrote:
| While the continual one-upmanship of ever more intricate
| biologically plausible learning rules is interesting to observe
| (and I played around at one point with a variant of the original
| feedback alignment), I think OP's alternative view is more
| plausible.
|
| Fwiw I am involved in an ongoing project that is investigating a
| biologically plausible model for generating connectomes (as
| neuroscientists like to call them). The connectome-generator
| happens (coincidentally) to be a neural network. But exactly as
| the OP points out, this "neural network" need not actually
| represent a biological brain -- in our case it's actually a
| _hypernetwork_ representing the process of gene expression, which
| in turn generates the biological network. Backprop is then
| applied to this hypernetwork as a (more efficient) proxy for
| evolution. In the most extreme case there need not be any
| learning at all at the level of an individual organism. You can
| see this as the ultimate end-point of so-called Baldwinian
| evolution, which is the hypothesized process whereby more and
| more of the statistics of a task are "pulled back" into
| genetically encoded priors over time.
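A stripped-down version of that setup, with every shape and value invented for illustration: a small "genome" vector is expanded by a fixed linear "development" map into the weights of the network that actually performs the task, and backprop updates only the genome:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Development": a fixed linear map from a 2-d genome to the 3
# weights of the network that actually does the task.
G = 0.5 * rng.normal(size=(3, 2))
g = np.zeros(2)                          # genome parameters

w_teacher = np.array([1.0, -2.0, 0.5])   # the task to be solved
X = rng.normal(size=(64, 3))
y = X @ w_teacher

loss0 = float(np.mean(y ** 2))           # loss before any learning
for _ in range(500):
    w = G @ g                            # genome -> network weights
    err = X @ w - y
    grad_w = X.T @ err / len(X)          # gradient at network level...
    grad_g = G.T @ grad_w                # ...pulled back to the genome
    g -= 0.1 * grad_g                    # only the genome is updated

loss = float(np.mean((X @ (G @ g) - y) ** 2))
print(loss0, loss)
```

Because the 2-d genome only reaches a 2-d slice of weight space, the loss typically does not go to zero - which mirrors the point that genetically encoded priors constrain, rather than fully specify, the resulting network.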
|
| But for me the more interesting question is how to approach the
| information flow from tasks (or 'fitness') to brains to genes on
| successively longer time scales. Can that be done with
| information theory, or perhaps with some generalization of it? I
| also think it is a rich and interesting challenge to
| _parameterize_ learning rules in such a way that evolution (or
| even random search) can efficiently find good ones for rapid
| learning of specific kinds of task. My gut feeling is that
| biological intelligence has many components that are ultimately
| discrete computations, and we'll discover that those are
| reachable by random search if we can just get the substrate
| right, and in fact this is how evolution has often done it --
| shades of Gould and Eldredge's "punctuated equilibrium".
|
| (if anyone is interested in discussing any of these things feel
| free to drop me an email)
___________________________________________________________________
(page generated 2021-02-15 23:00 UTC)