[HN Gopher] Training of Physical Neural Networks
___________________________________________________________________
Training of Physical Neural Networks
Author : Anon84
Score : 136 points
Date : 2024-07-10 13:13 UTC (1 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| UncleOxidant wrote:
| So it sounds like these PNNs are essentially analog
| implementations of neural nets? Seems like an odd choice of
| naming to call them 'physical'.
| tomxor wrote:
| ANN is taken.
| TheLoafOfBread wrote:
| I mean LoRA was taken too before LoRA became a thing
| tomxor wrote:
| I don't mean it globally; LoRa at least lives in a different
| domain. Artificial Neural Networks and Physical Neural
| Networks are both machine learning, so a discussion referring
| to both is quite likely, and the former is far more
| established, so calling these "Analog Neural Networks" would
| never last long.
| TeMPOraL wrote:
| You can't call analog NN an AnalNN either, as then the
| "brain-gut axis" people will throw a fit.
| agarwaen163 wrote:
| Well, this is particularly frustrating because PNN is already
| taken as well, by the (imo) much more novel idea of Physics
| Informed Neural Networks (aka PINNs or PNNs).
|
| Why this isn't called Hardware Neural Nets is beyond me.
| pessimizer wrote:
| Makes sense as opposed to "abstract." With the constant
| encoding and decoding that has to be done as things move in
| and out of processors and storage (or sensors), digital
| processes are always in some sense simulations.
| ksd482 wrote:
| _PNNs resemble neural networks, however at least part of the
| system is analog rather than digital, meaning that part or all of
| the input/output data is encoded continuously in a physical
| parameter, and the weights can also be physical, with the
| ultimate goal of surpassing digital hardware in performance or
| efficiency._
|
| I am trying to understand what form a node takes in PNNs.
| Is it a transistor? Or is it more complex than that? Or is it a
| combination of a few things, such as an analog signal and some
| other sensors, which work together to form a single node that
| looks like the one we are all familiar with?
|
| Can anyone please help me understand what exactly is "physical"
| about PNNs?
| sigmoid10 wrote:
| It's just the general idea of implementing the computation part
| of neurons directly in hardware instead of software, for example
| by calculating sums or products using voltages in circuits,
| i.e. analog computing. The actual implementation is up to the
| designer, who in turn will try to mimic a certain architecture.
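A toy illustration of that idea (mine, not from the paper): in an analog implementation each multiply-accumulate is a physical process, so every operation picks up a little device noise. A minimal Python sketch, with made-up noise figures:

```python
import random

def analog_neuron(inputs, weights, noise_sd=0.01):
    """Toy model of an analog neuron: each product stands in for a
    physical quantity (say, a current), so it picks up device noise."""
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w * (1 + random.gauss(0, noise_sd))  # imperfect multiplier
    return max(0.0, total)  # ReLU-style threshold

random.seed(0)
out = analog_neuron([0.5, 1.0, -0.2], [0.8, -0.1, 0.3])
# the noiseless result would be exactly 0.24; the analog version is close but not equal
```

With `noise_sd=0` this reduces to an ordinary digital weighted sum, which is the distinction the comment is drawing.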
| eightysixfour wrote:
| Here you go: https://arstechnica.com/science/2018/07/neural-
| network-imple...
| Shawnecy wrote:
| My knowledge in this area is incredibly limited, but I figured
| the paper would mention NanoWire Networks (NWNs) as an emerging
| physical neural network[0].
|
| Last year, researchers from the University of Sydney and UCLA
| used NWNs to demonstrate online learning of handwritten digits
| with an accuracy of 93%.
|
| [0] = https://www.nature.com/articles/s41467-023-42470-5
| programjames wrote:
| That doesn't implement a trainable network in hardware; it
| just creates a "reservoir" of associations between the inputs.
| orbifold wrote:
| Classifying MNIST digits with 93% accuracy can also be
| accomplished using a linear classifier. So it isn't clear to me
| what the advantage would be.
| tomxor wrote:
| Last time I read about this the main practical difficulty was
| model transferability.
|
| The very thing that makes it so powerful and efficient is also
| the thing that makes it uncopiable, because sensitivity to tiny
| physical differences in the devices inevitably gets encoded into
| the model during training.
|
| It seems intuitive that this is an unavoidable, fundamental
| problem.
| Maybe that scares away big tech, but I quite like the idea of
| having invaluable, non-transferable, irreplaceable little
| devices. Not so easily deprecated by technological advances,
| flying in the face of consumerism, getting better with age,
| making people want to hold onto things.
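A toy numerical sketch of that non-transferability (my illustration, with made-up mismatch numbers): train a two-weight linear "device" whose stored weights are read through fixed, device-specific gain errors, then copy the learned weights to a second device with different errors:

```python
import random

random.seed(0)

# Fixed per-device fabrication errors: each device reads the stored
# weights through slightly different gains (made-up values).
dev_a = [1.03, 0.95]
dev_b = [0.97, 1.06]

def forward(x, w, gains):
    return sum(xi * wi * g for xi, wi, g in zip(x, w, gains))

def mse(w, gains, data):
    return sum((forward(x, w, gains) - y) ** 2 for x, y in data) / len(data)

# Toy task: learn y = 2*x0 - x1.
data = [((a, b), 2 * a - b)
        for a, b in ((random.random(), random.random()) for _ in range(100))]

# In-situ training on device A (finite-difference gradient descent,
# i.e. the kind of measure-and-nudge loop a physical device allows).
w, eps, lr = [0.0, 0.0], 1e-4, 0.5
for _ in range(500):
    for i in range(len(w)):
        wp = list(w)
        wp[i] += eps
        grad = (mse(wp, dev_a, data) - mse(w, dev_a, data)) / eps
        w[i] -= lr * grad

loss_a = mse(w, dev_a, data)  # tiny: training absorbed device A's quirks
loss_b = mse(w, dev_b, data)  # much larger: the copied weights compensate
                              # for A's gains, not B's
```

The trained weights end up near 2/1.03 and -1/0.95 rather than 2 and -1: device A's imperfections are baked into the model, which is exactly why a verbatim copy underperforms on device B.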
| bongodongobob wrote:
| Reminds me of the evolutionary FPGA experiment that was
| dependent on magnetic flux or something. The same program
| wouldn't work on a different FPGA.
| cyberax wrote:
| Here's the paper about it: https://www.researchgate.net/publi
| cation/2737441_An_Evolved_...
|
| And a more approachable article:
| https://www.damninteresting.com/on-the-origin-of-circuits/
| rusticpenn wrote:
| What they did was overfitting. We later found other ways of
| getting around the issue.
| actionfromafar wrote:
| Would be interesting to hook up many FPGAs of the same model
| and train all of them at once. Programs with differing outputs
| on different individuals could be discarded. The program may
| still not transfer to another batch of FPGAs, but at least you
| would have a better chance of it working.
|
| Another idea is to just train a whole bunch of them
| individually, like putting your chips in school. :-D
| trextrex wrote:
| Well, the brain is a physical neural network, and evolution
| seems to have figured out how to generate a (somewhat) copiable
| model. I bet we could learn a trick or two from biology here.
| tomxor wrote:
| Some parts are copiable, but not the more abstract things
| like the human intellect, for lack of a better word.
|
| We are not even born with what you might consider basic
| mental faculties. For example, it might seem absurd, but we
| have to learn to see... We are born with the "hardware" for
| it (a visual cortex, an eye, all defined by our genes), but
| it's actually trained from birth; there is even a feedback
| loop that causes the retina to physically develop properly.
| immibis wrote:
| They raised some cats from birth in an environment with
| only vertically-oriented edges, none horizontal. Those cats
| could not see horizontally-oriented things.
| https://computervisionblog.wordpress.com/2013/06/01/cats-
| and...
|
| Likewise, kittens with an eye patch over an eye in the same
| time period remain blind in that eye forever.
| tomxor wrote:
| Wow, that's a horrific way of proving that theory.
| BriggyDwiggs42 wrote:
| Geez poor kitties, but that is interesting.
| alexpotato wrote:
| Another example:
|
| Children who were "raised in the wild" or locked in a room
| by themselves have been shown to be incapable of learning
| full human language.
|
| The working theory is that our brains can only learn
| certain skills at certain times of brain development/ages.
| deepfriedchokes wrote:
| We should also consider the effects of trauma on those
| brains. If you've ever spent time around people with
| extreme trauma they are very much in their own heads and
| can't focus outside themselves long enough to focus
| enough to learn anything. It definitely impacts
| intellectual capacity. Humans are social animals and
| anyone raised without proper socializing and intimacy and
| nurturing will inevitably end up traumatized.
| hansworst wrote:
| The way the brain does it is by giving users a largely
| untrained model that they themselves have to train over the
| next 20 years for it to be of any use.
| lynx23 wrote:
| 20 years of training is not enough. Neuroscientists say 25.
| According to my own experience, it's more like 30.
| dcuthbertson wrote:
| In the end, it's a life-long process.
| wbillingsley wrote:
| Sometimes. Foals are born (almost) able to walk. There are
| occasions where evolution baked the model into the genes.
| salomonk_mur wrote:
| It is extremely trained already. Everyone alive was born
| with the ability for all their organs and bodily functions
| to work autonomously.
|
| A ton of that is probably encoded elsewhere, but no doubt
| the brain plays a huge part. And somehow, it's all
| reconstructed for each new "device".
| alexpotato wrote:
| > Last time I read about this the main practical difficulty was
| model transferability.
|
| There is a great write up of this in this old blog post:
| https://www.damninteresting.com/on-the-origin-of-circuits/
| robertsdionne wrote:
| This is "Mortal Computation" coined in Hinton's The Forward-
| Forward Algorithm: Some Preliminary Investigations
| https://arxiv.org/abs/2212.13345.
| programjames wrote:
| You can regularize the networks to make them transfer more
| easily. I can't remember the paper's title off the top of my
| head though.
| dsabanin wrote:
| Couldn't you still copy it by training a new network on a new
| device to produce the same outputs for the same inputs as the
| original?
| tomxor wrote:
| Yes, but training is the most expensive part of ML; for
| example, GPT-3 is estimated to have cost something like 1-4
| million USD to train.
|
| With an ANN you can train once and then clone the result for
| negligible energy cost.
|
| Maybe training a batch of PNNs in parallel could save some of
| the energy cost, but I don't know how feasible that is,
| considering they could behave slightly differently during
| training, causing divergence... Now that sarcastic comment at
| the bottom of this thread ("Schools?") is starting to sound
| relevant.
| kmmlng wrote:
| > Yes, but training is the most expensive part of ML, for
| example GPT-3 is estimated to cost something like 1-4
| million USD.
|
| That entirely depends on how many inferences the model will
| perform during its lifecycle. You can find different
| estimates for the energy consumption of ChatGPT, but they
| range from something like 500-1000 MWh a day. Assuming an
| electricity price of $0.165 per kWh, that would put you at
| roughly $80,000 to $160,000 a day.
|
| Even at the lower end of $80,000 a day, you'll reach your
| $4 million in just 50 days.
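The arithmetic above holds up; as a quick back-of-the-envelope script (all figures are the commenter's assumptions, not measured values):

```python
# Rough check of the figures above; inputs are assumptions, not measurements.
price_per_kwh = 0.165                    # USD
for mwh_per_day in (500, 1000):
    daily_cost = mwh_per_day * 1_000 * price_per_kwh
    print(f"{mwh_per_day} MWh/day -> ${daily_cost:,.0f}/day")
# 500 MWh/day -> $82,500/day; 1000 MWh/day -> $165,000/day

training_cost = 4_000_000                # upper training-cost estimate, USD
days = training_cost / (500 * 1_000 * price_per_kwh)   # ~48 days at the low end
```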
| tomxor wrote:
| That's not a proportional comparison: n simultaneous
| users versus 1 training run. How many users across how many
| GPUs does that $80k cover?
|
| With a PNN you would have to multiply that 1-4 million by n,
| so the training cost explodes.
| l33tman wrote:
| That's not true for the most well-known models. For example,
| Meta's LLaMA training and architecture were predicated on
| the observation that training cost is a drop in the bucket
| compared to the inference cost over a model's lifetime.
| etiam wrote:
| Distillation (as you may be aware).
| https://arxiv.org/abs/1503.02531
|
| Having to do that in each instance is still really cumbersome
| for cheap mass deployment compared to just making a digital-
| style exact copy, but then again I guess a main argument for
| wanting these systems is that they'd be doing things
| unachievable in practice on digital computers.
|
| In some cases one might be able to distill to digital
| arithmetic after the heavy parts of the optimization are
| done, for replication, distribution, better access for
| software analysis, etc.
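The distillation route can be sketched in a few lines (a toy illustration with a made-up "device": query the physical system as a black box, then fit a digital student to its input/output behaviour):

```python
import random

random.seed(0)

# Stand-in for a physical device: we can query it, but not copy its internals.
def teacher(x0, x1):
    return 1.7 * x0 - 0.6 * x1 + 0.2

# Step 1: collect input/output pairs from the device.
queries = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
dataset = [(x0, x1, teacher(x0, x1)) for x0, x1 in queries]

# Step 2: fit a digital student (here linear) to the recorded outputs.
w = [0.0, 0.0, 0.0]  # [w0, w1, bias]
for _ in range(2000):
    g = [0.0, 0.0, 0.0]
    for x0, x1, y in dataset:
        err = w[0] * x0 + w[1] * x1 + w[2] - y
        g[0] += err * x0
        g[1] += err * x1
        g[2] += err
    w = [wi - 0.05 * gi / len(dataset) for wi, gi in zip(w, g)]

# w now approximates (1.7, -0.6, 0.2): an exactly copyable digital stand-in.
```

Real distillation fits a smaller network to a larger one's soft outputs rather than a linear model to a linear function, but the structure is the same: the student only ever sees the teacher's input/output behaviour, never its internals.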
| CuriouslyC wrote:
| This was the thing Geoff Hinton cited as a problem with analog
| networks.
|
| I think eventually we'll get to the point where we do a stage
| of pretraining on noisy digital hardware to create a
| transferable network, then fine-tune it on the analog system.
| jegp wrote:
| It's still possible to train a network that's aware of the
| physics and then transfer that to physical devices. One
| approach to this from the neuromorphic community (that's been
| working on this for a long time) is called the Neuromorphic
| Intermediate Representation (NIR) and already lets you transfer
| models to several hardware platforms [1]. This is pretty cool
| because we can use the same model across systems, similar to a
| digital instruction set. Of course, this doesn't fix the problem
| of sensitivity, but biology solved that with plasticity, so we
| can probably learn to circumvent it.
|
| [1]: https://github.com/neuromorphs/nir (disclaimer: I'm one of
| the authors)
| 6gvONxR4sf7o wrote:
| If (somehow/waves hands) you could parallelize training, maybe
| this would turn into an implicit regularization and be a
| benefit, not a flaw. Then again, physical parallelizability
| might be an infeasibly restrictive constraint?
| craigmart wrote:
| Schools?
| programjames wrote:
| > These methods are typically slow because the number of gradient
| updates scales linearly with the number of learnable parameters
| in the network, posing a significant challenge for scaling up.
|
| This is a pretty big problem, though if you use information-
| bottleneck training you can train each layer simultaneously.
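The linear scaling the quote describes is easy to see in code: estimating a physical system's gradient by perturbation (here plain finite differences, as a stand-in for in-situ training methods) needs one extra loss evaluation per parameter, so each update costs O(P) forward passes. A hypothetical sketch:

```python
def fd_gradient(loss, params, eps=1e-4):
    """Finite-difference gradient estimate: len(params) + 1 loss
    evaluations per update, which is what scales poorly with model size."""
    base = loss(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps                 # perturb one parameter at a time
        grad.append((loss(shifted) - base) / eps)
    return grad

# Toy loss: squared distance to some target parameters.
target = [0.3, -1.2, 0.7]
loss = lambda p: sum((pi - ti) ** 2 for pi, ti in zip(p, target))

p = [0.0, 0.0, 0.0]
for _ in range(200):                      # 200 updates x 4 evaluations each
    p = [pi - 0.1 * gi for pi, gi in zip(p, fd_gradient(loss, p))]
# p converges to roughly (0.3, -1.2, 0.7)
```

Layer-wise schemes like the information-bottleneck training mentioned above attack this by giving each layer its own local objective, so layers can be updated in parallel instead of paying the full per-parameter cost through the whole network.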
___________________________________________________________________
(page generated 2024-07-11 23:02 UTC)