[HN Gopher] Training of Physical Neural Networks
       ___________________________________________________________________
        
       Training of Physical Neural Networks
        
       Author : Anon84
       Score  : 136 points
       Date   : 2024-07-10 13:13 UTC (1 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | UncleOxidant wrote:
       | So it sounds like these PNNs are essentially analog
       | implementations of neural nets? Seems like an odd choice of
       | naming to call them 'physical'.
        
         | tomxor wrote:
         | ANN is taken.
        
           | TheLoafOfBread wrote:
           | I mean LoRA was taken too before LoRA became a thing
        
             | tomxor wrote:
              | I don't mean globally; the two LoRAs are at least in
              | different domains. Artificial Neural Networks and
              | Physical Neural Networks are both machine learning, so
              | discussion referring to both is highly probable, and
              | the former is far more established, so calling these
              | "Analog Neural Networks" would never last long.
        
               | TeMPOraL wrote:
               | You can't call analog NN an AnalNN either, as then the
               | "brain-gut axis" people will throw a fit.
        
           | agarwaen163 wrote:
           | Well, this is particularly frustrating because PNN is already
           | taken as well, by the (imo) much more novel idea of Physics
           | Informed Neural Networks (aka PINNs or PNNs).
           | 
           | Why this isn't called Hardware Neural Nets is beyond me.
        
         | pessimizer wrote:
          | Makes sense as opposed to "abstract." With the constant
          | encoding and decoding that has to be done when things are
          | going in and out of processors and storage (or sensors),
          | digital processes are always in some sense simulations.
        
       | ksd482 wrote:
       | _PNNs resemble neural networks, however at least part of the
       | system is analog rather than digital, meaning that part or all
        | the input/output data is encoded continuously in a physical
       | parameter, and the weights can also be physical, with the
       | ultimate goal of surpassing digital hardware in performance or
       | efficiency._
       | 
        | I am trying to understand what form a node takes in PNNs.
        | Is it a transistor? Or is it more complex than that? Or is
        | it a combination of a few things, such as an analog signal
        | and some other sensors, which work together to form a
        | single node that looks like the one we are all familiar
        | with?
       | 
       | Can anyone please help me understand what exactly is "physical"
       | about PNNs?
        
         | sigmoid10 wrote:
          | It's just the general idea of implementing the computation
          | part of neurons directly in hardware instead of software,
          | for example by calculating sums or products using voltages
          | in circuits, i.e. analog computing. The actual
          | implementation is up to the designer, who in turn will try
          | to mimic a certain architecture.
        
         | eightysixfour wrote:
         | Here you go: https://arstechnica.com/science/2018/07/neural-
         | network-imple...
        
       | Shawnecy wrote:
       | My knowledge in this area is incredibly limited, but I figured
       | the paper would mention NanoWire Networks (NWNs) as an emerging
       | physical neural network[0].
       | 
       | Last year, researchers from the University of Sydney and UCLA
       | used NWNs to demonstrate online learning of handwritten digits
       | with an accuracy of 93%.
       | 
       | [0] = https://www.nature.com/articles/s41467-023-42470-5
        
         | programjames wrote:
          | That doesn't implement a trainable network on hardware;
          | it's just creating a "reservoir" of associations between
          | the inputs, with only a readout trained on top.
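          | 
          | A minimal numpy sketch of that setup: the recurrent part
          | is random and never trained, and only a linear readout is
          | fit on top. The sizes and data here are made up; the
          | physical nanowire network would play the role of the
          | random reservoir.
          | 
          |   import numpy as np
          |   
          |   rng = np.random.default_rng(0)
          |   n_in, n_res, T = 3, 100, 500
          |   # The "reservoir" weights are random and never trained.
          |   W_in = rng.normal(0.0, 0.5, (n_res, n_in))
          |   W = rng.normal(0.0, 1.0, (n_res, n_res))
          |   W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
          |   
          |   u = rng.normal(size=(T, n_in))        # input stream
          |   y = u[:, 0] * u[:, 1]                 # toy target
          |   x = np.zeros(n_res)
          |   states = np.empty((T, n_res))
          |   for t in range(T):
          |       x = np.tanh(W_in @ u[t] + W @ x)  # dynamics
          |       states[t] = x
          |   
          |   # Only this linear readout is trained (ridge).
          |   A = states.T @ states + 1e-3 * np.eye(n_res)
          |   W_out = np.linalg.solve(A, states.T @ y)
          |   pred = states @ W_out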
        
         | orbifold wrote:
         | Classifying MNIST digits with 93% accuracy can also be
         | accomplished using a linear classifier. So it isn't clear to me
         | what the advantage would be.
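          | 
          | For reference, the usual sanity check: a plain linear
          | model on raw MNIST pixels typically lands right around
          | 92%. A sketch (not from the paper):
          | 
          |   from sklearn.datasets import fetch_openml
          |   from sklearn.linear_model import LogisticRegression
          |   
          |   X, y = fetch_openml("mnist_784", version=1,
          |                       return_X_y=True, as_frame=False)
          |   X = X / 255.0
          |   clf = LogisticRegression(max_iter=1000)
          |   clf.fit(X[:60000], y[:60000])
          |   print(clf.score(X[60000:], y[60000:]))  # ~0.92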
        
       | tomxor wrote:
       | Last time I read about this the main practical difficulty was
       | model transferability.
       | 
       | The very thing that makes it so powerful and efficient is also
        | the thing that makes it uncopiable, because sensitivity to tiny
       | physical differences in the devices inevitably gets encoded into
       | the model during training.
       | 
       | It seems intuitive this is an unavoidable, fundamental problem.
       | Maybe that scares away big tech, but I quite like the idea of
       | having invaluable, non-transferable, irreplaceable little
       | devices. Not so easily deprecated by technological advances,
       | flying in the face of consumerism, getting better with age,
       | making people want to hold onto things.
        
         | bongodongobob wrote:
         | Reminds me of the evolutionary FPGA experiment that was
         | dependent on magnetic flux or something. The same program
         | wouldn't work on a different FPGA.
        
           | cyberax wrote:
           | Here's the paper about it: https://www.researchgate.net/publi
           | cation/2737441_An_Evolved_...
           | 
           | And a more approachable article:
           | https://www.damninteresting.com/on-the-origin-of-circuits/
        
           | rusticpenn wrote:
            | What they did was overfitting. We later found other ways
            | of getting around the issue.
        
           | actionfromafar wrote:
            | Would be interesting to hook up many FPGAs of the same
            | model and train all of them at once. Programs with
            | differing outputs on different individuals could be
            | discarded (see the sketch below). The program may still
            | not transfer to another batch of FPGAs, but at least you
            | would have a better chance of it working.
           | 
           | Another idea is to just train a whole bunch of them
           | individually, like putting your chips in school. :-D
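            | 
            | Roughly this kind of selection filter. Everything below
            | is hypothetical: the "devices" are just functions with
            | slightly different analog offsets standing in for real
            | FPGAs.
            | 
            |   import random
            |   
            |   def make_device(offset):
            |       # Hypothetical device: same logic, slightly
            |       # different analog behaviour per unit.
            |       return lambda p, x: int(p * x + offset > 0.5)
            |   
            |   devices = [make_device(o) for o in (0.0, 0.02, -0.03)]
            |   inputs = [i / 10 for i in range(10)]
            |   
            |   def consistent(p):
            |       outs = [[d(p, x) for x in inputs] for d in devices]
            |       return all(o == outs[0] for o in outs)
            |   
            |   # Keep only candidates behaving identically everywhere.
            |   population = [random.uniform(0.0, 2.0) for _ in range(50)]
            |   survivors = [c for c in population if consistent(c)]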
        
         | trextrex wrote:
         | Well, the brain is a physical neural network, and evolution
         | seems to have figured out how to generate a (somewhat) copiable
         | model. I bet we could learn a trick or two from biology here.
        
           | tomxor wrote:
           | Some parts are copiable, but not the more abstract things
           | like the human intellect, for lack of a better word.
           | 
            | We are not even born with what you might consider basic
            | mental faculties. For example, it might seem absurd, but
            | we have to learn to see... We are born with the
            | "hardware" for it (a visual cortex, an eye), all defined
            | by our genes, but it's actually trained from birth;
            | there is even a feedback loop that causes the retina to
            | physically develop properly.
        
             | immibis wrote:
             | They raised some cats from birth in an environment with
             | only vertically-oriented edges, none horizontal. Those cats
             | could not see horizontally-oriented things.
             | https://computervisionblog.wordpress.com/2013/06/01/cats-
             | and...
             | 
             | Likewise, kittens with an eye patch over an eye in the same
             | time period remain blind in that eye forever.
        
               | tomxor wrote:
               | Wow, that's a horrific way of proving that theory.
        
               | BriggyDwiggs42 wrote:
               | Geez poor kitties, but that is interesting.
        
             | alexpotato wrote:
             | Another example:
             | 
              | Children who were "raised in the wild" or locked in a
              | room by themselves have been shown to be incapable of
              | learning full human language.
             | 
             | The working theory is that our brains can only learn
             | certain skills at certain times of brain development/ages.
        
               | deepfriedchokes wrote:
               | We should also consider the effects of trauma on those
                | brains. If you've ever spent time around people with
                | extreme trauma, they are very much in their own heads
                | and can't focus outside themselves long enough to
                | learn anything. It definitely impacts
               | intellectual capacity. Humans are social animals and
               | anyone raised without proper socializing and intimacy and
               | nurturing will inevitably end up traumatized.
        
           | hansworst wrote:
           | The way the brain does it is by giving users a largely
           | untrained model that they themselves have to train over the
           | next 20 years for it to be of any use.
        
             | lynx23 wrote:
             | 20 years of training is not enough. Neuroscientists say 25.
                | According to my own experience, it's more like 30.
        
               | dcuthbertson wrote:
               | In the end, it's a life-long process.
        
             | wbillingsley wrote:
             | Sometimes. Foals are born (almost) able to walk. There are
             | occasions where evolution baked the model into the genes.
        
             | salomonk_mur wrote:
             | It is extremely trained already. Everyone alive was born
              | with the ability for all their organs and bodily
              | functions to work autonomously.
             | 
             | A ton of that is probably encoded elsewhere, but no doubt
             | the brain plays a huge part. And somehow, it's all
             | reconstructed for each new "device".
        
         | alexpotato wrote:
         | > Last time I read about this the main practical difficulty was
         | model transferability.
         | 
         | There is a great write up of this in this old blog post:
         | https://www.damninteresting.com/on-the-origin-of-circuits/
        
         | robertsdionne wrote:
         | This is "Mortal Computation" coined in Hinton's The Forward-
         | Forward Algorithm: Some Preliminary Investigations
         | https://arxiv.org/abs/2212.13345.
        
         | programjames wrote:
         | You can regularize the networks to make them transfer easier. I
         | can't remember the abstract's title off the top of my head
         | though.
        
         | dsabanin wrote:
          | Couldn't you still copy it by training a new network on a
          | new device to have the same outputs for the same inputs as
          | the original?
        
           | tomxor wrote:
            | Yes, but training is the most expensive part of ML; for
            | example, training GPT-3 is estimated to have cost
            | something like 1-4 million USD.
            | 
            | With an ANN you can do it one time and then clone the
            | result for negligible energy cost.
            | 
            | Maybe training a batch of PNNs in parallel could save
            | some of the energy cost, but I don't know how feasible
            | that is, considering they could behave slightly
            | differently during training, causing divergence... Now
            | that sarcastic "Schools?" comment at the bottom of this
            | thread is starting to sound relevant.
        
             | kmmlng wrote:
             | > Yes, but training is the most expensive part of ML, for
             | example GPT-3 is estimated to cost something like 1-4
             | million USD.
             | 
             | That entirely depends on how many inferences the model will
             | perform during its lifecycle. You can find different
             | estimates for the energy consumption of ChatGPT, but they
             | range from something like 500-1000 MWh a day. Assuming an
             | electricity price of $0.165 per kWh, that would put you at
              | roughly $80,000 to $160,000 a day.
              | 
              | Even at the lower end of $80,000 a day, you'll reach
              | your $4 million in just 50 days.
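              | 
              | Spelling out the arithmetic:
              | 
              |   price = 0.165              # USD per kWh
              |   low = 500_000 * price      # 500 MWh/day
              |   high = 1_000_000 * price   # 1000 MWh/day
              |   print(low, high)           # ~82,500 / ~165,000
              |   print(4_000_000 / low)     # ~48 days to hit $4M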
        
               | tomxor wrote:
                | That's not a proportional comparison: n simultaneous
                | users to 1 training run. How many users across how
                | many GPUs is that $80k?
                | 
                | With PNNs you would have to multiply that 1-4 million
                | by n, so the training cost explodes.
        
             | l33tman wrote:
              | That's not true for the most well-known models. For
              | example, Meta's LLaMA training and architecture were
              | predicated on the observation that training cost is a
              | drop in the bucket compared to the inference cost over
              | a model's lifetime.
        
           | etiam wrote:
           | Distillation (as you may be aware).
           | https://arxiv.org/abs/1503.02531
           | 
           | Having to do that in each instance is still really cumbersome
           | for cheap mass deployment compared to just making a digital-
           | style exact copy, but then again I guess a main argument for
           | wanting these systems is that they'd be doing things
           | unachievable in practice on digital computers.
           | 
           | In some cases one might be able to distill to digital
           | arithmetic after the heavy parts of the optimization are
           | done, for replication, distribution, better access for
           | software analysis, etc.
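            | 
            | For concreteness, a bare-bones distillation step in the
            | style of that paper: teacher_logits would come from
            | querying the physical network, and the student is a
            | small digital stand-in (shapes here are arbitrary).
            | 
            |   import torch
            |   import torch.nn.functional as F
            |   
            |   def distill_step(student, teacher_logits, x, opt, T=4.0):
            |       # Match the teacher's softened outputs.
            |       loss = F.kl_div(
            |           F.log_softmax(student(x) / T, dim=-1),
            |           F.softmax(teacher_logits / T, dim=-1),
            |           reduction="batchmean",
            |       ) * T * T
            |       opt.zero_grad()
            |       loss.backward()
            |       opt.step()
            |       return loss.item()
            |   
            |   student = torch.nn.Linear(10, 3)
            |   opt = torch.optim.Adam(student.parameters(), lr=1e-3)
            |   x = torch.randn(8, 10)
            |   teacher_logits = torch.randn(8, 3)  # device outputs
            |   distill_step(student, teacher_logits, x, opt)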
        
         | CuriouslyC wrote:
         | This was the thing Geoff Hinton cited as a problem with analog
         | networks.
         | 
          | I think eventually we'll get to the point where we do a
          | stage of pretraining on noisy digital hardware to create a
          | transferable network, then fine-tune it on the analog
          | system.
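          | 
          | One plausible reading of "pretraining on noisy digital
          | hardware" is injecting weight noise during the forward
          | pass so the learned solution doesn't depend on exact
          | weight values. A sketch, with the 5% noise level picked
          | arbitrarily:
          | 
          |   import torch
          |   
          |   class NoisyLinear(torch.nn.Linear):
          |       def forward(self, x):
          |           w = self.weight
          |           if self.training:
          |               # Simulate device variation in pretraining.
          |               w = w * (1 + 0.05 * torch.randn_like(w))
          |           return torch.nn.functional.linear(x, w, self.bias)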
        
         | jegp wrote:
         | It's still possible to train a network that's aware of the
         | physics and then transfer that to physical devices. One
         | approach to this from the neuromorphic community (that's been
         | working on this for a long time) is called the Neuromorphic
         | Intermediate Representation (NIR) and already lets you transfer
         | models to several hardware platforms [1]. This is pretty cool
         | because we can use the same model across systems, similar to a
         | digital instruction set. Ofc, this doesn't fix the problem of
         | sensitivity. But biology fixed that with plasticity, so we can
         | probably learn to circumvent that.
         | 
         | [1]: https://github.com/neuromorphs/nir (disclaimer: I'm one of
         | the authors)
        
         | 6gvONxR4sf7o wrote:
         | If (somehow/waves hands) you could parallelize training, maybe
         | this would turn into an implicit regularization and be a
         | benefit, not a flaw. Then again, physical parallelizability
         | might be an infeasibly restrictive constraint?
        
       | craigmart wrote:
       | Schools?
        
       | programjames wrote:
       | > These methods are typically slow because the number of gradient
       | updates scales linearly with the number of learnable parameters
       | in the network, posing a significant challenge for scaling up.
       | 
       | This is a pretty big problem, though if you use information-
       | bottleneck training you can train each layer simultaneously.
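        | 
        | A rough sketch of what layer-local training can look like:
        | each layer gets its own objective and inputs are detached,
        | so nothing has to backpropagate end to end. A real
        | information-bottleneck objective (e.g. the HSIC bottleneck)
        | would replace the auxiliary heads used here.
        | 
        |   import torch
        |   import torch.nn.functional as F
        |   
        |   layers = torch.nn.ModuleList(
        |       [torch.nn.Linear(784, 256), torch.nn.Linear(256, 256)])
        |   heads = torch.nn.ModuleList(
        |       [torch.nn.Linear(256, 10), torch.nn.Linear(256, 10)])
        |   opts = [torch.optim.Adam([*l.parameters(), *h.parameters()])
        |           for l, h in zip(layers, heads)]
        |   
        |   def local_step(x, y):
        |       h = x
        |       for layer, head, opt in zip(layers, heads, opts):
        |           # detach() blocks gradients between layers, so
        |           # each layer trains on its own local objective.
        |           h = torch.relu(layer(h.detach()))
        |           loss = F.cross_entropy(head(h), y)
        |           opt.zero_grad()
        |           loss.backward()
        |           opt.step()
        |       return loss.item()
        |   
        |   local_step(torch.randn(32, 784),
        |              torch.randint(0, 10, (32,)))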
        
       ___________________________________________________________________
       (page generated 2024-07-11 23:02 UTC)