[HN Gopher] Neural Network Diffusion
       ___________________________________________________________________
        
       Neural Network Diffusion
        
       Author : vagabund
       Score  : 83 points
       Date   : 2024-02-21 19:31 UTC (3 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | vagabund wrote:
       | Author thread:
       | https://twitter.com/liuzhuang1234/status/1760195922502312197
        
         | squigz wrote:
          | Are there any sites for viewing Twitter threads without
          | signing up?
        
           | f_devd wrote:
           | https://nitter.esmailelbob.xyz/liuzhuang1234/status/17601959.
           | ..
           | 
            | (found with a bit of trial and error from
           | https://github.com/zedeus/nitter/wiki/Instances)
        
       | falcor84 wrote:
       | Seems like we're getting very close to recursive self-improvement
       | [0].
       | 
       | [0] https://www.lesswrong.com/tag/recursive-self-improvement
        
         | mattnewton wrote:
         | I upvoted because this was my first thought too, but reading
         | the abstract and skimming the paper makes me think it's not
         | really an advance for general recursive improvement. I think
         | the title makes people think this is a text -> model model,
          | when it is really a (trained model weights) -> (new model
          | weights) generator for a specific architecture and problem.
          | Still a potentially very useful idea for learning from a
          | bunch of training runs, and very interesting work!
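          | 
          | To make the weights -> weights framing concrete, here is a
          | rough sketch of the pipeline as I read the abstract (toy
          | sizes, random stand-in checkpoints, and a stubbed sampler;
          | this is not the authors' code):
          | 
          |     import torch
          |     import torch.nn as nn
          |     
          |     PARAM_DIM = 2048  # length of a flattened parameter subset
          |     LATENT = 32
          |     
          |     class AE(nn.Module):
          |         def __init__(self):
          |             super().__init__()
          |             self.enc = nn.Sequential(
          |                 nn.Linear(PARAM_DIM, 256), nn.ReLU(),
          |                 nn.Linear(256, LATENT))
          |             self.dec = nn.Sequential(
          |                 nn.Linear(LATENT, 256), nn.ReLU(),
          |                 nn.Linear(256, PARAM_DIM))
          |     
          |     # 1) "Training data" = flattened weight vectors saved from
          |     #    many ordinary SGD runs (random tensors stand in here).
          |     checkpoints = torch.randn(500, PARAM_DIM)
          |     
          |     ae = AE()
          |     opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
          |     for _ in range(200):  # learn to reconstruct weight vectors
          |         opt.zero_grad()
          |         recon = ae.dec(ae.enc(checkpoints))
          |         loss = nn.functional.mse_loss(recon, checkpoints)
          |         loss.backward()
          |         opt.step()
          |     
          |     # 2) A latent diffusion model would be trained on
          |     #    ae.enc(checkpoints); random latents stand in for its
          |     #    reverse-diffusion samples here.
          |     z = torch.randn(100, LATENT)
          |     candidate_weights = ae.dec(z)   # 100 new parameter vectors
          |     print(candidate_weights.shape)  # to load into the target net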
        
           | fnordpiglet wrote:
            | I suspect this is useful for porting one vector space to
            | another, which is an open problem: you've trained a model
            | with one architecture and need to port it to another
            | architecture without paying the full retraining cost.
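            | 
            | As a toy illustration of what "porting one vector space to
            | another" could mean (my own example, not the paper's
            | method): fit a linear map between two models'
            | representations of the same inputs by least squares.
            | 
            |     import numpy as np
            |     
            |     rng = np.random.default_rng(0)
            |     # Paired features for the same 1000 inputs from two
            |     # models (random stand-ins, arbitrary dimensions).
            |     feats_a = rng.normal(size=(1000, 64))
            |     true_map = rng.normal(size=(64, 128))
            |     feats_b = (feats_a @ true_map
            |                + 0.01 * rng.normal(size=(1000, 128)))
            |     
            |     # Least-squares linear map from A-space to B-space.
            |     W, *_ = np.linalg.lstsq(feats_a, feats_b, rcond=None)
            |     err = (np.linalg.norm(feats_a @ W - feats_b)
            |            / np.linalg.norm(feats_b))
            |     print(f"relative alignment error: {err:.4f}")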
        
         | GuB-42 wrote:
          | Doesn't look that different from what we are already doing.
          | For example, AlphaGo/AlphaZero/MuZero learn to play board
          | games by playing repeatedly against themselves; it is a
          | self-improvement loop leading to superhuman play. It was a
          | major breakthrough for the game of Go, and it led to advances
          | in the field of machine learning, but we are still far from
          | anything resembling a technological singularity.
          | 
          | GANs are another example of self-improvement. They became
          | famous for creating "deep fakes". They work by pitting a fake
          | generator and a fake detector against each other, resulting
          | in a cycle of improvement. They didn't get much further than
          | that; in fact, it is all about attention and transformers
          | now.
          | 
          | This is just a way of optimizing parameters; it will not
          | invent new techniques. It can say "put 1000 neurons there,
          | 2000 there, etc...", but it still has to pick from what
          | designers tell it to pick from. It may adjust these
          | parameters better than a human can, leading to more
          | efficient systems. I expect some improvement to existing
          | systems, but not a breaking change.
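          | 
          | For reference, the generator-vs-detector loop looks roughly
          | like this (toy 1-D data, illustrative sizes, nothing to do
          | with the paper):
          | 
          |     import torch
          |     import torch.nn as nn
          |     
          |     G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
          |                       nn.Linear(32, 1))  # fake generator
          |     D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
          |                       nn.Linear(32, 1))  # fake detector
          |     opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
          |     opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
          |     bce = nn.BCEWithLogitsLoss()
          |     
          |     for _ in range(1000):
          |         real = torch.randn(64, 1) * 2 + 3  # "real" data N(3, 2)
          |         fake = G(torch.randn(64, 8))
          |         # The detector learns to separate real from fake...
          |         opt_d.zero_grad()
          |         d_loss = (bce(D(real), torch.ones(64, 1)) +
          |                   bce(D(fake.detach()), torch.zeros(64, 1)))
          |         d_loss.backward()
          |         opt_d.step()
          |         # ...while the generator learns to fool it, so each
          |         # side pushes the other to improve.
          |         opt_g.zero_grad()
          |         g_loss = bce(D(fake), torch.ones(64, 1))
          |         g_loss.backward()
          |         opt_g.step()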
        
         | bamboozled wrote:
          | The AI is ready to take off to perfection land.
        
       | amelius wrote:
       | Why does Figure 7 not include a validation curve (afaict only the
       | training curve is shown)?
        
       | Scene_Cast2 wrote:
       | Yay, an alternative to backprop & SGD! Really interesting and
        | impressive finding; I was surprised that the network generalizes.
        
       | goggy_googy wrote:
        | Important to note: they say "From these generated models, we
        | select the one with the best performance on the training set."
        | There's definitely potential for selection bias here.
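        | 
        | A quick toy example of why best-of-N selection can flatter the
        | reported number (my own illustration, not from the paper): even
        | if every generated model were equally good, taking the max over
        | 100 noisy evaluations looks better than the truth.
        | 
        |     import numpy as np
        |     
        |     rng = np.random.default_rng(0)
        |     true_acc = 0.75      # assume all 100 models are really equal
        |     scores = true_acc + rng.normal(0.0, 0.02, size=100)
        |     print("best-of-100:", scores.max())   # optimistic estimate
        |     print("mean       :", scores.mean())  # close to true 0.75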
        
       | goggy_googy wrote:
       | "We synthesize 100 novel parameters by feeding random noise into
       | the latent diffusion model and the trained decoder." Cool that
        | patterns exist at this level, but 100 params also means we
        | have a long way to go before this process is efficient enough
        | to synthesize modern-sized models.
        
       ___________________________________________________________________
       (page generated 2024-02-21 23:00 UTC)