[HN Gopher] Neural Network Diffusion
___________________________________________________________________
Neural Network Diffusion
Author : vagabund
Score : 83 points
Date : 2024-02-21 19:31 UTC (3 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| vagabund wrote:
| Author thread:
| https://twitter.com/liuzhuang1234/status/1760195922502312197
| squigz wrote:
| Are there any sites for viewing Twitter threads without signing
| up?
| f_devd wrote:
| https://nitter.esmailelbob.xyz/liuzhuang1234/status/17601959.
| ..
|
| (bit of trial and error from
| https://github.com/zedeus/nitter/wiki/Instances)
| falcor84 wrote:
| Seems like we're getting very close to recursive self-improvement
| [0].
|
| [0] https://www.lesswrong.com/tag/recursive-self-improvement
| mattnewton wrote:
| I upvoted because this was my first thought too, but reading
| the abstract and skimming the paper makes me think it's not
| really an advance toward general recursive self-improvement.
| I think the title makes people think this is a text -> model
| model, when it is really an optimizer that maps a batch of
| existing model weights -> new model weights for a specific
| architecture and problem. Still a potentially very useful
| idea for learning from a bunch of training runs, and very
| interesting work!
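|
| For concreteness, here is a toy sketch of how I read that
| weights -> weights pipeline: an autoencoder compresses flattened
| checkpoint vectors and a small denoiser is trained on the
| latents, DDPM-style. All the sizes and names below are made up
| for illustration; this is not the authors' code.
|
|   import torch, torch.nn as nn
|
|   P, Z, T = 2048, 64, 100  # flattened params, latent size, steps
|
|   enc = nn.Sequential(nn.Linear(P, 256), nn.ReLU(),
|                       nn.Linear(256, Z))
|   dec = nn.Sequential(nn.Linear(Z, 256), nn.ReLU(),
|                       nn.Linear(256, P))
|   den = nn.Sequential(nn.Linear(Z + 1, 256), nn.ReLU(),
|                       nn.Linear(256, Z))
|
|   betas = torch.linspace(1e-4, 0.02, T)
|   a_bar = torch.cumprod(1.0 - betas, dim=0)
|
|   def train_step(w_batch, opt):
|       # 1) autoencode saved checkpoints (flattened weights)
|       z = enc(w_batch)
|       recon = (dec(z) - w_batch).pow(2).mean()
|       # 2) noise-prediction objective on the latents
|       t = torch.randint(0, T, (z.shape[0],))
|       noise = torch.randn_like(z)
|       a = a_bar[t].unsqueeze(1)
|       z_noisy = a.sqrt() * z.detach() + (1 - a).sqrt() * noise
|       t_emb = (t.float() / T).unsqueeze(1)
|       pred = den(torch.cat([z_noisy, t_emb], dim=1))
|       diff = (pred - noise).pow(2).mean()
|       loss = recon + diff
|       opt.zero_grad(); loss.backward(); opt.step()
|       return loss.item()
|
| (opt would be e.g. Adam over the enc, dec and den parameters;
| once trained, you only sample from it rather than running SGD on
| the target task.)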
| fnordpiglet wrote:
| I suspect this is useful for porting one vector space to
| another, which is an open problem when you've trained a model
| with one architecture and need to port it to another
| architecture without paying the full retraining cost.
| GuB-42 wrote:
| Doesn't look that different from what we are already doing. For
| example, AlphaGo/AlphaZero/MuZero learn to play board games by
| playing repeatedly against themselves; it is a self-improvement
| loop leading to superhuman play. It was a major breakthrough for
| the game of Go, and it led to advances in the field of machine
| learning, but we are still far from anything resembling a
| technological singularity.
|
| GANs are another example of self-improvement. They became famous
| for creating "deep fakes". A GAN works by pitting a fake
| generator and a fake detector against each other, resulting in a
| cycle of improvement (sketched below). The approach didn't get
| much further than that; in fact, it is all about attention and
| transformers now.
|
| This is just a way of optimizing parameters; it will not invent
| new techniques. It can say "put 1000 neurons there, 2000 there,
| etc...", but it still has to pick from what designers tell it to
| pick from. It may adjust these parameters better than a human
| can, leading to more efficient systems, so I expect some
| improvement to existing systems, but not a breaking change.
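|
| For reference, that generator-vs-detector cycle is just the
| usual adversarial training loop. A bare-bones sketch on made-up
| 1-D data (not any particular deep-fake system):
|
|   import torch, torch.nn as nn
|
|   real = torch.randn(512, 1) * 0.5 + 3.0   # toy "real" samples
|   G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
|                     nn.Linear(32, 1))
|   D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
|                     nn.Linear(32, 1))
|   opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
|   opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
|   bce = nn.BCEWithLogitsLoss()
|
|   for step in range(1000):
|       # detector: tell real samples from generated ones
|       fake = G(torch.randn(64, 8)).detach()
|       batch = real[torch.randint(0, 512, (64,))]
|       d_loss = (bce(D(batch), torch.ones(64, 1)) +
|                 bce(D(fake), torch.zeros(64, 1)))
|       opt_d.zero_grad(); d_loss.backward(); opt_d.step()
|       # generator: try to fool the detector
|       fake = G(torch.randn(64, 8))
|       g_loss = bce(D(fake), torch.ones(64, 1))
|       opt_g.zero_grad(); g_loss.backward(); opt_g.step()
|
| Each side only gets better because the other one does, which is
| the "cycle of improvement" above.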
| bamboozled wrote:
| The AI is ready to take off to perfection land.
| amelius wrote:
| Why does Figure 7 not include a validation curve (afaict only the
| training curve is shown)?
| Scene_Cast2 wrote:
| Yay, an alternative to backprop & SGD! Really interesting and
| impressive finding; I was surprised that the network
| generalizes.
| goggy_googy wrote:
| Important to note, they say "From these generated models, we
| select the one with the best performance on the training set."
| Definitely potential for bias here.
| goggy_googy wrote:
| "We synthesize 100 novel parameters by feeding random noise into
| the latent diffusion model and the trained decoder." Cool that
| patterns exist at this level, but also, 100 params means we have
| a long way to go before this process is efficient enough to
| synthesize more modern-sized models.
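|
| For the curious, that quoted step looks roughly like the
| following: draw noise, run the trained latent denoiser
| backwards, decode to weight vectors, load each into the target
| net, and keep the one that scores best on the training set. The
| names den, dec, build_model and accuracy are placeholders, not
| the paper's code.
|
|   import torch
|
|   @torch.no_grad()
|   def sample_and_select(den, dec, build_model, accuracy,
|                         train_loader, n=100, Z=64, T=100):
|       betas = torch.linspace(1e-4, 0.02, T)
|       alphas = 1.0 - betas
|       a_bar = torch.cumprod(alphas, dim=0)
|       z = torch.randn(n, Z)          # start from pure noise
|       for t in reversed(range(T)):   # simplified DDPM reverse
|           t_emb = torch.full((n, 1), t / T)
|           eps = den(torch.cat([z, t_emb], dim=1))
|           z = z - betas[t] / (1 - a_bar[t]).sqrt() * eps
|           z = z / alphas[t].sqrt()
|           if t > 0:
|               z = z + betas[t].sqrt() * torch.randn_like(z)
|       cands = dec(z)                 # n candidate weight vectors
|       best, best_acc = None, -1.0
|       for w in cands:
|           model = build_model(w)     # unflatten into target net
|           acc = accuracy(model, train_loader)
|           if acc > best_acc:
|               best, best_acc = model, acc
|       return best, best_acc          # best on the training set
|
| (Selecting by training-set accuracy is exactly the step with the
| potential for bias mentioned above.)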
___________________________________________________________________
(page generated 2024-02-21 23:00 UTC)