[HN Gopher] A modern self-referential weight matrix that learns ...
___________________________________________________________________
A modern self-referential weight matrix that learns to modify
itself
Author : lnyan
Score : 113 points
Date : 2022-04-13 16:12 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| savant_penguin wrote:
| Just skimmed the paper but the benchmarks are super weird
| ricardobayes wrote:
| It's a super weird feeling to click on a Hacker News top post and
| find out I know one of the authors. The world is a super small
| place.
| heyitsguay wrote:
| I know Schmidhuber is famously miffed about missing out on the AI
| revolution limelight, and despite that he runs a pretty famous
| and well-resourced group. So with a paper like this demonstrating
| a new fundamental technique, you'd think they would eat the labor
| and compute costs of running it on a full gauntlet of high-
| profile benchmarks against existing SOTA methods, rather than the
| half-hearted benchmarking that happens in this paper. It's a
| hassle, but all it would take for something like this to catch
| the community's attention is a clear demonstration of viability,
| in line with what groups at the other large research institutions
| do.
|
| The failure to put something like that front and center makes me
| wonder how strong the method is, because you have to assume that
| someone on the team has tried more benchmarks. Still, the idea of
| learning a better update rule than gradient descent is
| intriguing, so maybe something cool will come from this :)
| nullc wrote:
| Or they hurried the publication to avoid getting scooped and
| will follow up with interesting benchmarks later.
| mark_l_watson wrote:
| I haven't really absorbed this paper yet, but my first thought
| was of the Hopfield Networks we used in the 1980s.
|
| For unsupervised learning algorithms like masked models (BERT and
| some other Transformers), it makes sense to train in parallel
| with prediction. Why not?
|
| I can't wrap my head around using this for supervised (labeled
| data) learning.
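|
| For anyone who hasn't seen one, here's a toy numpy version of the
| classical Hopfield recall I have in mind (Hebbian storage, sign-
| threshold recall; my own toy example, nothing from this paper):
|
|     import numpy as np
|
|     # Store two patterns with the Hebbian outer-product rule, then
|     # recall one from a corrupted cue by iterating s <- sign(W@s).
|     p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
|     p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
|     W = (np.outer(p1, p1) + np.outer(p2, p2)).astype(float)
|     np.fill_diagonal(W, 0)              # no self-connections
|
|     s = p1.copy()
|     s[-1] = -s[-1]                      # flip one bit of the cue
|     for _ in range(3):
|         s = np.sign(W @ s).astype(int)  # settle on nearest pattern
|     print((s == p1).all())              # True: p1 recovered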
| jdeaton wrote:
| I'm having a hard time reading this paper without hearing you-
| again's voice in my head.
| TekMol wrote:
| I have been playing with alternative ways to do machine learning
| on and off for a few years now. Some experiments went very well.
|
| I am never sure if it is a waste of time or has some value.
|
| If you guys had some unique ML technology, different from what
| everyone else is doing, what would you do with it?
| hwers wrote:
| Write a paper about it. Post it on arxiv.org. Contact some open-
| minded researchers on Twitter or here (Show HN) for critique.
| javajosh wrote:
| Host it on a $5 VPS with full internet access and "see what
| happens".
| ggerganov wrote:
| I would make a "Show HN" post
| [deleted]
| swagasaurus-rex wrote:
| Create a demo of it doing -something-. Literally anything. Then
| show it off and see where it goes.
| daveguy wrote:
| A demo speaks louder than words. Even if you don't want to go
| into the details of how it works, it would still be interesting
| just to see where it over- and underperforms compared to existing
| systems.
| mark_l_watson wrote:
| Absolutely! Also, if possible, a Colab (or plain Jupyter
| notebook) and data would be good.
| nynx wrote:
| I'd make a blog and post about my experiments.
| andai wrote:
| And a video too, please :)
| Scene_Cast2 wrote:
| If you do end up posting any sort of musings on this topic, I'd
| be really interested in taking a look.
| drewm1980 wrote:
| Start with the assumption that someone has already done it...
| Do a thorough literature survey... Ask experts working on the
| most similar thing. Don't be disheartened if you weren't the
| first; ideas don't have to be original to have value; some
| ideas need reviving from time to time, or were ahead of their
| time when first discovered.
| codelord wrote:
| I haven't read the paper yet, so no comment on the content. But
| it's amusing that more than 30% of the references are self-
| citations.
| lol1lol wrote:
| Hinton et al. self-cite. Schmidhuber et al. self-cite. One got
| the Turing Award, the other got angry.
| nh23423fefe wrote:
| It's only a matter of time until the technological singularity
|
| > The WM of a self-referential NN, however, can keep rapidly
| modifying all of itself during runtime. In principle, such NNs
| can meta-learn to learn, and metameta-learn to meta-learn to
| learn, and so on, in the sense of recursive self-improvement.
|
| Everyone who doubts is hanging everything on "in principle" being
| too hard. Seems ridiculous to me, a failure of imagination.
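|
| For the curious, a very loose numpy sketch of the mechanism (my
| own simplification; the slicing and activation choices here are
| assumptions, not the paper's exact equations). The point is that
| W itself produces the key, value and learning rate of its own
| delta-rule update:
|
|     import numpy as np
|
|     def softmax(z):
|         e = np.exp(z - z.max())
|         return e / e.sum()
|
|     def srwm_step(W, x, d_in, d_out):
|         out = W @ softmax(x)
|         y = out[:d_out]               # ordinary output
|         k = out[d_out:d_out + d_in]   # key: where in W to write
|         q = out[d_out + d_in:-1]      # query: what to write there
|         beta = out[-1]                # self-chosen learning rate
|
|         v_new = W @ softmax(q)        # target value, from W itself
|         v_old = W @ softmax(k)        # what W currently stores at k
|         lr = 1.0 / (1.0 + np.exp(-beta))
|         # delta-rule self-update: move W's response to k toward
|         # v_new, at a step size W picked for itself
|         W = W + lr * np.outer(v_new - v_old, softmax(k))
|         return y, W
|
|     d_in, d_out = 8, 4
|     rng = np.random.default_rng(0)
|     W = 0.1 * rng.standard_normal((d_out + 2 * d_in + 1, d_in))
|     for _ in range(5):
|         y, W = srwm_step(W, rng.standard_normal(d_in), d_in, d_out)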
| Tr3nton wrote:
| There's a saying that goes: as soon as we can build it, it's not
| AI any more.
| godelski wrote:
| I think there's some context missing when we talk about the
| singularity; this is the whole Marcus "AI is hitting a wall"
| debate (maybe I'm reading that reference into your comment and it
| isn't there). "Hitting the wall" means different things to
| different people, and we're not really communicating well with
| one another. Marcus is concerned about AGI, while others are
| concerned about ML in general and what it can do. So LLMs and
| LGMs (like Dall-E) are showing massive improvements and seem to
| counter Marcus's claim. But from the other side, there are still
| open problems on the road to AGI, like causal learning and
| symbolic learning. What's bugged me a bit about Marcus's claim is
| that those areas are also rapidly improving. I just think it is
| silly to hold up Dall-E as proof that Marcus is wrong rather than
| pointing to our improvements in causal learning. But I guess few
| are interested in CL, and it isn't nearly as flashy. I know
| Marcus reads HN, so maybe you don't think we've been making
| enough strides in CL/SL? I can agree that it doesn't get enough
| attention, but ML is very hyped at this point.
| synquid wrote:
| Schmidhuber has written about recursive self-improvement since
| his diploma thesis in the 80s: "Evolutionary principles in
| self-referential learning, or on learning how to learn: The
| meta-meta-... hook".
|
| Your quote sounds like it could just as well have been from
| that thesis.
| [deleted]
| pizza wrote:
| The singularity is not that important. Scale and locality are.
| Information that has to travel across the world suffers from
| misordering/relativity. Same for across the room or across a
| single wire that is nondeterministically but carelessly left
| unplugged. An oracle doesn't help in that case. Instead what
| you want is a new kind of being.
| jmmcd wrote:
| "In principle" is not only (1) hard in practice but also even
| in principle it is (2) limited by the capacity of the NN. It's
| (2) which gives me some reassurance.
| _Microft wrote:
| _" It suddenly stopped self-improving."_ "What happened?" _" It
| ... it looks like it found a way to autogenerate content that
| is optimized for its reward function and now binges on it
| 24/7..."_ ><
| erdos4d wrote:
| Even if the algorithms were here to do that job, where would the
| hardware come from? I'm staring at almost a decade of
| commercial processor (i7) lineage right now and the jump has
| been from 4 to 6 cores with no change in clock speed (maybe the
| newer one is even slower actually). There definitely won't be
| any singularity jumping off unless today's hardware gets
| another few decades of Moore's law, and that is not happening.
| dotnet00 wrote:
| It's a bit dishonest to look specifically at the manufacturer
| that spent most of the past decade enjoying an effective monopoly
| on desktop CPUs as a reference for how computers have improved.
| Even more so since 4- and 6-core CPUs are not representative of
| the high-end systems used to train even current state-of-the-art
| ML models.
| armchair_ wrote:
| Well, the code published alongside the paper is written in Python
| and CUDA, so you're not looking at the right kind of processor to
| begin with.
|
| My 5-year-old, consumer-grade GPU does 1.5 GHz * 2300 cores,
| whereas the equivalent released this year does 1.7 GHz * 8900
| cores. Granted, that's not the best way to measure GPU
| performance, but it is roughly keeping pace with Moore's law, and
| it's a better indicator of the future than Intel CPU
| capabilities, especially for machine learning applications.
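|
| Back-of-the-envelope with those (rough) numbers, treating clock
| times cores as a crude throughput proxy:
|
|     old = 1.5e9 * 2300   # ~5-year-old card: clock (Hz) * cores
|     new = 1.7e9 * 8900   # this year's equivalent
|     print(new / old)     # ~4.4x raw throughput in ~5 years
|     print(2 ** (5 / 2))  # ~5.7x: what 2x-every-2-years predicts
|
| So slightly behind Moore's-law pace, but in the right ballpark.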
| erdos4d wrote:
| So you are saying that we can get another 3 decades of
| Moore's law by switching to GPUs made of silicon rather
| than CPUs made of silicon? Well fuck, problem solved then.
| I was completely unaware it was so easy.
| VyperCard wrote:
| Yeah, but mostly for matrix multiplications
___________________________________________________________________
(page generated 2022-04-13 23:00 UTC)