[HN Gopher] My failed attempt at AGI on the Tokio Runtime
       ___________________________________________________________________
        
       My failed attempt at AGI on the Tokio Runtime
        
       Author : openquery
       Score  : 69 points
       Date   : 2024-12-26 16:22 UTC (6 hours ago)
        
 (HTM) web link (www.christo.sh)
 (TXT) w3m dump (www.christo.sh)
        
       | cglan wrote:
        | I've thought about something like this for a while; I'm very
        | interested in where this goes.
       | 
        | A highly async actor model is something I've wanted to explore,
        | and combined with a highly multi-core architecture clocked very,
        | very low, it seems like it could be power-efficient too.
       | 
       | I was considering using go + channels for this
        
         | openquery wrote:
         | Give it a shot. It isn't much code.
         | 
          | If you want to look at more serious work, the Spiking Neural
          | Net community has made models which actually work and are
          | power-efficient.
        
         | jerf wrote:
          | The idea has kicked around in hardware for a number of years;
          | see, for example:
          | https://www.greenarraychips.com/home/about/index.php
         | 
         | I think the problem isn't that it's a "bad idea" in some
         | intrinsic sense, but that you really have to have a problem
         | that it fits like a glove. By the nature of the math, if you
         | can only use 4 of your 128 cores 50% of the time, your
         | performance just tanks no matter how fast you're going the
         | other 50% of the time.
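          | 
          | (Rough Amdahl-style arithmetic to put a number on that: if a
          | fraction f of the work can use all 128 cores and the rest can
          | only use 4, the best possible speedup over a single core is
          | 1 / ((1 - f)/4 + f/128). With f = 0.5 that comes out to about
          | 7.8x -- on a 128-core machine.)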
         | 
         | Contra the occasional "Everyone Else Is Stupid And We Just Need
         | To Get Off Of von Neumann Architectures To Reach Nirvana" post,
         | CPUs are shaped the way they are for a reason; being able to
         | bring very highly concentrated power to bear on a specific
         | problem is very flexible, especially when you can move the
         | focus around very quickly as a CPU can. (Not instantaneously,
         | but quickly, and this switching penalty is something that can
         | be engineered around.) A lot of the rest of the problem space
         | has been eaten by GPUs. This sort of "lots of low powered
         | computers networked together" still fits in between them
         | somewhat, but there's not a lot of space left anymore. They can
         | communicate better in some ways than GPU cores can communicate
         | with each other, but that is also a problem that can be
         | engineered around.
         | 
          | If you squint really hard, it's possible that computers are
          | sort of wandering in this direction, though. Being low-power
          | also means being low-heat. Putting "efficiency cores" onto CPU
          | dies is sort of, kind of starting down a road that could end
          | up at the GreenArrays idea. Still, it's hard to imagine what
          | even all of the Windows OS would do with 128 efficiency cores.
          | Maybe if someone comes up with a brilliant innovation on
          | current AI architectures that needs some sort of additional
          | cross-talk between the neural layers, one that simply
          | _requires_ this sort of architecture to work, you could see
          | this pop up... which I suppose brings us back around to the
          | original idea. But it's hard to imagine what that architecture
          | could be, where the communication is vital on a nanosecond-by-
          | nanosecond level and can't just be a separate phase of
          | processing a neural net.
        
           | openquery wrote:
           | > By the nature of the math, if you can only use 4 of your
           | 128 cores 50% of the time, your performance just tanks no
           | matter how fast you're going the other 50% of the time.
           | 
            | I'm not sure I understand this point. If you're using a work-
            | stealing threadpool to service the tasks in your actor model,
            | there's no reason you shouldn't get ~100% CPU utilisation,
            | provided you are driving the input hard enough (i.e. sampling
            | often enough from your inputs).
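            | 
            | Roughly what I have in mind (a minimal sketch, not the
            | article's actual code; it assumes the tokio crate with the
            | rt-multi-thread, macros and sync features, and every name
            | here is illustrative):
            | 
            |   use tokio::sync::mpsc;
            | 
            |   #[tokio::main]
            |   async fn main() {
            |       let num_neurons: usize = 128;
            |       let mut senders = Vec::new();
            |       let mut handles = Vec::new();
            | 
            |       for _ in 0..num_neurons {
            |           let (tx, mut rx) = mpsc::channel::<f32>(64);
            |           senders.push(tx);
            |           // Each "neuron" is a task draining its mailbox. The
            |           // work-stealing scheduler moves ready tasks across
            |           // worker threads, so utilisation is limited by how
            |           // hard the inputs are driven, not by a fixed core
            |           // assignment.
            |           handles.push(tokio::spawn(async move {
            |               let mut potential = 0.0_f32;
            |               while let Some(input) = rx.recv().await {
            |                   potential += input;
            |                   if potential > 1.0 {
            |                       potential = 0.0; // "fire" (fan-out omitted)
            |                   }
            |               }
            |           }));
            |       }
            | 
            |       // Drive the inputs; sample often enough and the worker
            |       // threads stay busy.
            |       for i in 0..10_000u32 {
            |           let target = (i as usize) % num_neurons;
            |           let _ = senders[target].send(0.3).await;
            |       }
            | 
            |       drop(senders); // close the mailboxes so tasks finish
            |       for h in handles {
            |           let _ = h.await;
            |       }
            |   }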
        
       | robblbobbl wrote:
       | Finally singularity confirmed, thanks.
        
       | dhruvdh wrote:
        | I wish more people would just try to do things like this and
        | blog about their failures.
       | 
       | > The published version of a proof is always condensed. And even
       | if you take all the math that has been published in the history
       | of mankind, it's still small compared to what these models are
       | trained on.
       | 
       | > And people only publish the success stories. The data that are
       | really precious are from when someone tries something, and it
       | doesn't quite work, but they know how to fix it. But they only
       | publish the successful thing, not the process.
       | 
       | - Terence Tao (https://www.scientificamerican.com/article/ai-
       | will-become-ma...)
       | 
       | Personally, I think failures on their own are valuable. Others
       | can come in and branch off from a decision you made that instead
       | leads to success. Maybe the idea can be applied to a different
       | domain. Maybe your failure clarified something for someone.
        
         | openquery wrote:
          | Thank you for saying this. I agree, which is why I wrote this
          | up.
        
       | markisus wrote:
       | > The only hope I have is to try something completely novel
       | 
       | I don't think this is true. Neural networks were not completely
       | novel when they started to work. Someone just used a novel piece
        | -- the GPU. Whatever the next thing is, it will probably be a
       | remix of preexisting components.
        
         | openquery wrote:
         | Right. Ironically I chose a model that was around in the 1970s
         | without knowing it.
         | 
         | My point was more a game-theoretic one. There is just no chance
         | I would beat the frontier labs if I tried the same things with
          | less compute and fewer people. (Of course there is almost 0
         | chance I would beat them at all.)
        
       | andsoitis wrote:
       | If you're looking for a neuroscience approach, check out Numenta
       | https://www.numenta.com/
        
       | namero999 wrote:
       | Isn't this self-refuting? From the article:
       | 
       | > Assume you are racing a Formula 1 car. You are in last place.
       | You are a worse driver in a worse car. If you follow the same
       | strategy as the cars in front of you, pit at the same time and
       | choose the same tires, you will certainly lose. The only chance
       | you have is to pick a different strategy.
       | 
        | So why model brains and neurons at all? You are outgunned by at
        | least 300,000 years of evolution and 117 billion training
        | sessions.
        
         | andrewflnr wrote:
         | Because bio brains aren't even in the same race.
        
       | dudeinjapan wrote:
       | The greatest trick AGI ever pulled was convincing the world it
       | didn't exist.
        
         | homarp wrote:
         | using https://news.ycombinator.com/item?id=42324444 you could
         | make a better joke
         | 
         | Also I was wondering about the source of the original quote,
         | https://quoteinvestigator.com/2018/03/20/devil/
        
       | alecst wrote:
       | Love the drawings. Kind of a silly question, but how did you do
       | them?
        
         | openquery wrote:
         | Excalidraw[0] and a mouse and a few failed attempts :)
         | 
         | [0] https://excalidraw.com/
        
       | Onavo wrote:
       | > _Ok how the hell do we train this thing? Stochastic gradient
        | descent with back-propagation won't work here (or if it does I
       | have no idea how to implement it)._
       | 
       | What's wrong with gradient descent?
       | 
       | https://snntorch.readthedocs.io/en/latest/
        
         | thrance wrote:
          | Gradient descent needs a differentiable system, and the
          | author's clearly isn't.
        
         | openquery wrote:
         | Thanks for sharing. I thought the discontinuous nature of the
         | SNN made it non-differentiable and therefore unsuitable for SGD
         | and backprop.
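          | 
          | For what it's worth, the usual workaround in the SNN training
          | literature (and what snnTorch builds on) is a surrogate
          | gradient: keep the hard threshold in the forward pass, but swap
          | in the derivative of a smooth function for the backward pass,
          | since the true derivative of the spike is zero almost
          | everywhere. A rough sketch of the idea, illustrative only and
          | not the article's code:
          | 
          |   // Forward pass: the non-differentiable Heaviside step.
          |   fn spike_forward(membrane: f32, threshold: f32) -> f32 {
          |       if membrane >= threshold { 1.0 } else { 0.0 }
          |   }
          | 
          |   // Backward pass: use the derivative of a steep sigmoid as a
          |   // stand-in for the true gradient of the step function.
          |   fn spike_surrogate_grad(membrane: f32, threshold: f32,
          |                           slope: f32) -> f32 {
          |       let s = 1.0
          |           / (1.0 + (-slope * (membrane - threshold)).exp());
          |       slope * s * (1.0 - s)
          |   }
          | 
          |   fn main() {
          |       let (v, theta) = (0.9_f32, 1.0_f32);
          |       println!("spike = {}", spike_forward(v, theta));
          |       println!("surrogate grad = {}",
          |                spike_surrogate_grad(v, theta, 25.0));
          |   }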
        
       | henning wrote:
       | The author could first reproduce models and results from papers
       | before trying to extend that work. Starting with something
       | working helps.
        
       | skeledrew wrote:
       | Interesting. I started a somewhat conceptually similar project
       | several months ago. For me though, the main motivation is that I
       | think there's something fundamentally wrong with the current
       | method of using matrix math for weight calculation and
       | representation. I'm taking the approach that the very core of how
       | neurons work is inherently binary, and should remain that way. My
       | basic thesis is that it should reduce computational requirements,
        | and lead to something more generic. So I set out to build
        | something that takes an array of booleans (whether each upstream
        | neuron fired or didn't fire at a particular point in the time
        | sequence), and gives a single boolean calculated with a
        | customizable activator function.
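        | 
        | In Rust (to match the article), the shape of that interface
        | might look something like this -- an illustrative sketch only,
        | with made-up names:
        | 
        |   // A neuron whose inputs and output are plain booleans, with a
        |   // pluggable activator over the upstream firing pattern.
        |   struct BoolNeuron {
        |       activator: fn(&[bool]) -> bool,
        |   }
        | 
        |   impl BoolNeuron {
        |       fn fire(&self, upstream: &[bool]) -> bool {
        |           (self.activator)(upstream)
        |       }
        |   }
        | 
        |   // Example activator: fire if at least half of the upstream
        |   // neurons fired.
        |   fn majority(inputs: &[bool]) -> bool {
        |       let fired = inputs.iter().filter(|&&b| b).count();
        |       2 * fired >= inputs.len()
        |   }
        | 
        |   fn main() {
        |       let neuron = BoolNeuron { activator: majority };
        |       println!("{}", neuron.fire(&[true, false, true, true]));
        |   }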
       | 
        | The project is currently on ice: I created something that builds
        | a network of layers, but then ran into a wall figuring out how
        | to have that network wire itself over time and become
        | representative of whatever it's learned. I'll take some time to
        | go through this, see what it may spark, and try to start working
        | on mine again.
        
         | openquery wrote:
         | Nice. Interested to see where this leads.
         | 
         | The network in the article doesn't have explicit layers. It's a
         | graph which is initialised with a completely random
         | connectivity matrix. The inputs and outputs are also wired
         | randomly in the beginning (an input could be connected to a
         | neuron which is also connected to an output for example, or the
         | input could be connected to a neuron which has no post-synaptic
         | neurons).
         | 
         | It was the job of the optimisation algorithm to figure out the
         | graph topology over training.
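          | 
          | The initialisation looks roughly like this (an illustrative
          | sketch, not the actual code from the article; it assumes the
          | rand crate with its 0.8-era API):
          | 
          |   use rand::Rng;
          | 
          |   struct Net {
          |       // connections[i]: indices of neuron i's post-synaptic
          |       // neurons
          |       connections: Vec<Vec<usize>>,
          |       inputs: Vec<usize>,  // neuron each external input feeds
          |       outputs: Vec<usize>, // neuron each external output reads
          |   }
          | 
          |   fn random_net(n: usize, n_in: usize, n_out: usize,
          |                 p: f64) -> Net {
          |       let mut rng = rand::thread_rng();
          |       let mut connections = vec![Vec::new(); n];
          |       for pre in 0..n {
          |           for post in 0..n {
          |               // connect each ordered pair with probability p
          |               if pre != post && rng.gen_bool(p) {
          |                   connections[pre].push(post);
          |               }
          |           }
          |       }
          |       // Inputs and outputs land on arbitrary neurons, so an
          |       // input's neuron may have no post-synaptic neurons at
          |       // all, exactly as described above.
          |       let inputs: Vec<usize> =
          |           (0..n_in).map(|_| rng.gen_range(0..n)).collect();
          |       let outputs: Vec<usize> =
          |           (0..n_out).map(|_| rng.gen_range(0..n)).collect();
          |       Net { connections, inputs, outputs }
          |   }
          | 
          |   fn main() {
          |       let net = random_net(512, 16, 4, 0.02);
          |       println!("neuron 0 fans out to {} neurons",
          |                net.connections[0].len());
          |       println!("input 0 feeds neuron {}", net.inputs[0]);
          |   }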
        
       ___________________________________________________________________
       (page generated 2024-12-26 23:01 UTC)