[HN Gopher] Neuroevolution of augmenting topologies (NEAT algori...
       ___________________________________________________________________
        
       Neuroevolution of augmenting topologies (NEAT algorithm)
        
       Author : elsewhen
       Score  : 73 points
       Date   : 2024-12-02 10:12 UTC (2 days ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | Rhapso wrote:
       | I post a link to NEAT here about once a week.
       | 
       | The big problem is that NEAT can't leverage a GPU effectively at
       | scale (arbitrary topologies vs bipartite graphs)
       | 
        | Other than that, it feels like a "road not taken" of machine
        | learning. It handles building complex agent behaviors really
        | well, and the pressure toward minimal topologies results in
        | understandable, interpretable networks.
       | 
        | It's easier to learn and implement than backpropagation;
        | speciation is the most complex but also the best feature. It
        | develops multiple solutions, clustering them so they can be
        | iterated on separately. An online-learning NEAT agent is, in
        | practice, a collection of behaviors, adapting and swapping
        | dominance as their fitness changes.
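        | 
        | For the curious, a minimal sketch of the speciation math in
        | Python, assuming genomes are dicts keyed by innovation number
        | (the coefficients and threshold here are illustrative):
        | 
        |   def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
        |       """NEAT's compatibility distance between two genomes."""
        |       shared = set(g1) & set(g2)
        |       non_matching = set(g1) ^ set(g2)
        |       cutoff = min(max(g1), max(g2))
        |       # Excess genes lie beyond the other genome's innovation
        |       # range; the remaining non-matching genes are disjoint.
        |       excess = sum(1 for i in non_matching if i > cutoff)
        |       disjoint = len(non_matching) - excess
        |       n = max(len(g1), len(g2))
        |       w_bar = (sum(abs(g1[i] - g2[i]) for i in shared)
        |                / len(shared)) if shared else 0.0
        |       return c1 * excess / n + c2 * disjoint / n + c3 * w_bar
        | 
        |   # Genomes whose distance falls below a threshold share a
        |   # species, and species iterate separately as described above.
        |   same = compatibility({1: 0.5, 2: -0.3}, {1: 0.4, 3: 0.9}) < 3.0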
        
         | Rhapso wrote:
          | I think the major iteration that will come next is "NEAT in a
          | dream", where a robot or agent online-trains a library of
          | behaviors on a model of the environment that is constantly
          | updated with the new experiences generated by the dominant
          | behaviors interacting with reality.
        
         | bob1029 wrote:
         | > The big problem is that NEAT can't leverage a GPU effectively
         | at scale
         | 
          | I worry how much experimentation we are leaving on the table
          | simply because these paths wouldn't benefit Nvidia
          | stockholders.
          | 
          | Modern CPUs might as well be GPUs when you are running GA
          | experiments from the 80s and 90s on them. We can do millions
          | of generations per hour, with populations that span many
          | racks of machines, using the technology on hand right now.
        
           | api wrote:
            | The more immediate causal reason is that there is no off-
            | the-shelf high-performance hardware to accelerate these
            | paths. GPUs were made for graphics, not machine learning.
            | They just happened to be really good at running certain
            | kinds of models that can be reduced to a ton of matrix math
            | that GPUs do really, really fast.
           | 
           | Something like Intel Xeon Phi or one of the many-many-core
           | ARM designs I've heard talked about for years would be better
           | for more open ended research in this field. You want loads
           | and loads of simple general purpose cores with local RAM. Put
           | like 4096 32-bit ARM or RISC-V cores with super fast local
           | RAM on a die and transactional or DMA-like access to larger
           | main memory, and make it so these chips can be cross linked
           | to form even larger clusters the way nVidia cards can. This
           | is the kind of hardware you'd want.
           | 
           | When I was in college in the early 2000s I played around with
           | genetic algorithms and artificial life simulations on a
           | cluster of IBM Cell Broadband Engine processors at the
            | University of Cincinnati. That CPU was an early hybrid
            | many-core design where you had one PowerPC-based controller
            | and a bunch of simplified, specialized cores, like a GPU but
            | designed more for general-purpose compute. Programming it
            | was hairy, but you could get great performance for the time
            | on a lot of things.
           | 
           | Computing is unfortunately full of probably better roads not
           | taken for circumstantial reasons. JavaScript was originally
           | supposed to be a functional language-- a real, cleaner one.
           | There were much, much better OSes around than Unix in the 80s
            | and 90s, but they were proprietary or hardware-specific. We
            | will be stuck with IPv4 for a long time because IPv6 came
            | too late, and if they'd just added, say, 1-2 more octets to
            | V4, then V6 would not be necessary. I could go on for a
            | while.
           | 
           | This has led to a "worse is better" view that I think might
           | just be a rationalization for the tyranny of path-dependent
           | effects and lock-in after deployment.
        
             | RaftPeople wrote:
             | > _When I was in college in the early 2000s I played around
             | with genetic algorithms and artificial life simulations on
             | a cluster of IBM Cell Broadband Engine processors_
             | 
              | Early/mid/late 2000s I was working on a hobby project for
              | artificial life with GA-evolved brains. I spent a lot of
              | time investigating options, like the Cell processor (I
              | bought a PlayStation to test with). I also looked at
              | interesting multi-core CPUs that were being introduced.
             | 
              | I ended up with a combo of CPU+GPU on networked PCs, which
              | was better than nothing but not ideal.
        
               | bob1029 wrote:
                | > I ended up with a combo of CPU+GPU on networked PCs,
                | which was better than nothing but not ideal.
               | 
                | The biggest constraint I've seen with scaling up these
                | simulations is maintaining coherence of population
                | dynamics across all of the processing units. The less
                | often population members are exchanged, the more likely
                | you are to wind up with a decoupled population, stuck
                | in a ditch somewhere. Since modern CPUs can go so damn
                | fast, you need to exchange members quite frequently.
                | Memory bandwidth and latency are the real troublemakers.
               | 
               | You can spread a simulation across many networked nodes,
               | but then your effective cycle time is potentially
               | millions of times greater than if you keep it all in one
               | socket. I think the newest multi-die CPUs hit the perfect
               | sweet spot for these techniques (~100ns latency domain).
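                | 
                | A toy sketch of that exchange, assuming each
                | island is a list of (fitness, genome) pairs and
                | the names and rates are illustrative:
                | 
                |   def migrate(islands, k=2):
                |       # Sort each island best-first, then pass
                |       # its top-k members around a ring so the
                |       # populations stay coupled.
                |       for pop in islands:
                |           pop.sort(key=lambda p: p[0],
                |                    reverse=True)
                |       top = [pop[:k] for pop in islands]
                |       for i, pop in enumerate(islands):
                |           pop[-k:] = top[i - 1]
                | 
                |   # In the main loop, call migrate() every few
                |   # generations; longer intervals risk the
                |   # decoupling described above.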
        
         | tomrod wrote:
         | I feel a bit embarrassed as, despite almost 15 years working in
         | the field, I've not played with NEAT yet nor read up well on
         | it. TIME TO CHANGE THAT :)
        
         | epr wrote:
         | > The big problem is that NEAT can't leverage a GPU effectively
         | at scale (arbitrary topologies vs bipartite graphs)
         | 
         | Is that true? These graphs can be transformed into a regular
         | tensor shape with zero weights on unused connections. If you
         | were worried about too much time/space used by these zero
         | weights, you could introduce parsimony pressure related to the
         | dimensions of transformed tensors rather than on the bipartite
         | graph.
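          | 
          | A sketch of that transform, with made-up sizes: embed the
          | arbitrary graph in an n x n weight matrix, zeros on absent
          | edges, and evaluate with dense matmuls:
          | 
          |   import numpy as np
          | 
          |   n = 16                                  # padded node count
          |   W = np.zeros((n, n), dtype=np.float32)  # W[j, i]: i -> j
          |   W[3, 0], W[3, 1], W[5, 3] = 0.7, -1.2, 2.0
          | 
          |   def forward(x, steps=4):
          |       # Enough propagation steps to cover the longest path
          |       # in the (possibly recurrent) evolved graph.
          |       a = np.zeros(n, dtype=np.float32)
          |       for _ in range(steps):
          |           a[:x.size] = x                  # clamp inputs
          |           a = np.tanh(W @ a)
          |       return a[5]                         # output node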
        
           | Y_Y wrote:
           | Or even use CuSparse, if you don't mind a little bit of extra
           | work over normal cudnn.
           | 
           | https://developer.nvidia.com/cusparse
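            | 
            | A CPU analogue of the same idea with scipy.sparse (cuSPARSE
            | plays this role on CUDA); the graph here is illustrative:
            | 
            |   import numpy as np
            |   from scipy.sparse import csr_matrix
            | 
            |   rows = np.array([3, 3, 5])          # destination nodes
            |   cols = np.array([0, 1, 3])          # source nodes
            |   vals = np.array([0.7, -1.2, 2.0], dtype=np.float32)
            |   W = csr_matrix((vals, (rows, cols)), shape=(16, 16))
            | 
            |   a = np.zeros(16, dtype=np.float32)
            |   a[0] = 1.0
            |   a = np.tanh(W @ a)   # 3 multiplies instead of 256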
        
           | PartiallyTyped wrote:
           | We can also show that sparse NNs under some conditions are
           | ensembles of discrete subnets, and the authors of the
           | original dropout paper argue that [dropout] effectively
           | creates something akin to a forest of subnets all in
           | "superposition".
        
           | HelloNurse wrote:
           | Another plausible strategy to neutralize arbitrary
           | topologies: compile individual solutions or groups of similar
           | solutions into big compute shaders that execute the network
           | and evaluate expensive fitness functions, with parallel
           | execution over multiple test cases (aggregated in
           | postprocessing) and/or over different numerical parameters
           | for the same topology.
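            | 
            | The same strategy sketched in Python instead of a shader:
            | compile one (illustrative) genome to straight-line code and
            | evaluate it over a whole batch of test cases at once:
            | 
            |   import numpy as np
            | 
            |   genome = [("h", [("x0", 0.7), ("x1", -1.2)]),
            |             ("out", [("h", 2.0)])]  # node, weighted inputs
            | 
            |   def compile_genome(genome):
            |       lines = ["def net(x0, x1):"]
            |       for node, inputs in genome:
            |           expr = " + ".join(f"{w!r}*{s}" for s, w in inputs)
            |           lines.append(f"    {node} = np.tanh({expr})")
            |       lines.append(f"    return {genome[-1][0]}")
            |       scope = {"np": np}
            |       exec("\n".join(lines), scope)
            |       return scope["net"]
            | 
            |   net = compile_genome(genome)
            |   cases = np.linspace(-1, 1, 1000)  # parallel test cases
            |   fitness = np.mean(net(cases, -cases))  # postprocessing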
        
           | Rhapso wrote:
           | Just because you can pack the topology into a sparse matrix
           | doesn't make it actually go faster.
           | 
           | Sparse matrices often don't see good speedup from GPUs.
           | 
            | In addition, each network is unique, each neuron can have an
            | entirely different activation function, and the topology is
            | constantly changing. You will burn a lot of time constantly
            | re-packing into matrices that then don't see the speedups a
            | more wasteful topology pretends to have.
            | 
            | On the flip side, our narrative of "speedup" assumes
            | bipartite graphs crunch faster on GPUs; it might not hold if
            | the basis is actually the utility of the behaviors generated
            | by the networks. A cousin thread explores this better.
        
         | nurettin wrote:
          | I never understood the appeal of NEAT. It's easy to conceive
          | of mutation operators on fully connected layers instead of
          | graphs of neurons, and those evaluate a lot faster. NEAT also
          | seems to have at least a dozen hyperparameters.
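          | 
          | The fixed-architecture alternative, sketched out (sizes and
          | rates are illustrative):
          | 
          |   import numpy as np
          | 
          |   rng = np.random.default_rng(0)
          |   shapes = [(4, 8), (8, 2)]      # frozen dense topology
          | 
          |   def random_genome():
          |       return [rng.normal(size=s) for s in shapes]
          | 
          |   def mutate(genome, sigma=0.1):
          |       return [w + sigma * rng.normal(size=w.shape)
          |               for w in genome]
          | 
          |   def forward(genome, x):
          |       for w in genome:
          |           x = np.tanh(x @ w)     # plain matmuls: fast,
          |       return x                   # batchable, GPU-friendly
          | 
          |   child = mutate(random_genome())
          |   out = forward(child, rng.normal(size=(16, 4)))  # batch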
        
           | Rhapso wrote:
           | It mostly manages its hyperparameters itself.
           | 
            | There is a reward term, one of the hyperparameters we do
            | have to set, for condensing functionality into topology
            | instead of smearing it obscurely across layers. You can
            | just look at a NEAT network analytically and know what is
            | going on there.
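            | 
            | That pressure is often just a complexity penalty on raw
            | fitness; the coefficients below are the illustrative kind
            | of hyperparameter being referred to:
            | 
            |   def adjusted_fitness(raw, n_nodes, n_conns,
            |                        c_node=0.1, c_conn=0.05):
            |       # Smaller topologies keep more of their raw fitness.
            |       return raw - c_node * n_nodes - c_conn * n_conns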
        
       | sakex wrote:
       | Shameless plug:
       | 
       | I have an implementation in Rust (CPU only) with added GRU gates
       | 
       | https://crates.io/crates/neat-gru
        
         | retrofrost wrote:
          | Damn, that's awesome. Might be the thing that really gets me
          | to mess around with Rust.
        
       | retrofrost wrote:
        | There's a lot of really interesting work in neuroevolution that
        | has the potential to enable some novel unsupervised training
        | regimes. I think there are real possibilities for unique
        | encoding schemes like ACE encoding to speed up training and
        | produce much smarter behavior out the other end, especially if
        | "genes" can form reusable elements of neural topology that make
        | scaling networks faster. Reusing components all over the body
        | is how we fit so much complexity into the relatively little
        | unique DNA we have. The other interesting thing about using
        | genetic algorithms for a portion of training/network mapping is
        | that it allows heterogeneous networks, so simulations or
        | representations of astrocyte/glial behavior can easily be
        | integrated with neural networks. With traditional training
        | methods it's a massive fucking pain to train a non-feedforward
        | network.
        | 
        | I do think that languages like Elixir and other tools with
        | strong CPU concurrency could really be leveraged to make some
        | dynamite libraries.
        
       | sgolem wrote:
        | I wrote a NEAT implementation in Rust a few years back. It was
        | able to learn XOR and pole balancing.
       | 
       | https://sgolem.com/blog/neat-simple-neuroevolution-framework...
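        | 
        | For reference, XOR is the classic smoke test; a generic fitness
        | function looks roughly like this (activate() stands in for
        | whatever forward-pass call a given library exposes):
        | 
        |   XOR_CASES = [((0, 0), 0), ((0, 1), 1),
        |                ((1, 0), 1), ((1, 1), 0)]
        | 
        |   def xor_fitness(activate):
        |       # Higher is better; 4.0 means all four cases are exact.
        |       err = sum((activate(x)[0] - y) ** 2 for x, y in XOR_CASES)
        |       return 4.0 - err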
        
       | publicdaniel wrote:
       | For anyone interested in NEAT, you'll likely enjoy Ken's later
       | work on novelty search, open-endedness, etc.
       | 
       | His book "Why Greatness Cannot be Planned: The Myth of the
       | Objectives" is one of the most perspective altering works I have
       | ever read.
       | 
       | Very excited to see what he does next. He mentioned on Twitter a
       | couple times his interest in representation learning and how
       | objective based search affects this. Very interesting stuff
        
       | supernikio2 wrote:
       | Legendary Sethbling video from 2015 where he implemented NEAT for
       | SMW: https://www.youtube.com/watch?v=qv6UVOQ0F44&t=1
        
         | ThouYS wrote:
          | That video inspired me to read the NEAT paper. I think it was
          | the first scientific paper I ever printed out and read.
        
         | notpublic wrote:
         | Thanks for that link!
         | 
         | There is another video posted 7 months ago from Seth where he
         | interviews Ken Stanley, the creator of the NEAT algorithm.
         | 
         | https://www.youtube.com/watch?v=5zg_5hg8Ydo
        
       | westurner wrote:
       | NEAT:
       | https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_t...
       | 
       | NEAT > See also links to EANT.
       | 
       | EANT:
       | https://en.wikipedia.org/wiki/Evolutionary_acquisition_of_ne... :
       | 
        | > _Evolutionary acquisition of neural topologies (EANT/EANT2)
        | is an evolutionary reinforcement learning method that evolves
        | both the topology and weights of artificial neural networks. It
        | is closely related to the works of Angeline et al. [1] and
        | Stanley and Miikkulainen [2] [NEAT (2002)]. Like the work of
        | Angeline et al., the method uses a type of parametric mutation
        | that comes from evolution strategies and evolutionary
        | programming (now using the most advanced form of the evolution
        | strategies, CMA-ES, in EANT2), in which adaptive step sizes are
        | used for optimizing the weights of the neural networks. Similar
        | to the work of Stanley (NEAT), the method starts with minimal
        | structures which gain complexity along the evolution path._
       | 
       | (in Hilbert space)
       | 
       | > [EANT] _introduces a genetic encoding called_ common genetic
       | encoding (CGE) _that handles both direct and indirect encoding of
       | neural networks within the same theoretical framework. The
       | encoding has important properties that makes it suitable for
       | evolving neural networks:_
       | 
       | > _It is complete in that it is able to represent all types of
       | valid phenotype networks._
       | 
       | > _It is closed, i.e. every valid genotype represents a valid
       | phenotype. (Similarly, the encoding is closed under genetic
       | operators such as structural mutation and crossover.)_
       | 
       | > _These properties have been formally proven. [3]_
       | 
       | Neither MOSES: Meta-Optimizing Semantic Evolutionary Search nor
       | PLN: Probabilistic Logic Networks are formally proven FWIU.
       | 
       | /? Z3 ... https://news.ycombinator.com/item?id=41944043 :
       | 
        | > _Does [formal verification [with Z3]] distinguish between
        | x*y+z and z+y*x, in terms of floating point output?_
       | 
       | > _math.fma() would be a different function call with the same or
       | similar output._
       | 
       | > _(deal-solver is a tool for verifying formal implementations in
       | Python with Z3.)_
        
       | kyma wrote:
       | HyperNEAT is also really cool:
       | https://en.wikipedia.org/wiki/HyperNEAT
       | 
       | It uses NEAT to evolve a CPPN which is then sampled at some
       | specified resolution to determine the actual network topology.
       | The really cool thing is the sampling resolution can be varied to
       | scale the size of the network while maintaining the overall
       | "shape" of the topology. I took Ken's neuroevolution course back
       | in the day and worked with HyperNEAT before deep learning got
       | big. Ever since, deep learning network architectures have always
       | reminded me of the dense layered topologies that result from
       | higher sampling resolutions with HyperNEAT. It would be
       | interesting to see if HyperNEAT along with Ken's novelty search
       | technique could be used to evolve useful deep learning
       | architectures.
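        | 
        | A sketch of that sampling step, with a fixed toy function
        | standing in for the NEAT-evolved CPPN:
        | 
        |   import numpy as np
        | 
        |   def cppn(x1, y1, x2, y2):
        |       # In real HyperNEAT this function is evolved by NEAT.
        |       return np.sin(3 * x1 * x2) * np.exp(-(y1 - y2) ** 2)
        | 
        |   def substrate_weights(res):
        |       # Source/target layers laid out on [-1, 1] grids; the
        |       # resolution scales the network, not the CPPN.
        |       c = np.linspace(-1, 1, res)
        |       x1, y1, x2, y2 = np.meshgrid(c, c, c, c)
        |       return cppn(x1, y1, x2, y2).reshape(res * res, res * res)
        | 
        |   small = substrate_weights(4)    #  16 x  16 weight matrix
        |   large = substrate_weights(16)   # 256 x 256, same "shape"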
        
       | tpoacher wrote:
       | If you like NEATs you might also want to have a look at PDGP by
       | R. Poli.
       | 
       | This is a more traditional / general GP in the sense that it
       | expresses computer programs rather than NNs specifically, but is
       | graph based rather than tree based, in a way that allows one to
       | add a "link set" on top of the "function set" and "terminal
       | sets", allowing it to fully express neural networks as well as
       | more general graph structures. And as the name implies, it lends
       | itself well to parallel/distributed operations.
       | 
       | The big difference with NEATs is that your initial population is
       | randomly generated programs rather than "base" networks ... but
       | you could easily treat these programs in the same manner Koza
       | used GPs to evolve circuits from "embryonic circuits" by having
       | terminals act as circuit-modifying operations.
        
       | Macuyiko wrote:
       | Some more interesting approaches in the same space:
       | 
       | - https://github.com/openai/evolution-strategies-starter
       | 
       | - https://cloud.google.com/blog/topics/developers-practitioner...
       | 
       | And perhaps most close:
       | 
       | - https://weightagnostic.github.io/
       | 
       | Which also showed that you can make NNs weight agnostic and just
       | let the architecture evolve using a GA.
       | 
        | Even though these approaches are cool, and NEAT arguably is
        | easier to implement than getting started with RL (at least
        | judging by how many AI YouTubers start with NEAT first), they
        | never seemed to fully take off. Still, metaheuristics are a
        | good tool to know IMO.
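        | 
        | A minimal sketch of the natural-evolution-strategies update
        | behind the first link, with a toy objective (hyperparameters
        | are illustrative):
        | 
        |   import numpy as np
        | 
        |   rng = np.random.default_rng(0)
        |   theta = np.zeros(10)                 # flat parameter vector
        | 
        |   def fitness(p):                      # toy objective
        |       return -np.sum((p - 1.0) ** 2)
        | 
        |   npop, sigma, alpha = 50, 0.1, 0.02
        |   for gen in range(200):
        |       eps = rng.normal(size=(npop, theta.size))
        |       r = np.array([fitness(theta + sigma * e) for e in eps])
        |       r = (r - r.mean()) / (r.std() + 1e-8)
        |       theta += alpha / (npop * sigma) * eps.T @ r
        |   # theta drifts toward the optimum at 1.0 without gradients.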
        
       | JCoder58 wrote:
       | SharpNeat is an independent implementation that targets C# and
       | .NET 8.
       | 
       | https://github.com/colgreen/sharpneat
       | 
       | https://sharpneat.sourceforge.io/
        
         | neonsunset wrote:
         | Wow, started 7 years ago and is still updated. Props to the
         | authors!
        
       | orthecreedence wrote:
        | I did my high school senior project on NEAT! It was a C++
        | project set up as an "ants eating food" experiment, where their
        | minuscule brains learned over several iterations to go to the
        | closest food and eat it.
       | 
        | I later did a version (in Common Lisp) where, instead of an ant
        | knowing where the nearest food was, it had a visual field
        | extending forward that fed into the NN as inputs. Then I added
        | predators, poisonous food items, etc. It was fascinating seeing
        | it all unfold.
       | 
        | Genetic algorithms seem like a somewhat forgotten but very
        | handy tool for ML. I've always wondered how the cost of running
        | experiments on species/populations over and over for however
        | many hundreds of iterations compares to something like
        | backpropagation on a large network.
        
       ___________________________________________________________________
       (page generated 2024-12-04 23:01 UTC)