[HN Gopher] Neuroevolution of augmenting topologies (NEAT algorithm)
___________________________________________________________________
Neuroevolution of augmenting topologies (NEAT algorithm)
Author : elsewhen
Score : 73 points
Date : 2024-12-02 10:12 UTC (2 days ago)
(HTM) web link (en.wikipedia.org)
(TXT) w3m dump (en.wikipedia.org)
| Rhapso wrote:
| I post a link to NEAT here about once a week.
|
| The big problem is that NEAT can't leverage a GPU effectively at
| scale (arbitrary topologies vs bipartite graphs)
|
| Other than that, it feels like a "road not taken" of machine
| learning. It handles building complex agent behaviors really
| well, and the pressure toward minimal topology results in
| networks that are understandable and reverse-interpretable.
|
| It's easier to learn and implement than backpropagation, and
| speciation is its most complex but most valuable feature: it
| develops multiple solutions, clustering them to iterate on
| them separately. An online-learning NEAT agent is, in
| practice, a collection of behaviors, adapting and swapping
| dominance as their fitness changes.
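|
| For reference, the speciation hinges on a genome compatibility
| distance, delta = c1*E/N + c2*D/N + c3*Wbar from the 2002
| paper (E = excess genes, D = disjoint genes, Wbar = mean
| weight difference of shared genes). A minimal Python sketch,
| assuming genomes expose a `connections` list whose entries
| carry `innovation` and `weight` fields:
|
|     def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
|         genes1 = {c.innovation: c for c in g1.connections}
|         genes2 = {c.innovation: c for c in g2.connections}
|         shared = genes1.keys() & genes2.keys()
|         mismatched = genes1.keys() ^ genes2.keys()
|         cutoff = min(max(genes1), max(genes2))
|         disjoint = sum(1 for i in mismatched if i <= cutoff)
|         excess = len(mismatched) - disjoint
|         n = max(len(genes1), len(genes2))
|         if shared:
|             wbar = sum(abs(genes1[i].weight - genes2[i].weight)
|                        for i in shared) / len(shared)
|         else:
|             wbar = 0.0
|         return c1 * excess / n + c2 * disjoint / n + c3 * wbar
|
|     def speciate(population, reps, threshold=3.0):
|         # Each genome joins the first species whose
|         # representative is close enough; otherwise it
|         # founds a new species.
|         species = {sid: [] for sid in reps}
|         for genome in population:
|             for sid, rep in reps.items():
|                 if compatibility(genome, rep) < threshold:
|                     species[sid].append(genome)
|                     break
|             else:
|                 sid = max(reps, default=-1) + 1
|                 reps[sid] = genome
|                 species[sid] = [genome]
|         return species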
| Rhapso wrote:
| I think the major iteration that will come next is "NEAT in a
| dream", where a robot or agent online-trains a library of
| behaviors on a model of the environment that is constantly
| updated with the new experiences generated by the dominant
| behaviors interacting with reality.
| bob1029 wrote:
| > The big problem is that NEAT can't leverage a GPU effectively
| at scale
|
| I worry how much experimentation we are leaving on the table
| simply because these paths wouldn't implicate nvidia
| stockholders in an advantageous manner.
|
| Modern CPUs might as well be GPUs when you are running GA
| experiments from the 80s and 90s on them. We can do millions of
| generations per hour and populations that span many racks of
| machines with the technology on hand right now.
| api wrote:
| The more immediate causal reason is that there is no
| off-the-shelf high-performance hardware to accelerate these
| paths. GPUs were made for graphics, not machine learning.
| They just happened to be really good at running certain kinds
| of models that can be reduced to a ton of matrix math that
| GPUs do really, really fast.
|
| Something like Intel Xeon Phi or one of the many-many-core
| ARM designs I've heard talked about for years would be better
| for more open ended research in this field. You want loads
| and loads of simple general purpose cores with local RAM. Put
| like 4096 32-bit ARM or RISC-V cores with super fast local
| RAM on a die and transactional or DMA-like access to larger
| main memory, and make it so these chips can be cross linked
| to form even larger clusters the way nVidia cards can. This
| is the kind of hardware you'd want.
|
| When I was in college in the early 2000s I played around with
| genetic algorithms and artificial life simulations on a
| cluster of IBM Cell Broadband Engine processors at the
| University of Cincinnati. That CPU was an early hybrid many-
| core design where you had one PowerPC-based controller and a
| bunch of simplified specialized cores like a GPU but more
| designed for general purpose compute. Programming it was
| hairy but you could get great performance for the time on a
| lot of things.
|
| Computing is unfortunately full of probably better roads not
| taken for circumstantial reasons. JavaScript was originally
| supposed to be a functional language-- a real, cleaner one.
| There were much, much better OSes around than Unix in the 80s
| and 90s but they were proprietary or hardware-specific. We
| are stuck with IPv4 for a long time because IPv6 came too
| late, and if they'd just added say 1-2 more octets to V4 then
| V6 would not be necessary. I could go on for a while.
|
| This has led to a "worse is better" view that I think might
| just be a rationalization for the tyranny of path-dependent
| effects and lock-in after deployment.
| RaftPeople wrote:
| > _When I was in college in the early 2000s I played around
| with genetic algorithms and artificial life simulations on
| a cluster of IBM Cell Broadband Engine processors_
|
| Early/mid/late 2000s I was working on a hobby project for
| artificial life with GA-evolved brains. I spent a lot of
| time investigating options, like the Cell processor (I
| bought a PlayStation to test with). I also looked at
| interesting multi-core CPUs that were being introduced.
|
| I ended up with a combo of CPU+GPU on networked PC's, which
| was better than nothing but not ideal.
| bob1029 wrote:
| > I ended up with a combo of CPU+GPU on networked PC's,
| which was better than nothing but not ideal.
|
| The biggest constraint I've seen with scaling up these
| simulations is maintaining coherence of population
| dynamics across all of the processing units. The less
| often population members are exchanged, the more likely
| you are to wind up with a decoupled population, stuck in
| a ditch somewhere. Since modern CPUs can go so damn fast,
| you need to exchange members quite frequently. Memory
| bandwidth and latency are the real troublemakers.
|
| You can spread a simulation across many networked nodes,
| but then your effective cycle time is potentially
| millions of times greater than if you keep it all in one
| socket. I think the newest multi-die CPUs hit the perfect
| sweet spot for these techniques (~100ns latency domain).
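|
| For the record, the usual island-model fix is frequent
| ring migration. A toy sketch of the exchange step (the
| genome `fitness` attribute and list-of-lists layout are
| assumptions, not anyone's actual code):
|
|     def migrate(islands, k=2):
|         # Each island sends its k fittest genomes to the next
|         # island in the ring, replacing that island's k worst.
|         # Run every few generations: too rarely and the
|         # populations decouple, as described above.
|         migrants = [sorted(isl, key=lambda g: g.fitness)[-k:]
|                     for isl in islands]
|         for i, isl in enumerate(islands):
|             isl.sort(key=lambda g: g.fitness)
|             isl[:k] = migrants[(i - 1) % len(islands)]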
| tomrod wrote:
| I feel a bit embarrassed as, despite almost 15 years working in
| the field, I've not played with NEAT yet nor read up well on
| it. TIME TO CHANGE THAT :)
| epr wrote:
| > The big problem is that NEAT can't leverage a GPU effectively
| at scale (arbitrary topologies vs bipartite graphs)
|
| Is that true? These graphs can be transformed into a regular
| tensor shape with zero weights on unused connections. If you
| were worried about too much time/space used by these zero
| weights, you could introduce parsimony pressure related to the
| dimensions of transformed tensors rather than on the bipartite
| graph.
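|
| A rough sketch of that transformation, assuming a genome is
| just a list of (src, dst, weight) tuples; absent connections
| stay zero, and a population pads to the largest genome, which
| is exactly the space overhead in question:
|
|     import numpy as np
|
|     def to_dense(connections, n_nodes):
|         # Pack an arbitrary topology into one dense matrix.
|         w = np.zeros((n_nodes, n_nodes), dtype=np.float32)
|         for src, dst, weight in connections:
|             w[dst, src] = weight
|         return w
|
|     def pack_population(genomes):
|         # One (pop, n, n) tensor, batchable on a GPU.
|         n = max(g.n_nodes for g in genomes)
|         return np.stack([to_dense(g.connections, n)
|                          for g in genomes])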
| Y_Y wrote:
| Or even use CuSparse, if you don't mind a little bit of extra
| work over normal cudnn.
|
| https://developer.nvidia.com/cusparse
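|
| On the CPU the same idea is a one-liner with scipy.sparse;
| cupyx.scipy.sparse exposes the same interface backed by
| cuSPARSE on the GPU. Purely illustrative numbers:
|
|     import numpy as np
|     from scipy.sparse import csr_matrix
|
|     rows = np.array([2, 3, 3])   # destination nodes
|     cols = np.array([0, 1, 2])   # source nodes
|     vals = np.array([0.5, -1.2, 2.0], dtype=np.float32)
|     w = csr_matrix((vals, (rows, cols)), shape=(4, 4))
|
|     state = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.float32)
|     state = np.tanh(w @ state)   # one sparse propagation step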
| PartiallyTyped wrote:
| We can also show that sparse NNs under some conditions are
| ensembles of discrete subnets, and the authors of the
| original dropout paper argue that [dropout] effectively
| creates something akin to a forest of subnets all in
| "superposition".
| HelloNurse wrote:
| Another plausible strategy to neutralize arbitrary
| topologies: compile individual solutions or groups of similar
| solutions into big compute shaders that execute the network
| and evaluate expensive fitness functions, with parallel
| execution over multiple test cases (aggregated in
| postprocessing) and/or over different numerical parameters
| for the same topology.
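|
| A CPU-side sketch of that batching: one compiled topology
| evaluated over all test cases at once, with the fitness
| aggregated afterwards (the `forward` callable stands in for
| the compiled shader):
|
|     import numpy as np
|
|     def evaluate_batch(forward, test_inputs, targets):
|         # forward: vectorized net, (cases, in) -> (cases, out)
|         outputs = forward(test_inputs)
|         errors = np.mean((outputs - targets) ** 2, axis=1)
|         return -errors.mean()   # aggregate into one fitness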
| Rhapso wrote:
| Just because you can pack the topology into a sparse matrix
| doesn't make it actually go faster.
|
| Sparse matrices often don't see good speedup from GPUs.
|
| In addition, each network is unique, each neuron can have an
| entirely different activation function, and the topology is
| constantly changing. You will burn a lot of time constantly
| re-packing into matrices that then don't see the speedups a
| denser, more wasteful topology would.
|
| On the flip side, our narrative of "speedup" is that bipartite
| graphs crunch faster on GPUs, and the comparison might not
| come out the same if the basis were instead the utility of
| the behaviors the networks generate. A cousin thread explores
| this better.
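|
| For contrast, the straightforward NEAT forward pass really is
| a per-node loop in topological order, each node with its own
| activation, which is the part that resists batching (sketch;
| the genome layout is assumed):
|
|     def forward(genome, inputs):
|         values = dict(zip(genome.input_ids, inputs))
|         for node in genome.topo_order:   # hidden + output
|             total = sum(values[src] * w
|                         for src, w in genome.incoming[node.id])
|             values[node.id] = node.activation(total)
|         return [values[i] for i in genome.output_ids]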
| nurettin wrote:
| I never understood the appeal of NEAT. It's easy to conceive
| of mutation operators on fully connected layers instead of
| graphs of neurons, and those evaluate a lot faster. NEAT also
| seems to have at least a dozen hyperparameters.
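|
| What that looks like in miniature, i.e. evolution strategies
| over fixed dense layers (a sketch only):
|
|     import numpy as np
|
|     def mutate_dense(weights, sigma=0.1, rate=0.05):
|         # Gaussian perturbation of fully connected layers:
|         # no topology bookkeeping, trivially batched on a GPU.
|         new = []
|         for w in weights:   # list of layer matrices
|             mask = np.random.rand(*w.shape) < rate
|             new.append(w + mask * sigma
|                        * np.random.randn(*w.shape))
|         return new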
| Rhapso wrote:
| It mostly manages its hyperparameters itself.
|
| There is a reward function, one of the hyperparameters we do
| have to set, for condensing functionality into topology
| instead of smearing it obscurely across layers. You can just
| look at a NEAT network analytically and know what is going on
| there.
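|
| In practice that pressure is often just a size penalty folded
| into fitness; a one-line sketch (the coefficient is a made-up
| knob, not a canonical value):
|
|     def adjusted_fitness(raw, genome, c=0.01):
|         # Penalize structure so behavior condenses into a
|         # small, readable topology.
|         size = len(genome.nodes) + len(genome.connections)
|         return raw - c * size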
| sakex wrote:
| Shameless plug:
|
| I have an implementation in Rust (CPU only) with added GRU gates
|
| https://crates.io/crates/neat-gru
| retrofrost wrote:
| Damn, that's awesome. Might be the thing that really gets me
| to mess around with Rust.
| retrofrost wrote:
| There's a lot of really interesting work in neuroevolution
| that has the potential to enable some novel unsupervised
| training regimes. I think there are interesting possibilities
| for unique encoding schemes like ACE encoding to speed up
| training and produce much smarter behavior out the other end,
| especially if "genes" can form reusable elements of neural
| topology that make scaling networks faster. Reusing components
| all over the body is how we fit so much complexity into the
| relatively little unique DNA we have. The other interesting
| thing about using genetic algorithms for a portion of
| training/network mapping is that it allows you to have
| heterogeneous networks, so simulations or representations of
| astrocyte/glial behavior can easily be integrated with neural
| networks. With traditional training methods it's a massive
| fucking pain to train a non-feed-forward network.
|
| I do think that languages like Elixir and other tools with
| strong CPU concurrency could really be leveraged to make some
| dynamite libraries.
| sgolem wrote:
| I wrote a NEAT implementation in Rust a few years back. It
| was able to solve XOR and pole balancing.
|
| https://sgolem.com/blog/neat-simple-neuroevolution-framework...
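|
| For anyone curious, the XOR benchmark's fitness function is
| tiny regardless of language; a Python sketch, assuming the
| network exposes an `activate` method:
|
|     XOR_CASES = [((0, 0), 0), ((0, 1), 1),
|                  ((1, 0), 1), ((1, 1), 0)]
|
|     def xor_fitness(net):
|         # Perfect score is 4.0; NEAT has to grow at least one
|         # hidden node, since XOR is not linearly separable.
|         error = sum((net.activate(x)[0] - y) ** 2
|                     for x, y in XOR_CASES)
|         return 4.0 - error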
| publicdaniel wrote:
| For anyone interested in NEAT, you'll likely enjoy Ken's later
| work on novelty search, open-endedness, etc.
|
| His book "Why Greatness Cannot Be Planned: The Myth of the
| Objective" is one of the most perspective-altering works I
| have ever read.
|
| Very excited to see what he does next. He mentioned on Twitter
| a couple times his interest in representation learning and how
| objective-based search affects it. Very interesting stuff.
| supernikio2 wrote:
| Legendary Sethbling video from 2015 where he implemented NEAT for
| SMW: https://www.youtube.com/watch?v=qv6UVOQ0F44&t=1
| ThouYS wrote:
| That video inspired me to read the NEAT paper. I think that
| was the first scientific paper I ever printed out and read.
| notpublic wrote:
| Thanks for that link!
|
| There is another video from Seth, posted 7 months ago, where
| he interviews Ken Stanley, the creator of the NEAT algorithm.
|
| https://www.youtube.com/watch?v=5zg_5hg8Ydo
| westurner wrote:
| NEAT:
| https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_t...
|
| NEAT > See also links to EANT.
|
| EANT:
| https://en.wikipedia.org/wiki/Evolutionary_acquisition_of_ne... :
|
| > _Evolutionary acquisition of neural topologies (EANT/EANT2)
| is an evolutionary reinforcement learning method that evolves
| both the topology and weights of artificial neural networks.
| It is closely related to the works of Angeline et al. [1] and
| Stanley and Miikkulainen [2] [NEAT (2002)]. Like the work of
| Angeline et al., the method uses a type of parametric mutation
| that comes from evolution strategies and evolutionary
| programming (now using the most advanced form of the evolution
| strategies, CMA-ES, in EANT2), in which adaptive step sizes
| are used for optimizing the weights of the neural networks.
| Similar to the work of Stanley (NEAT), the method starts with
| minimal structures which gain complexity along the evolution
| path._
|
| (in Hilbert space)
|
| > [EANT] _introduces a genetic encoding called_ common genetic
| encoding (CGE) _that handles both direct and indirect encoding
| of neural networks within the same theoretical framework. The
| encoding has important properties that make it suitable for
| evolving neural networks:_
|
| > _It is complete in that it is able to represent all types of
| valid phenotype networks._
|
| > _It is closed, i.e. every valid genotype represents a valid
| phenotype. (Similarly, the encoding is closed under genetic
| operators such as structural mutation and crossover.)_
|
| > _These properties have been formally proven. [3]_
|
| Neither MOSES: Meta-Optimizing Semantic Evolutionary Search nor
| PLN: Probabilistic Logic Networks are formally proven FWIU.
|
| /? Z3 ... https://news.ycombinator.com/item?id=41944043 :
|
| > _Does [formal verification [with Z3]] distinguish between
| x*y+z and z+y*x, in terms of floating point output?_
|
| > _math.fma() would be a different function call with the same or
| similar output._
|
| > _(deal-solver is a tool for verifying formal implementations in
| Python with Z3.)_
| kyma wrote:
| HyperNEAT is also really cool:
| https://en.wikipedia.org/wiki/HyperNEAT
|
| It uses NEAT to evolve a CPPN which is then sampled at some
| specified resolution to determine the actual network topology.
| The really cool thing is the sampling resolution can be varied to
| scale the size of the network while maintaining the overall
| "shape" of the topology. I took Ken's neuroevolution course back
| in the day and worked with HyperNEAT before deep learning got
| big. Ever since, deep learning network architectures have always
| reminded me of the dense layered topologies that result from
| higher sampling resolutions with HyperNEAT. It would be
| interesting to see if HyperNEAT along with Ken's novelty search
| technique could be used to evolve useful deep learning
| architectures.
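|
| The sampling step is easy to sketch: query the evolved CPPN at
| pairs of substrate coordinates, with the grid resolution as
| the free parameter that scales the network (1-D substrate here
| for brevity; real HyperNEAT queries pairs of 2-D coordinates):
|
|     import numpy as np
|
|     def sample_weights(cppn, resolution):
|         # Raising `resolution` grows the network while
|         # preserving the overall "shape" of the weight pattern.
|         coords = np.linspace(-1.0, 1.0, resolution)
|         w = np.zeros((resolution, resolution),
|                      dtype=np.float32)
|         for i, x1 in enumerate(coords):
|             for j, x2 in enumerate(coords):
|                 w[i, j] = cppn(x1, x2)   # CPPN output = weight
|         return w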
| tpoacher wrote:
| If you like NEATs you might also want to have a look at PDGP by
| R. Poli.
|
| This is a more traditional / general GP in the sense that it
| expresses computer programs rather than NNs specifically, but is
| graph based rather than tree based, in a way that allows one to
| add a "link set" on top of the "function set" and "terminal
| sets", allowing it to fully express neural networks as well as
| more general graph structures. And as the name implies, it lends
| itself well to parallel/distributed operations.
|
| The big difference with NEATs is that your initial population is
| randomly generated programs rather than "base" networks ... but
| you could easily treat these programs in the same manner Koza
| used GPs to evolve circuits from "embryonic circuits" by having
| terminals act as circuit-modifying operations.
| Macuyiko wrote:
| Some more interesting approaches in the same space:
|
| - https://github.com/openai/evolution-strategies-starter
|
| - https://cloud.google.com/blog/topics/developers-practitioner...
|
| And perhaps most close:
|
| - https://weightagnostic.github.io/
|
| Which also showed that you can make NNs weight-agnostic and
| just let the architecture evolve using a GA.
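|
| The weight-agnostic trick in miniature: score a topology with
| a single shared weight swept over a few values, so only the
| architecture can take the credit (sketch; `evaluate` and the
| genome API are assumptions):
|
|     def weight_agnostic_fitness(genome, evaluate,
|                                 sweep=(-2.0, -1.0, 1.0, 2.0)):
|         # Every connection gets the same weight; averaging
|         # over the sweep makes fitness reflect topology alone.
|         scores = []
|         for w in sweep:
|             genome.set_all_weights(w)
|             scores.append(evaluate(genome))
|         return sum(scores) / len(scores)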
|
| Even though these approaches are cool, and NEAT is arguably
| easier to get started with than RL (at least judging by how
| many AI YouTubers start with NEAT first), they never seemed
| to fully take off. Still, knowing your metaheuristics is a
| good tool to have, IMO.
| JCoder58 wrote:
| SharpNeat is an independent implementation that targets C# and
| .NET 8.
|
| https://github.com/colgreen/sharpneat
|
| https://sharpneat.sourceforge.io/
| neonsunset wrote:
| Wow, started 7 years ago and is still updated. Props to the
| authors!
| orthecreedence wrote:
| I did my high school senior project on NEAT! It was a C++
| project set up as an "ants eating food" experiment where their
| minuscule brains learned over several iterations to go to the
| closest food and eat it.
|
| I later did a version (in common lisp) where instead of an ant
| knowing where the nearest food was, it had a visual field
| extending forward that would feed as inputs to the NN. Then added
| predators, poisonous food items, etc etc. It was fascinating
| seeing it all unfold.
|
| Genetic algorithms seem like a somewhat forgotten but very handy
| tool for ML. I've always wondered what the cost of running
| experiments on species/populations over and over for however many
| hundreds of iterations is compared to something like back
| propagation on a large network.
___________________________________________________________________
(page generated 2024-12-04 23:01 UTC)