[HN Gopher] Darwin Machines
___________________________________________________________________
Darwin Machines
Author : calepayson
Score : 154 points
Date : 2024-07-16 22:54 UTC (1 day ago)
(HTM) web link (vedgie.net)
(TXT) w3m dump (vedgie.net)
| calepayson wrote:
| I'm obsessed with the idea of Darwin Machines (and I think you
| should be too).
|
| I've been tinkering with the idea in python but I just don't have
| enough ML experience.
|
| If you, or anyone you know, is interested in Darwin Machines
| please reach out!
| turtleyacht wrote:
| Thank you. Was wondering about your thoughts on emotions. Are
| they merely byproducts of the chemical evolution of the brain,
| or are they emergent artifacts of intelligence?
|
| A system cannot feel, but we can map neurochemistry as
| mechanistically as any other process. It would be interesting
| to discover whether a "pure brain" exists, or whether even its
| inputs, when considered in whole, colored its nature.
| calepayson wrote:
| Honestly, no idea.
|
| I could imagine a Darwin Machine being a "pure brain", as you
| put it, while we have emotions because evolution built that
| pure brain atop an existing and messy infrastructure.
|
| Or, emotions could just be the subjective experience of
| thoughts competing.
|
| Calvin goes deeper into things like this in the book, but I
| suspect emotions help intelligence to some extent insofar as
| they provide environmental change. It's good for the long
| term health of an ecosystem to shake things up so that
| nothing gets too stagnant.
| IAmGraydon wrote:
| Aren't emotions just the brain's way of experiencing the
| pain/pleasure feedback system for non-physical stimuli?
| DrMiaow wrote:
| It is the focus of my long-term crazy side project.
| https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513
| aldousd666 wrote:
| This reminds me a little of Jeff Hawkins' book on the Thousand
| Brains Theory. His company Numenta has done this kind of
| research and they have a mailing list. I'm not an expert but
| I've read Jeff's book and noodled at the mailing list.
| calepayson wrote:
| I ordered the book and I'm checking out the website rn. Looks
| awesome. Thanks a ton for sharing!
| dstrbtd_evolute wrote:
| Hawkins' proposal is missing the key innovation that Calvin
| proposes, which is that learning takes place by evolutionary
| means. But Hawkins' proposal does fit squarely within current
| popular ideas around predictive coding.
|
| The key structures in Hawkins' architecture are cortical
| columns (CC). What his VP of Eng (Dileep George) did is to
| analyze Hawkins' account of the functionality of a CC, and then
| say that a CC is a module which must conform to a certain API,
| and meet a certain contract. As long as a module obeys the API
| and contract, we don't actually care how the CC module is
| implemented. In particular, it's actually not important that
| the module contain neurons. (Though the connections between the
| CCs may still have to look like axons or whatever, I don't
| know.)
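|
| In code terms, that contract might look something like the
| sketch below (a toy illustration with hypothetical names, not
| Numenta's actual API):
|
|     # Hedged sketch of "a CC is a module behind an API".
|     # Names and signatures are illustrative assumptions.
|     from typing import Protocol
|     import numpy as np
|
|     class CorticalColumn(Protocol):
|         def observe(self, pattern: np.ndarray) -> None:
|             """Update internal state with an input pattern."""
|         def predict(self) -> np.ndarray:
|             """Return the predicted next pattern."""
|
|     class NearestNeighborCC:
|         """Any implementation satisfying the contract will do --
|         here a trivial statistical learner, no neurons at all."""
|         def __init__(self):
|             self.history = []
|         def observe(self, pattern):
|             self.history.append(np.asarray(pattern))
|         def predict(self):
|             # Naive prediction: a repeat of the last pattern.
|             return self.history[-1] if self.history else np.zeros(1)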
|
| Then Dileep George further figured out that there is an
| off-the-shelf algorithm that works perfectly for the CC module.
| He selected an algorithm which is based on statistical learning
| theory (SLT).
|
| SLT-based algorithms are an excellent choice for the CCs,
| IMNSHO. They are fast, theoretically sound, etc. They are also
| understood in great mathematical detail, so we can characterize
| _exactly_ what they can and can't do. So there is zero mystery
| about the capabilities of his system at the CC level. Note that
| in Hawkins' case, the SLT-based algorithms are used for pattern
| recognition.
|
| Now Hawkins' proposal isn't just a single CC; it's a network of
| CCs all connected together in a certain way. My memory is a
| little hazy at this point, but as best I recall, his
| architecture should have no problem identifying sequences of
| patterns (as for sound), or spatial patterns across time (as
| for vision). And I bought his argument that these could be
| combined hierarchically, and that the same structure could also
| be used for playing back (outputting) a learned pattern, and
| for recognizing cross modal patterns (that is, across sensory
| modalities).
|
| But is all this enough?
|
| I say no. My reading of evolutionary epistemology suggests to
| me that pattern identification is insufficient for making and
| refuting conjectures in the most general sense. And ultimately,
| a system must be able to create and refute conjectures to
| create knowledge. Hawkins has a very weak story about
| creativity and claims that it can all be done with pattern
| recognition and analogy, but I am not convinced. It was the
| weakest part of the book. (pp 183-193)
|
| I don't know if it's clear to you or not why pattern
| recognition is insufficient for doing general conjectures and
| refutations. If it's not clear, I should attempt to expand on
| that ...
|
| The idea is that it is not always possible to arrive at a
| theory by just abstracting a pattern from a data set. For
| example:
|
| What set of data could Newton have looked at to conclude that
| an object in motion stays in motion? I suppose he knew of 7
| planets that stayed in motion, but then he had millions of
| counter examples all around him. For a pattern recognition
| algorithm, if you feed it a million + 7 data points, it will
| conclude that objects in motion always come to a stop except
| for a few weird exceptions which are probably noise.
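|
| As a toy illustration of that failure mode (my own
| construction, not Calvin's or Hawkins'): fit a one-parameter
| "law of motion" to a million everyday objects plus seven
| planets, and the learned law says motion decays.
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|     # A million everyday objects: friction halves their speed.
|     v0 = rng.uniform(1, 10, 1_000_000)
|     v1 = v0 * 0.5
|     # Seven "planets": velocity is conserved.
|     p0 = rng.uniform(1, 10, 7)
|     x = np.concatenate([v0, p0])
|     y = np.concatenate([v1, p0])
|     # Best-fit law of the form v_next = k * v_now.
|     k = (x @ y) / (x @ x)
|     print(k)  # ~0.5: "objects in motion come to a stop"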
| calepayson wrote:
| This is an awesome write up. I especially love the Newton
| analogy. Thanks.
| exe34 wrote:
| my impression of hawkins from a distance is that he can
| reproduce the success of the current orthodoxy, but is always a
| few years behind sota.
| mprime1 wrote:
| FYI Evolutionary Algorithms have been an active area of research
| for decades.[1]
|
| Among the many uses, they have been applied to 'evolving' neural
| networks.
|
| Famously a guy whose name I can't remember used to generate
| programs and mutations of programs.
|
| My recommendation if you want to get into AI: avoid anything
| written in the last 10 years and explore some classics from the
| 70s
|
| [1] https://en.m.wikipedia.org/wiki/Evolutionary_algorithm
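|
| For readers who haven't seen one, the core loop is tiny. A
| generic sketch (illustrative only, not any specific system from
| the literature):
|
|     import random
|
|     def evolve(fitness, length=20, pop_size=50, generations=100):
|         # Evolve bitstrings: tournament selection, one-point
|         # crossover, occasional point mutation.
|         pop = [[random.randint(0, 1) for _ in range(length)]
|                for _ in range(pop_size)]
|         for _ in range(generations):
|             def pick():  # two-way tournament selection
|                 a, b = random.sample(pop, 2)
|                 return a if fitness(a) >= fitness(b) else b
|             nxt = []
|             for _ in range(pop_size):
|                 p1, p2 = pick(), pick()
|                 cut = random.randrange(length)
|                 child = p1[:cut] + p2[cut:]
|                 if random.random() < 0.1:
|                     i = random.randrange(length)
|                     child[i] ^= 1
|                 nxt.append(child)
|             pop = nxt
|         return max(pop, key=fitness)
|
|     print(evolve(sum))  # maximize the number of 1s ("one-max")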
| fancy_pantser wrote:
| Perhaps it was John Koza?
|
| http://www.genetic-programming.com/johnkoza.html
| EvanAnderson wrote:
| I'm sure it's not who you're thinking of, but I can't miss an
| opportunity to mention Tom Ray and Tierra:
| https://tomray.me/tierra/whatis.html
| blixt wrote:
| A friend of mine made this in-browser neural network engine
| that could run millions of multi-layer NNs in a simulated world
| at hundreds of updates per second and each network could
| reproduce and evolve. It worked in the sense that the networks
| exhibited useful and varied behaviors. However, it was clear
| that larger networks were needed for more complex behaviors and
| evolution just starts to take a lot longer.
|
| https://youtu.be/-1s3Re49jfE?si=_G8pEVFoSb2J4vgS
| petargyurov wrote:
| > avoid anything written in the last 10 years
|
| Why?
| exe34 wrote:
| presumably because it's saturated with a monoculture, and the
| hope (rightly or wrongly) is that some of the other roads might
| lead to some alternative breakthrough.
| PinkMilkshake wrote:
| In the _Creatures_ artificial life / virtual pet series, the
| creatures have about 900 neurons (maybe more in later
| versions). Each neuron is a little virtual machine that is
| designed in such a way that programs remain valid even with
| random mutation.
| mandibeet wrote:
| Your recommendation to explore the classics is a good one. You
| can gain a deeper appreciation by studying these foundational
| works
| JoeDaDude wrote:
| There is the case of Blondie24, an evolutionary neural net, or
| genetic algorithm, which was able to develop a very strong
| checkers-playing capability by self-play with no human
| instruction. It was later extended to play other games.
|
| https://en.wikipedia.org/wiki/Blondie24
| northernman wrote:
| I read this book: <https://books.google.ca/books/about/Artificial_Intelligence_Through_Simulate.html?id=QMLaAAAAMAAJ>
|
| in 1972. It was published in 1966.
| jaimie wrote:
| The domain of Artificial Life is highly related and has had an
| ongoing conference series and journal going, might be worth
| mining for more inspiration:
|
| https://en.wikipedia.org/wiki/Artificial_life
| https://direct.mit.edu/artl https://alife.org
| sdwr wrote:
| Fantastic speculation here, explains a lot, and has testable
| hypotheses.
|
| For example, there should be a relationship between rate of
| learning and the physical subcolumns - we should be able to
| identify when a single column starts up / is fully trained / is
| overused
|
| Or use AI to try to mirror the learning process, creating an
| external replica that makes the same decisions as the person
|
| Marvin Minsky was spot on about the general idea 50 years ago,
| seeing the brain as a collection of 1000s of atomic operators
| (society of mind?)
| calepayson wrote:
| > Fantastic speculation here, explains a lot, and has testable
| hypotheses.
|
| Calvin is the man.
|
| > For example, there should be a relationship between rate of
| learning and the physical subcolumns - we should be able to
| identify when a single column starts up / is fully trained / is
| overused
|
| This sounds super interesting. Could you break down what you're
| thinking here?
|
| > Marvin Minsky was spot on about the general idea 50 years
| ago, seeing the brain as a collection of 1000s of atomic
| operators (society of mind?)
|
| I'm very much an amateur in this field and was under the
| impression that Minsky was trying to break it up, but was
| trying to specify each of those operations. What I find so
| enticing about Neural Darwinism is the lack of specification
| needed. Ideally, once you get the underlying process right,
| there's a cascade of emergent properties.
|
| Using the example of a murmuration of starlings I picture
| Minsky trying to describe phase transitions between every
| possible murmuration state. On the other hand I see Neural
| Darwinism as an attempt to describe the behavior of a single
| starling which can then be scaled to thousands.
|
| Let me know if that's super wrong. I've only read second hand
| descriptions of Minsky's ideas, so feel free to send some
| homework my way.
| breck wrote:
| > I've only read second hand descriptions of Minsky's ideas,
| so feel free to send some homework my way.
|
| Here you go: https://breckyunits.com/marvin-minsky.html
|
| I think you are right in that Minsky was missing some
| important details in the branches of the tree, particularly
| around cortical columns, but he was old when Hawkins and
| Numenta released their stuff.
|
| In terms of the root idea of the mind being a huge number of
| concurrent agents, I think he was close to the bullseye and
| it very much aligns with what you wrote.
| calepayson wrote:
| Awesome post, thanks. I ordered society of mind.
|
| Reminds me of when I took "The Philosophy of Cognitive
| Science" in college. The entire class was on AI. When I
| asked the professor why, she explained: "You don't
| understand something unless you can build it".
|
| It's cool to learn that quote might have been because she's
| a fan of Minsky.
|
| > In terms of the root idea of the mind being a huge number
| of concurrent agents, I think he was close to the bullseye
| and it very much aligns with what you wrote.
|
| I think you're right here and I'd like to add a bit. One
| common mistake people make when thinking of evolution, is
| where in the hierarchy it takes place. In other words, they
| misidentify the agents by an order of magnitude.
|
| For example, in biology I commonly see it taught that the
| individual is the subject of natural selection (or worse,
| the population).
|
| Really, it's the gene. The beauty of evolution is that it
| can take an agent as simple as the gene and shape it into
| the litany of complex forms and functions we see all around
| us.
|
| If evolution is at play in the brain, I suspect that
| Minsky's agents are the individual firing patterns. Like
| genes, the base of the hierarchy, the fundamental unit.
| Also like genes, they slowly build increasingly complex
| behaviors from the ground up. Starting before birth and
| continuing for most of our lives.
| breck wrote:
| Right, the Selfish Gene is one of the best books I ever
| read.
|
| There's also a paper I recently came across
| (https://warpcast.com/breck/0xea2e1a35) which talks about
| how causation is a two way street: low level nodes cause
| things in higher level nodes, but higher level nodes in
| turn cause things in lower level nodes.
|
| In other words, just because genes have really been the
| drivers and our bodies just the vehicles, doesn't mean
| that's not cyclical (sometimes it could cycle to be the
| higher level ideas driving the evolution in lower level
| agents).
|
| > I suspect that Minsky's agents are the individual
| firing patterns.
|
| I like this idea. The biggest open question in my mind in
| regards to Minsky still is exactly on this: what
| physically is an agent? How many are there? My margin of
| error here is wild -- like 10 orders of magnitude.
| jcynix wrote:
| Regarding Minsky: the most interesting thoughts I have read
| about theories of mind are in his books, namely _The Society of
| Mind_ and _The Emotion Machine_, which should be more widely
| known.
|
| More of Minsky's ideas on "Matter, Mind, and Models" are
| mentioned here:
| https://www.newyorker.com/magazine/1981/12/14/a-i
|
| And let's not forget Daniel Dennett: In "Consciousness
| Explained," a 1991 best-seller, he described consciousness as
| something like the product of multiple, layered computer
| programs running on the hardware of the brain. [...]
|
| Quoted from
| https://www.newyorker.com/magazine/2017/03/27/daniel-dennett...
| breck wrote:
| I took more notes on this blog post than anything else I've read
| this month.
| calepayson wrote:
| Man, this has me grinning like an idiot. Thanks.
| jekude wrote:
| I've been noodling on how to combine neural networks with
| evolution for a while. I've always thought that to do this, you
| need some sort of evolvable genetic/functional units, and have
| had trouble fitting traditional artificial neurons w backprop
| into that picture.
|
| My current rabbit hole is using Combinatory Logic as the genetic
| material, and have been trying to evolve combinators, etc (there
| is some active research in this area).
|
| Only slightly related to the author's idea, but it's cool that
| others are interested in this space as well.
| pyinstallwoes wrote:
| Thermodynamic annealing over a density parameter space
| Matumio wrote:
| Then probably you know about NEAT (the genetic algorithm) by
| now. I'm not sure what has been tried in directly using
| combinatory logic instead of NNs (do Hopfield networks
| count?), any references?
|
| I've tried to learn simple look-up tables (like, 9 bits of
| input) using the Cross-Entropy Method (CEM), and it worked
| well.
| But it was a very small search space (way too large to just try
| all solutions, but still, a tiny model). I haven't seen the CEM
| used on larger problems. Though there is a cool paper about
| learning tetris using the cross-entropy method, using a bit of
| feature engineering.
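|
| A minimal version of that setup (my own reconstruction of the
| idea, not the actual code) for a random 9-bit lookup table:
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|
|     target = rng.integers(0, 2, 512)  # 9-bit in -> 1-bit out
|     probs = np.full(512, 0.5)         # Bernoulli param per entry
|     for _ in range(300):
|         # Sample candidate tables, keep the elite, nudge probs.
|         samples = rng.random((200, 512)) < probs
|         scores = (samples == target).sum(axis=1)
|         elite = samples[np.argsort(scores)[-20:]]
|         probs = 0.7 * probs + 0.3 * elite.mean(axis=0)
|     # Fraction of table entries recovered; approaches 1.0.
|     print((probs.round() == target).mean())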
| daveguy wrote:
| I am familiar with NEAT, it was very exciting when it came
| out. But, NEAT does not use back propagation or single
| network training at all. The genetic algorithm combines
| static neural networks in an ingenious way.
|
| Several years prior, in undergrad, I talked to a professor
| about evolving network architectures with GA. He scoffed that
| squishing two "mediocre" techniques together wouldn't make a
| better algorithm. I still think he was wrong. Should have
| sent him that paper.
|
| IIRC NEAT wasn't SOTA when it came out, but it is still a
| fascinating and effective way to evolve NN architecture using
| genetic algorithms.
|
| If OP (or anyone in ML) hasn't studied it, they should.
|
| https://en.m.wikipedia.org/wiki/Neuroevolution_of_augmenting...
| (and check the bibliography for the papers)
|
| Edit: looking at the continuation of NEAT it looks like they
| focused on control systems, which makes sense. The evolved
| network structures are relatively simple.
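|
| For a flavor of NEAT's two structural mutations (a toy sketch
| of the add-connection / add-node operators only, without
| innovation numbers or speciation):
|
|     import random
|
|     # Genome: list of (src, dst, weight, enabled) connection genes.
|     genome = [(0, 2, 0.5, True), (1, 2, -0.3, True)]
|     next_node = 3
|
|     def add_connection(g):
|         # Join two nodes that aren't already connected.
|         nodes = {n for (s, d, _, _) in g for n in (s, d)}
|         src, dst = random.sample(sorted(nodes), 2)
|         if not any(s == src and d == dst for (s, d, _, _) in g):
|             g.append((src, dst, random.uniform(-1, 1), True))
|
|     def add_node(g):
|         # Split a connection: disable it, route through a new node.
|         global next_node
|         i = random.randrange(len(g))
|         s, d, w, _ = g[i]
|         g[i] = (s, d, w, False)
|         g.append((s, next_node, 1.0, True))  # in-link gets weight 1
|         g.append((next_node, d, w, True))    # out-link keeps weight
|         next_node += 1
|
|     add_node(genome); add_connection(genome)
|     print(genome)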
| peheje wrote:
| Maybe a key innovation would be to apply backpropagation to
| optimize the crossover process itself. Instead of random
| crossover, compute the gradient of the crossover operation.
|
| For each potential combination, "learn" (via normal backprop)
| how different ways of crossover impacts on overall network
| performance. Then use this to guide the selection of optimal
| crossover points and methods.
|
| This "gradient-optimized crossover" would be a search process
| in itself, aiming to find the best way to combine specific
| parts of networks to maximize improvement of the whole. It
| could make "leaps", instead of small incremental steps, due to
| the exploratory genetic algorithm.
|
| Has anything like this been tried?
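|
| To make the idea concrete, here's one possible toy realization
| (my own sketch, not an established method): treat crossover as
| a per-weight soft mask between two parents and run gradient
| descent on the mask.
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|
|     # Toy task: linear regression. Two "parent" weight vectors.
|     X = rng.normal(size=(100, 5))
|     w_true = rng.normal(size=5)
|     y = X @ w_true
|     pa, pb = rng.normal(size=5), rng.normal(size=5)
|
|     # child = sigmoid(a)*pa + (1 - sigmoid(a))*pb
|     a = np.zeros(5)  # learnable crossover coefficients
|     for _ in range(500):
|         g = 1 / (1 + np.exp(-a))           # soft crossover mask
|         w = g * pa + (1 - g) * pb
|         err = X @ w - y
|         dw = X.T @ err / len(y)            # dLoss/dw
|         da = dw * (pa - pb) * g * (1 - g)  # chain rule
|         a -= 0.5 * da
|     # Rounding the mask to {0, 1} yields a discrete crossover
|     # whose cut points were chosen by gradient, not chance.
|     print(np.round(1 / (1 + np.exp(-a))))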
| paraschopra wrote:
| The book "Cerebral Code" is made available for free by the author
| on his website: http://williamcalvin.com/bk9/
|
| For a more modern treatment on the subject, read this paper: An
| Attempt at a Unified Theory of the Neocortical Microcircuit in
| Sensory Cortex
| https://www.researchgate.net/publication/343269087_An_Attemp...
| ViscountPenguin wrote:
| This strongly reminds me of the algorithm used by swarming
| honeybees (if anyone's interested I'd highly recommend reading
| honeybee democracy). I reckon there's something to this.
|
| I might have a go implementing something along these lines.
| DrMiaow wrote:
| This project employs a Darwinian approach. Initially, it was an
| experiment in traditional program and user interface generation
| that incorporated evolutionary feedback into the mutation
| process. A combination of PG and AL. It has achieved some
| success with small programs and is now exploring how it might
| be combined with LLMs.
|
| https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513
| visarga wrote:
| I don't think it matters so much how the brain is made, what
| matters is the training data. And we obtain data by searching.
| Search is a great concept, it covers evolution, intelligence and
| creativity, it's also social. Search is discrete, recursive,
| combinatorial and based on some kind of language (DNA, or words,
| or just math/code).
|
| Searching the environment provides the data the brain is trained on.
| I don't believe we can understand the brain in isolation without
| its data engine and the problem space where it develops.
|
| Neural nets showed that given a dataset, you can obtain similar
| results with very different architectures, like transformer and
| diffusion models, or transformer vs Mamba. The essential
| ingredient is data, architecture only needs to pass some minimal
| bar for learning.
|
| Studying just the brain misses the essential - we are search
| processes, the whole life is search for optimal actions, and
| evolution itself is search for environment fitness. These search
| processes made us what we are.
| advael wrote:
| What in the world
|
| Most "diffusion models" use similar VAE to transformer backbone
| architectures. Diffusion isn't an architecture, it's a problem
| framing
|
| As for the rest of this, I'm torn between liking the poetry of
| it and pointing out that this is kind of that thing where you
| say something like it's supposed to be a mind-blowing insight
| when it's well-known and pretty obvious. Most people familiar
| with learning theory already understand learning algorithms of
| any kind as a subset of probabilistic search algorithms with
| properties that make them responsive to data. The idea that the
| structure of the information processing system doesn't matter
| and there's just this general factor of learning capacity a
| thing has is... not well supported by the way in which research
| has progressed in the entire period of time when this has been
| relevant to most people? Sure, in theory any neural network is
| a general function approximator and could learn any function
| it's complex enough to represent. Also, we can arrive
| at the solution to any computable problem by representing it as
| a number and guessing random numbers until we can verify a
| solution. Learning algorithms can almost be defined as attempts
| to do better search via structured empiricism than can be done
| with the assumption that structure doesn't matter. Like,
| sometimes multiple things work, sure. That doesn't mean it's
| arbitrary
|
| TL;DR: Of course learning is a kind of search, but discovering
| structures that are good at learning is the whole game
| Xcelerate wrote:
| Yeah, I really don't understand this recently popular
| viewpoint that the algorithm doesn't matter, just how much
| data you throw at it. It doesn't seem to be based on anything
| more than wishful thinking.
|
| One can apply Hutter search to solve just about any problem
| conceivable given the data and guess what--you'll approach
| the optimal solution! The only downside is that this process
| will take more time than available in our physical universe.
|
| I think people forget the time factor and how the entire
| field of computational complexity theory arose because the
| meta problem is not that we can't solve the problem--it's
| that we can't solve it quickly enough on a timescale that
| matters to humans.
|
| Current NN architectures are missing something very
| fundamental related to the efficiency of problem solving, and
| I really don't see how throwing more data at them is going to
| magically convert an EXPTIME algorithm into a PTIME one. (I'm
| not saying NNs are EXPTIME; I'm saying that they are
| incapable of solving entire classes of problems that have
| both PTIME and EXPTIME solutions, as the NN architecture is
| not able to "discover" PTIME solutions, thus rendering them
| incapable of solving those classes of problems in any
| practical sense).
| advael wrote:
| Also, one of the major classes of problem that gets solved
| and we view as "progress" in machine learning is framing
| problems. Like we couldn't define "draw a good picture" in
| a way we could actually assess well, GANs and Diffusion
| turn out to be good ways to phrase problems like that. In
| the former case, it creates a way to _define_ the problem
| as "make something indistinguishable from an example
| pulled from this dataset" and in the latter case, "I've
| randomized some of these pixels, undo that based on the
| description"
|
| The idea of "efficiency" and "progress" is this post-hoc
| rationalization that people who never understood the
| problem, pay people to understand the problem, apply to
| problems once they have a working solution in hand. It's a
| model that is inherently as dumb as a model can be, and the
| assumption it makes is that there is some general factor of
| progress on hard problems that can be dialed up and down.
| Sure, you can pay more scientists and probabilistically
| increase the rate at which problems are solved, but you
| can't predict how long it will take, how much failure it
| will involve, whether a particular scientist will solve a
| particular problem at all, whether that problem is even
| solvable in principle sometimes. Businesspeople and
| investors like models where you put in money and you get
| out growth at a predictable rate with a controllable
| timescale, and if this doesn't work you just kick it
| harder, and this ill fits most frontier research. Hell, it
| ill suits a lot of regular work.
| visarga wrote:
| > Sure, you can pay more scientists and probabilistically
| increase the rate at which problems are solved, but you
| can't predict how long it will take, how much failure it
| will involve, whether a particular scientist will solve a
| particular problem at all, whether that problem is even
| solvable in principle sometimes.
|
| Fully agree, search is hard, unpredictable and expensive.
| Also a matter of luck, being at the right place and time,
| and observing something novel. That is why I put the
| emphasis of AI doing search, not just imitating humans.
| advael wrote:
| Okay, but what does that mean? AI is a search process. Do
| you mean you want the AI to formulate queries? Test
| hypotheses? Okay. How? What does that mean? What we know
| how to do is to mathematically define objective functions
| and tune against them. What objective function describes
| the behavior you want? Is there some structural paradigm
| we can use for this other than tuning the parameters on a
| neural network through optimization toward an objective
| function? If so, what is it?
|
| I'm sorry to be a little testy but what you've basically
| said is "We should go solve the hard problems in AI
| research". Dope. As an AI researcher I fully agree.
| Thanks. Am I supposed to clap or something?
| visarga wrote:
| Not "throwing more data at them" but having the AI discover
| things by searching. AI needs to contribute to the search
| process to graduate from the parrot label.
| visarga wrote:
| > Of course learning is a kind of search, but discovering
| structures that are good at learning is the whole game
|
| No, you missed the essential. I mentioned search in the
| context of discovery, or in other words expanding knowledge.
|
| Training neural nets is also a search for the best parameters
| that fit the data, but it's secondary. Many architectures
| work, there have been a thousand variations for the
| transformer architectures and plenty of RNN-like approaches
| since 2017, when the transformer was invented, and none of them
| is better than the current one or significantly worse.
|
| Also, considering human population, the number of neurons in
| the brain, synapses and wiring are very different at micro
| level from person to person, yet we all learn. The difference
| between the top 5% and bottom 5% humans is small compared
| with other species, for example. What makes a big difference
| between people is education, in other words experiences, or
| training data.
|
| To return to the original idea - AI that simply learns to
| imitate human text is capable only of remixing ideas. But an
| AI that actively explores can discover novel ideas, like
| AlphaZero and AlphaTensor. In both these cases search played
| a major role.
|
| So I was generalizing the concept of "search" across many
| levels of optimization, from protein folding to DNA and human
| intelligence. Search is essential for progress across the
| stack. Even network architecture evolves by search - with
| human researchers.
| calepayson wrote:
| >I don't think it matters so much how the brain is made, what
| matters is the training data.
|
| I agree that training data is hugely important but I think it
| does matter how the brain is made. Structures in the brain are
| remarkably well preserved between species, despite the fact
| that evolution loves to try different methods when it can get
| away with it.
|
| > Searching the environment provides the data brain is trained
| on. I don't believe we can understand the brain in isolation
| without its data engine and the problem space where it
| develops.
|
| I completely agree and suspect we might be on the same page.
| What I find most compelling about the idea of Darwin Machines
| is the fact that it relies on evolution. In my opinion, true
| Dawkinsian evolution is the most efficient search algorithm.
|
| I'd love to hear you go deeper on what you mean by data engine
| and problem space. To (possibly) abuse those terms, I think
| evolution is the data engine. The problem space is fun and I
| love David Eagleman's description of the brain as sitting in a
| warm bath in a dark room trying to figure out what to do with
| all these electric shocks.
|
| > Neural nets showed that given a dataset, you can obtain
| similar results with very different architectures, like
| transformer and diffusion models, or transformer vs Mamba. The
| essential ingredient is data, architecture only needs to pass
| some minimal bar for learning.
|
| My understanding of neural nets, and please correct me if I'm
| wrong, is that they solve system-one thinking, intuition. As of
| yet, they haven't been able to do much more than produce an
| average of their training data (which is incredible). With a
| brute force approach they can innovate in constrained
| environments, e.g. move 37 (or so I'm told, I haven't played go
| :)). I haven't seen evidence that they might be able to
| innovate in open-ended environments. In other words, there's no
| suggestion they can do system-two thinking where time spent on
| a problem correlates with the quality of the answer.
|
| > Studying just the brain misses the essential - we are search
| processes, the whole life is search for optimal actions, and
| evolution itself is search for environment fitness.
|
| I completely agree. I even suspect that, in a few years, we'll
| see "life" and "intelligence" as synonymous concepts, just
| implemented in different mediums. At the same time, studying
| those mediums can be a blast.
| nikolayasdf123 wrote:
| > This layering forms the dominant narrative of how intelligence
| may work and is the basis for deep neural nets. The idea is,
| stimulus is piped into the "top" layer and filters down to the
| bottom layer, with each layer picking up on more and more
| abstract concepts.
|
| popular deep artificial neural networks (lstms, llms, etc.) are
| highly recurrent, in which they are simulating not deep networks,
| but shallow networks that process information in loops many
| times.
|
| > columns.. and that's about it.
|
| recommend not to oversimplify structure here. what you're
| describing is only the high-level structure of a single part of
| the brain (neocortex).
|
| 1. brain has many other structures inside: basal ganglia,
| cerebellum, midbrain, etc., each with different characteristic
| micro-circuits.
|
| 2. brain networks are highly interconnected on long range.
| neurons project (as in send signals) to very distant parts of the
| brain. similarly they get projections from other distant parts of
| brain too.
|
| 3. temporal dimension is important. your article is very ML-like
| focusing on information processing devoid of temporal dimension.
| if you want to draw parallels to real neurons in brain, need to
| explain how it fits into temporal dynamics (oscillations in
| neurons and circuits).
|
| 4. is this competition in the realm of abeyant (what you can
| think in principle) or current (what you think now)
| representations? what are the timescales and the neurological
| basis for this?
|
| overall, my take is it's a bit ML-like talk. if it is to
| describe real neurological networks it has got to be on closer
| and stronger neurological footing.
|
| here is some good material, if you want to dive into
| neuroscience. "Principles of Neurobiology", Liqun Luo, 2020 and
| "Fundamental Neuroscience", McGraw Hill.
|
| more resources can be found here:
|
| http://neuroscience-landscape.com/
| calepayson wrote:
| > popular deep artificial neural networks (lstms, llms, etc.)
| are highly recurrent, in which they are simulating not deep
| networks, but shallow networks that process information in
| loops many times.
|
| Thanks for the info. Is there anything you would recommend to
| dive deeper into this? Books/papers/courses/etc.
|
| > recommend not to oversimplify structure here. what you
| describing is only high-level structure of single part of brain
| (neocortex).
|
| Nice suggestion. I added a bit to make it clear that I'm
| talking about the neocortex.
|
| > 1 & 2
|
| Totally. I don't think AI is a simple as building a Darwin
| Machine, much like it's not as simple as building a neural net.
| But I think the concept of a Darwin Machine is an interesting,
| and possibly important, component.
|
| My goal with this post was to introduce folks who hadn't heard
| of this concept and, hopefully, get in contact with folks who
| had. I left out the other so I could try to focus on what
| matters.
|
| > temporal dimension is important. your article is very ML-like
| focusing on information processing devoid of temporal
| dimension. if you want to draw parallels to real neurons in
| brain, need to explain how it fits into temporal dynamics
| (oscillations in neurons and circuits).
|
| Correct me if I misunderstand, but I believe I did. The spatio-
| temporal firing patterns of minicolumns contain the temporal
| dimension. I touched on the song analogy but we can go deeper
| here.
|
| Let's imagine the firing pattern of a minicolumn as a melody
| that fits within the period of some internal clock (I doubt
| there's actually a clock but I think it's a useful analogy).
| Each minicolumn starts "singing" its melody over and over, in
| time with the clock. Each clock cycle, every minicolumn is
| influenced by its neighbors within the network and they begin
| to sync up. Eventually they're all harmonizing to the same
| melody.
|
| A network might propagate a bunch of different melodies at
| once. When they meet, the melodies "compete". Each tries to
| propagate to a new minicolumn and fitness is judged by other
| inputs to that minicolumn (think sensory) and the tendencies of
| that minicolumn (think memory).
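|
| Here's a toy version of that conversion battle (my own sketch,
| loosely after Calvin's description, with made-up numbers): a
| grid of minicolumns, each singing one pattern, where on every
| clock tick a minicolumn adopts the neighboring pattern with the
| most fitness-weighted local support.
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|
|     SIZE, PATTERNS, STEPS = 32, 4, 50
|     OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))
|     grid = rng.integers(0, PATTERNS, (SIZE, SIZE))
|     fitness = rng.random(PATTERNS)  # how well each melody fits
|
|     for _ in range(STEPS):
|         new = grid.copy()
|         for i in range(SIZE):
|             for j in range(SIZE):
|                 # Neighbors try to convert this minicolumn.
|                 nbrs = [grid[(i + di) % SIZE, (j + dj) % SIZE]
|                         for di, dj in OFFSETS]
|                 votes = np.zeros(PATTERNS)
|                 for p in nbrs:
|                     votes[p] += fitness[p]
|                 new[i, j] = int(votes.argmax())
|         grid = new
|
|     # Typically one pattern ends up holding most of the surface.
|     print(np.bincount(grid.ravel(), minlength=PATTERNS))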
|
| I think evolution is an incredible algorithm because it relies
| as much as it does on time.
|
| > is this competition in realm of abeyant (what you can think
| in principle) or current (what you think now) representations?
| what's the timescales and neurological basis for this?
|
| I'm not familiar with these ideas but let me give it a shot.
| Feel free to jump in with more questions to help clarify.
|
| Neural Darwinism points to structures - minicolumns, cortical
| columns, and interesting features of their connections - and
| describes one possibility for how those structures might lead
| to thought. In your words, I think the structures are the realm
| of abeyant representations while the theory describes current
| representations.
|
| The neurological basis for this, the description of the abeyant
| representation (hope I'm getting that right), is Calvin's
| observations of the structure of the brain. Observations based
| on his and other's research.
|
| To a large extent, neuroscience doesn't have a great through-
| line story of how the brain works. For example, the idea of
| regions of the brain responsible for specific functions - like
| the hippocampus for memory - doesn't exactly play nice with
| Karl Lashley's experimental work on memory.
|
| What I liked most about this book is how Calvin tried to relate
| his theory to both structure and experimental results.
|
| > overall, my take it is a bit ML-like talk. if it describes
| real neurological networks it got to be closer and stronger
| neurological footing.
|
| If, by ML-like talk, you mean a bit woo-woo and hand-wavy: ya,
| I agree. Ideally I'd be a better writer. But I'm not, so I
| highly recommend the book.
|
| It's written by an incredible neuroscientist and, so far, none
| of the neuroscience researchers I've given it to have expressed
| anything other than excitement about it. And I explicitly told
| them to keep an eye out for places they might disagree. One of
| them is currently reading it a second time right now with the
| goal of verifying everything. If it all checks out, he plans on
| presenting the ideas to his lab. I'll update the post if he, or
| anyone in his lab, finds something that doesn't check out.
|
| > here is some good material, if you want to dive into
| neuroscience. "Principles of Neurobiology", Liqun Luo, 2020 and
| "Fundamental Neuroscience", McGraw Hill.
|
| Why these two textbooks? I got my B.S. in neuroscience so I
| feel good about the foundations. Happy to check these out if
| you believe they add something that many other textbooks are
| missing.
| lachlan_gray wrote:
| Oh dude this is so cool. I think you're dead right.
|
| If you'll pardon some woo, another argument I see in favour of
| message passing/consensus, is that it "fits" the self similar
| nature of life patterns.
|
| Valid behaviours that replicate and persist, for only the reason
| that they do.
|
| Culture, religion, politics, pop songs, memes... "Egregore" comes
| to mind. In some ways "recombination" could be seen as
| "cooperation", even at the level of minicolumns.
|
| (Edit: what I mean to say is that it kinda makes sense that the
| group dynamics between constituent units of one brain would be
| similar in some way to the group dynamics you get from a bunch of
| brains)
| pshc wrote:
| > _These connections result in a triangular array of connected
| minicolumns with large gaps of unconnected minicolumns in
| between. Well, not really unconnected, each of these are
| connected to their own triangular array._
|
| > _Looking down on the brain again, we can imagine projecting a
| pattern of equilateral triangles - like a fishing net - over the
| surface. Each vertex in the net will land on a minicolumn within
| the same network, leaving holes over minicolumns that don't
| belong to that network. If we were to project nets over the
| network until every minicolumn was covered by a vertex we would
| project 50-100 nets._
|
| Around this part I had a difficult time visualizing the intent
| here. Are there any accompanying diagrams or texts? Thanks for
| the interesting read!
| calepayson wrote:
| http://williamcalvin.com/bk9/index.htm
|
| I'd recommend just banging out chapters 1-4 of the book (~60
| pages). Lots of diagrams and I think you'll get the meat of
| the idea.
|
| Thanks for the feedback!
| slow_typist wrote:
| The title of the referenced book by Erwin Schrödinger is "What
| Is Life?", I believe.
|
| https://archive.org/details/whatislife0000erwi
| calepayson wrote:
| Thanks for pointing this out. I'll change it when I'm at my
| computer.
| auraai wrote:
| There's lots of room for cross-pollination between bio/life
| sciences and ML/AI. One key insight is the importance of what you
| pick as your primary representation of data (is everything a
| number, a symbolic structure, a probability distribution, etc). I
| believe a lot of these bio-inspired approaches over-emphasize the
| embodied nature of intelligence and how much it needs to be
| situated in space and time, which downplays all the sub-problems
| that need to be solved in other "spaces" with less obvious
| "spatiotemporal" structure. I believe space and time are
| emergent, at least for the purposes of defining intelligence, and
| there are representations where both space and time arise as
| dimensions of their structure and evolution.
| wdwvt1 wrote:
| This post analogizes between a specific theory of human
| intelligence and a badly caricatured theory of evolution. It
| feels like better versions of the arguments for Darwin Machines
| exist that would not: a) require an unsupportable neuron-centric
| view of evolution, or b) view evolution through the
| programmer's lens.
|
| > Essentially, biology uses evolution because it is the best way
| to solve the problem of prediction (survival/reproduction) in a
| complex world.
|
| 1. This is anthropocentric in a way that meaningfully distorts
| the conclusion. The vast majority of life on earth, whether you
| count by raw number, number of species, weight, etc. do not have
| neurons. These organisms are of course, microbes (viruses and
| prokaryotes) and plants. Bacteria and viruses do not 'predict' in
| the way this post speaks of. Survival strategies that bacteria
| use (that we know about and understand) are hedging-based. For
| example, some portion of a population will stochastically switch
| certain survival genes on (e.g. sporulation, certain efflux pumps
| = antibiotic resistance genes) that have a cost benefit ratio
| that changes depending on the condition. This could be construed
| as a prediction in some sense: the genome that has enough
| plasticity to allow certain changes like this will, on average,
| produce copies in a large enough population that enable survival
| through a tremendous range of conditions. But that's a very
| different type of prediction than what the rest of the post talks
| about. In short, prediction is something neurons are good at, but
| it's not clear it's a 'favored' outcome in our biosphere.
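|
| A toy model of that bet-hedging (my own sketch, with made-up
| numbers): growth compounds multiplicatively, so the geometric
| mean is what matters, and a lineage that stochastically turns a
| costly resistance gene on in some offspring beats both pure
| strategies.
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|
|     def long_run_growth(p_resistant, generations=2000):
|         # 10% of generations bring an antibiotic pulse. Resistant
|         # cells grow slowly always; sensitive cells grow fast
|         # normally but die in a pulse.
|         log_growth = 0.0
|         for _ in range(generations):
|             stress = rng.random() < 0.1
|             sens = 0.0 if stress else 2.0
|             fit = p_resistant * 1.1 + (1 - p_resistant) * sens
|             log_growth += np.log(max(fit, 1e-12))
|         return log_growth / generations
|
|     for p in [0.0, 0.05, 0.2, 1.0]:
|         print(p, round(long_run_growth(p), 3))
|     # p=0.0 crashes at the first pulse; a small stochastic
|     # fraction survives pulses and outgrows full resistance.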
|
| > It relies on the same insight that produced biology: That
| evolution is the best algorithm for predicting valid "solutions"
| within a near infinite problem space.
|
| 2. This gets the teleology reversed. Biology doesn't use
| anything, it's not trying to solve anything, and evolution isn't
| an algorithm because it doesn't have an end goal or a teleology
| (and it's not predicting anything). Evolution is what you observe
| over time in a population of organisms that reproduce without
| perfect fidelity copy mechanisms. All we need say is that things
| that reproduce are more likely to be observed. We don't have to
| anthropomorphize the evolutionary process to get an explanation
| of the distribution of reproducing entities that we observe or
| the fact that they solve challenges to reproduction.
|
| > Many people believe that, in biology, point mutations lead to
| the change necessary to drive novelty in evolution. This is
| rarely the case. Point mutations are usually disastrous and every
| organism I know of does everything in its power to minimize them.
| Think, for every one beneficial point mutation, there are
| thousands that don't have any effect, and hundreds that cause
| something awful like cancer. If you're building a skyscraper,
| having one in a hundred bricks be laid with some variation is not
| a good thing. Instead Biology relies on recombination. Swap one
| beneficial trait for another and there's a much smaller chance
| you'll end up with something harmful and a much higher chance
| you'll end up with something useful. Recombination is the key to
| the creativity of evolution, and Darwin Machines harness it.
|
| 3. An anthropocentric reading of evidence that distorts the
| conclusion. The fidelity (number of errors per cycle per base
| pair) of the polymerases varies by maybe 8 orders of magnitude
| across the tree of life. For a review, see figure 3 in ref [1]. I
| don't know about Darwin Machines, but the view that
| 'recombination' is the key to evolution is a conclusion you would
| draw if you examined only a part of the tree of life. We can
| quibble about viruses being alive or not, but they are certainly
| the most abundant reproducing thing on earth by orders of
| magnitude. Recombination doesn't seem as important for adaptation
| in them as it does in organisms with chromosomes.
|
| 4. There are arguments that seem interesting (and maybe not
| incompatible with some version of the post) that life should be
| abundant because it actually helps dissipate energy gradients.
| See the Quanta article on this [0].
|
| [0] https://www.quantamagazine.org/a-new-thermodynamics-theory-o...
|
| [1] Sniegowski, P. D., Gerrish, P. J., Johnson, T., & Shaver,
| A. (2000). The evolution of mutation rates: separating causes
| from consequences. BioEssays, 22(12), 1057-1066.
| doi:10.1002/1521-1878(200012)22:12<1057::aid-bies3>3.0.co;2-w
| fedeb95 wrote:
| isn't this the same as genetic algorithms?
| nirvael wrote:
| I think this is over-simplified and possibly misunderstood. I
| haven't read the book this article references but if I am
| understanding the main proposal correctly then it can be
| summarised as "cortical activity produces spatial patterns which
| somehow 'compete' and the 'winner' is chosen which is then
| reinforced through a 'reward'".
|
| 'Compete', 'winner', and 'reward' are all left undefined in the
| article. Even given that, the theory is not new information and
| seems incredibly analogous to Hebbian learning which is a long-
| standing theory in neuroscience. Additionally, the metaphor of
| evolution within the brain does not seem apt. Essentially what is
| said is that given a sensory input, we will see patterns emerge
| that correspond to a behaviour deemed successful. Other brain
| patterns may arise but are ignored or not reinforced by a reward.
| This is almost tautological, and the 'evolutionary process'
| (input -> brain activity -> behaviour -> reward) lacks
| explanatory power. This is exactly what we would expect to see.
| If we observe a behaviour that has been reinforced in some way,
| it would obviously correlate with the brain producing a specific
| activity pattern. I don't see any evidence that the brain will
| always produce several candidate activity patterns before judging
| a winner based on consensus. The tangent of cortical columns
| ignores key deep brain structures and is also almost irrelevant,
| the brain could use the proposed 'evolutionary' process with any
| architecture.
| mandibeet wrote:
| While it does build on established concepts like Hebbian
| learning, I think the theory offers a potentially insightful
| way of thinking about brain function.
| calepayson wrote:
| > I think this is over-simplified and possibly misunderstood.
|
| I'm with you here. I wrote this because I wanted to drive
| people towards the book. It's incredible and I did it little
| justice.
|
| > "cortical activity produces spatial patterns which somehow
| 'compete' and the 'winner' is chosen which is then reinforced
| through a 'reward'"
|
| A slight modification: spatio-temporal patterns*. Otherwise
| you're dead on.
|
| > 'Compete', 'winner', and 'reward' are all left undefined in
| the article.
|
| You're right. I left these undefined because I don't believe I
| have a firm understanding of how they work. Here's some
| speculation that might help clarify.
|
| Compete - The field of minicolumns is an environment. A spatio-
| temporal pattern "survives" when a minicolumn is firing in that
| pattern. It's "fit" if it's able to effectively spread to other
| minicolumns. Eventually, as different firing patterns spread
| across the surface area of the neocortex, a border will form
| between two distinct firing patterns. They "Compete" insofar as
| each firing pattern tries to "convert" minicolumns to fire in
| their specific pattern instead of another.
|
| Winner - This has two levels. First, an individual firing
| pattern could "win" the competition by spreading to a new
| minicolumn. Second, amalgamations of firing patterns, the
| overall firing pattern of a cortical column, could match
| reality better than others. This is a very hand-wavy answer,
| because I have no intuition for how this might happen. At a
| high level, the winning thought is likely the one that best
| matches perception. How this works seems like a bit of a
| paradox as these thoughts are perception. I suspect this is
| done through prediction. E.g. "If that person is my
| grandmother, she'll probably smile and call my name". Again,
| super hand-wavy, questions like this are why I posted this
| hoping to get in touch with people who have spent more time
| studying this.
|
| Reward - I'm an interested amateur when it comes to ML, and
| folks have been great about pointing out areas that I should go
| deeper. I have only a basic understanding of how reward
| functions work. I imagine the minicolumns as small neural
| networks and alluded to "reward" in the same sense. I have no
| idea what that reward algorithm is or if NNs are even a good
| analogy. Again, I really recommend the book if you're
| interested in a deeper explanation of this.
|
| > the theory is not new information and seems incredibly
| analogous to Hebbian learning which is a long-standing theory
| in neuroscience.
|
| I disagree with you here. Hebbian learning is very much a
| component of this theory, but not the whole. The last two
| constraints were inspired by it and, in hindsight, I should
| have been more explicit about that. But, Hebbian learning
| describes a tendency to average, "cells that fire together wire
| together". Please feel free to push back here but, the concept
| of Darwin Machines fits the constraints of Hebbian learning
| while still offering a seemingly valid description of how
| creative thought might occur. Something that, if I'm not
| misunderstanding, is undoubtedly new information.
|
| > I don't see any evidence that the brain will always produce
| several candidate activity patterns before judging a winner
| based on consensus.
|
| That's probably my fault in the retelling, check out the book:
| http://williamcalvin.com/bk9/index.htm
|
| I think if you read Chapters 1-4 (about 60 pages and with
| plenty of awesome diagrams) you'd have a sense for why Calvin
| believes this (whether you agree or not would be a fun
| conversation).
|
| > The tangent of cortical columns ignores key deep brain
| structures and is also almost irrelevant, the brain could use
| the proposed 'evolutionary' process with any architecture.
|
| I disagree here. A common mistake I think we tend to make is
| assuming evolution and natural selection are equivalent. Some
| examples of natural selection: A diversified portfolio, or a
| beach with large grains of sand due to some intricacy of the
| currents. Dawkinsian evolution is much much rarer. I can only
| think of three examples of architectures that have pulled it
| off. Genes, and their architecture, are one. Memes (imitated
| behavior) are another. Many animals imitate, but only one
| species has been able to build architecture to allow those
| behaviors to undergo an evolutionary process. Humans. And
| finally, if this theory is right, spatiotemporal patterns and
| the columnar architecture of the brain is the third.
|
| Ignoring Darwin Machines, there are only two architectures that
| have led to an evolutionary process. Saying we could use "any
| architecture" seems a bit optimistic.
|
| I appreciate the thoughtful response.
| gushogg-blake wrote:
| The image of the flattened out brain could use some
| illustrations, or more specific instructions on what we should be
| visualising.
|
| > First, if you look at a cross-section of the brain (eye-level
| with the table)
|
| I thought it was flat on the table? Surely if we look at it side-
| on we just see the edge?
|
| Without a clear idea of how to picture this, the other aspect
| (columns) doesn't make sense either.
| mandibeet wrote:
| I think in some ways by considering the brain as a Darwin
| Machine, we can explore new dimensions of how our minds work
| FrustratedMonky wrote:
| There is a lot of quibbling over details, but this is a 1-2 page
| high level elevator pitch, so will have some things glossed over.
| To that end, it seems like some interesting concepts for further
| exploration.
| specialist wrote:
| I read the followup:
|
| Lingua ex Machina: Reconciling Darwin and Chomsky with the Human
| [2000]
|
| https://www.amazon.com/Lingua-Machina-Reconciling-Darwin-Cho...
|
| Completely changed my worldview. Evolutionary processes every
| where.
|
| My (turrible) recollection:
|
| Darwinian processes for comprehending speech, the process of
| translating sounds into phonemes (?).
|
| There's something like a brain song, where a harmony signal
| echoes back and forth.
|
| Competition between and among hexagonal processing units (what
| Jeff Hawkins & Numenta are studying). My paraphrasing: meme PvP
| F4A battlefield where "winning" means converting your neighbor to
| your faction.
|
| Speculation that the human brain leaped from proto-language
| (noun-verb) to Chomsky language (recursively composable noun-
| verb-object predicates). Further speculation on how that might
| be encoded in our brains.
|
| Etc.
| cs702 wrote:
| Big-picture, the idea is that different modalities of sensory
| data (visual, olfactory, etc.) are processed by different
| minicolumns in the brain, i.e., different subnetworks, each
| outputting a different firing pattern. These firing patterns
| propagate across the surface area of the brain, competing with
| conflicting messages. And then, to quote the OP, "after some
| period of time a winner is chosen, likely the message that
| controls the greatest surface area, the greatest number of
| minicolumns. When this happens, the winning minicolumns are
| rewarded, likely prompting them to encode a tendency for that
| firing pattern into their structure." And this happens in
| multiple layers of the brain.
|
| In other words, there's some kind of iterative mechanism for
| higher-level layers to find which lower-level subnetworks are
| most in agreement about the input data, inducing learning.
|
| Capsule-routing algorithms, proposed by Hinton and others, seek
| to implement precisely this idea, typically with some kind of
| expectation-maximization (EM) process.
|
| There are quite a few implementations available on github:
|
| https://github.com/topics/capsules
|
| https://github.com/topics/em-routing
|
| https://github.com/topics/routing-algorithm
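|
| For the curious, the core of the dynamic routing ("routing by
| agreement") step is only a few lines. A minimal numpy sketch of
| one routing layer (after Sabour et al. 2017, simplified, not a
| full capsule network):
|
|     import numpy as np
|
|     def squash(s):
|         # Shrink vectors to length < 1, preserving direction.
|         n2 = (s * s).sum(-1, keepdims=True)
|         return (n2 / (1 + n2)) * s / np.sqrt(n2 + 1e-9)
|
|     def route(u_hat, iters=3):
|         # u_hat: votes from lower capsules, (n_in, n_out, dim).
|         b = np.zeros(u_hat.shape[:2])  # routing logits
|         for _ in range(iters):
|             c = np.exp(b) / np.exp(b).sum(1, keepdims=True)
|             s = (c[..., None] * u_hat).sum(0)  # weighted votes
|             v = squash(s)                      # output capsules
|             b += (u_hat * v).sum(-1)  # reward agreeing inputs
|         return v
|
|     votes = np.random.randn(8, 3, 4)  # 8 in-caps, 3 out-caps
|     print(route(votes).shape)         # (3, 4)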
| calepayson wrote:
| Great summary. Thanks for the links. These are awesome.
| abeppu wrote:
| I haven't heard of anyone talk about Hinton's capsule network
| concepts for some time. In 2017-18 it seemed exciting both
| because of Hinton but also because the pose/transformation
| sounded pretty reasonable. I don't know what would count as
| "explanation", but I'd be curious to hear any thoughts about
| why it seems they didn't really pan out. (Are there any tasks
| for which capsule methods are the best?)
| cs702 wrote:
| The short answer as to why capsule networks have "fallen out
| of fashion" is... _Transformers_.
|
| Transformers came out at roughly the same time[a] and have
| proven to be great at... pretty much everything. _They just
| work._ Since then, most AI research money, effort, and
| compute has been invested to study and improve Transformers
| and related models, at the expense of almost everything else.
|
| Many promising ideas, including routing, won't be seriously
| re-explored until and unless progress towards AGI seems to
| stall.
|
| ---
|
| [a] https://arxiv.org/abs/1706.03762
| abeppu wrote:
| I think this is a non-answer in some sense. Yes,
| transformers have been clearly very successful across a
| very wide range of tasks. But what _about_ the approach
| taken in capsules is comparatively deficient?
|
| Some kinds of explanations which I think are at least
| plausible (but IDK if any evidence exists for them):
|
| - The attention structure in transformers allows any chunk
| to be learned to be important for any other chunk. And
| pretty quickly people tended towards these being pretty
| deep. By comparison, the capsule + routing structure (IIUC)
| came with a built-in kind of sparsity (from capsules at a
| level in the hierarchy being independent), and because the
| hierarchy was meant to align with composition, it often (I
| think) didn't have a _huge_ number of levels? Maybe this
| flexibility + depth are key?
|
| - Related to capsules being independent, an initial design
| feature in capsule networks seems to have been smaller
| model sizes. Perhaps this was at some level just a bad
| thing to reach for? I think at the time, "smaller models
| means optimization searches over a smaller space, which is
| faster to converge and requires less data" was still sort
| of in people's heads, and I think this view is pretty much
| dead.
|
| - I've heard some people argue that one of the core
| strengths of transformers is that they support training in
| a way allows for maxing out available GPUs. I think this is
| mostly in comparison to previous language models which were
| explicitly sequential. But are capsule networks less
| conducive to efficient training?
| cs702 wrote:
| It's hard to make a fair comparison, because there hasn't
| been anywhere near as much money, effort, or compute
| invested in trying to scale up routing methods.
|
| Promising approaches are often ignored for reasons that
| have little to do with their merits. For example, Hinton,
| Bengio, and Lecun spent much of the 1990's and all of the
| 2000's in the fringes of academia, unable to get much
| funding, because few others were interested in or saw any
| promise in deep learning! Similarly, Katalin Karikó lost
| her job and spent almost two decades in obscurity because
| few others were interested or saw any promise in mRNA
| vaccines!
|
| Now, I'm not saying routing methods will become more
| popular in the future. I mean, who the heck knows?
|
| What I'm saying is that promising approaches can fall out
| of favor for reasons that are not intrinsic to them.
| MAXPOOL wrote:
| If you take a birds eye view, fundamental breakthroughs don't
| happen that often. "Attention Is All You Need" paper also
| came out in 2017. It has now been 7 years without a
| breakthrough at the same level as transformers. Breakthrough
| ideas can take decades before they are ready. There are many
| false starts and dead ends.
|
| Money and popularity are orthogonal to pathfinding that leads
| to breakthroughs.
| calepayson wrote:
| Well said
| superqd wrote:
| Nitpick: lots of text descriptions of visual patterns - this
| article could use at least 5 visual aid images.
| calepayson wrote:
| The book provides a ton. I'll write another version that
| follows the book more closely and uses them. Thanks for the
| feedback.
| osmarks wrote:
| I don't think this is true as stated. Evolutionary algorithms are
| not the most efficient way to do most things because they,
| handwavily, search randomly in all directions. Gradient descent
| and other gradient-based optimizers are way way faster where we
| can apply them: the brain probably can't do proper backprop for
| architectural reasons but I am confident it uses something much
| smarter than blind evolutionary search.
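|
| A quick way to see the gap (a toy comparison, not a rigorous
| benchmark): minimize the same quadratic with gradient descent
| and with blind mutate-and-keep-if-better search.
|
|     import numpy as np
|     rng = np.random.default_rng(0)
|
|     dim = 50
|     f = lambda x: (x * x).sum()  # loss; its gradient is 2x
|
|     # Gradient descent: follow the slope directly.
|     x = rng.normal(size=dim)
|     for _ in range(100):
|         x -= 0.1 * 2 * x
|     print("gradient descent:", f(x))  # effectively zero
|
|     # Blind search: random mutation, keep improvements.
|     y = rng.normal(size=dim)
|     for _ in range(100):
|         cand = y + 0.1 * rng.normal(size=dim)
|         if f(cand) < f(y):
|             y = cand
|     print("random search:", f(y))  # barely moved in 50-D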
| cs702 wrote:
| The OP is _not_ about evolutionary algorithms in the usual
| sense (random mutation and selection over many generations).
|
| It's about mechanisms in the brain that plausibly evolved over
| time.
| osmarks wrote:
| > A Darwin Machine uses evolution to produce intelligence. It
| relies on the same insight that produced biology: That
| evolution is the best algorithm for predicting valid
| "solutions" within a near infinite problem space.
|
| It seems to be suggesting that neuron firing patterns (or
| something like that?) are selected by internal evolutionary
| processes.
| calepayson wrote:
| > Evolutionary algorithms are not the most efficient way to do
| most things because they, handwavily, search randomly in all
| directions.
|
| I think we agree and would love to dive a bit deeper with you
| here. My background is in biology and I'm very much an
| enthusiastic amateur when it comes to CS.
|
| When I first read about Darwin Machines, I looked up
| "evolutionary algorithms in AI", thought to myself "Oh hell ya,
| these CS folks are on it" and then was shocked to learn that
| "evolutionary algorithms" seemed to be based on an old school
| conception of evolution.
|
| First, evolution is on your team, it hates random search. In
| biology point mutations are the equivalent of random search,
| and organisms do everything in their power to minimize them.
|
| As I said in the article, If we were building a skyscraper and
| someone told us they wanted to place some bricks at random
| angles "so that we might accidentally stumble upon a better
| design" we would call them crazy. And rightfully so.
|
| Evolution still needs variation though, and it gets it through
| recombination. Recombination is when we take traits that we
| know work, and shuffle them to get something new. It provides
| much more variation with a much smaller chance of producing
| something that decreases fitness.
|
| It took me a while to grok how recombination produces anything
| novel, if we're shuffling existing traits how do we get a new
| trait? I still don't have a "silver-bullet" answer for this but
| I find that I usually visualize these concepts too far up the
| hierarchy. When I think of traits I think of eye color or hair
| color (and I suspect you do too). A trait is really just a
| protein (sometimes not even that) and those examples are the
| outliers where a single protein is responsible.
|
| It might be better to think of cancer suppression systems,
| which can be made up of thousands of proteins and pathways.
| They're like a large code base that proofreads. Imagine this
| code base has tons of different functions for different
| scenarios.
|
| Point mutations, what evolution hates, is like going into that
| code base and randomizing some individual characters. You're
| probably going to break the relevant function.
|
| Recombination, what evolution loves, is like going in and
| swapping two functions that take the same input, produce the
| same output, but are implemented differently. You can see how
| this blind shuffling might lead to improvements.
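|
| A toy sketch of that contrast (my own construction, not from
| the book): genomes built from working "modules", where a point
| mutation almost always breaks a module but swapping whole
| modules between two working genomes never does.
|
|     import random
|     random.seed(0)
|
|     MODULES, LEN = 10, 50  # modules per genome, bases each
|
|     def rand_module():
|         return [random.randint(0, 1) for _ in range(LEN)]
|
|     # Two known-working variants of every module.
|     variant_a = [rand_module() for _ in range(MODULES)]
|     variant_b = [rand_module() for _ in range(MODULES)]
|
|     def fitness(genome):
|         # Count modules that exactly match a working variant.
|         return sum(m in (variant_a[i], variant_b[i])
|                    for i, m in enumerate(genome))
|
|     parent1 = [m[:] for m in variant_a]
|     parent2 = [m[:] for m in variant_b]
|
|     # Point mutation: flip one base, and a module breaks.
|     mutant = [m[:] for m in parent1]
|     mutant[random.randrange(MODULES)][random.randrange(LEN)] ^= 1
|     print(fitness(mutant))  # 9 of 10 modules still work
|
|     # Recombination: shuffle whole modules, and all still work.
|     child = [random.choice([parent1[i], parent2[i]])
|              for i in range(MODULES)]
|     print(fitness(child))   # 10 of 10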
|
| How evolution creates new functions is a much more difficult
| topic. If you're interested, I recommend "The Selfish Gene".
| It's the best book I've ever read.
|
| >Gradient descent and other gradient-based optimizers are way
| way faster where we can apply them
|
| The second point is based on my (limited) understanding of non-
| biology things. Please point me in the right direction if you
| see me making a mistake.
|
| Gradient descent etc. are way faster when we can apply them.
| But I don't think we can apply them to these problems.
|
| My understanding of modern machine learning is that it can be
| creative in constrained environments. I hear move 37 is a great
| example but I don't know enough about go to feel any sort of
| way about it. My sense is: if you limit the problem space
| gradient descent can find creative solutions.
|
| But intelligence like yours or mine operates in an unconstrained
| problem space. I don't think you can apply gradient descent
| because, how the heck could you possibly score a behavior?
|
| This is where evolution excels as an algorithm. It can take an
| infinite problem space and consistently come up with "valid"
| solutions to it.
|
| >the brain probably can't do proper backprop for architectural
| reasons but I am confident it uses something much smarter than
| blind evolutionary search.
|
| I think Darwin Machines might be able to explain "animal
| intelligence". But human intelligence is a whole other deal.
| There's some incredible research on it that is (as far as I can
| tell) largely undiscovered by AI engineers that I can share if
| you're interested.
___________________________________________________________________
(page generated 2024-07-17 23:09 UTC)