[HN Gopher] Darwin Machines
       ___________________________________________________________________
        
       Darwin Machines
        
       Author : calepayson
       Score  : 154 points
       Date   : 2024-07-16 22:54 UTC (1 day ago)
        
 (HTM) web link (vedgie.net)
 (TXT) w3m dump (vedgie.net)
        
       | calepayson wrote:
       | I'm obsessed with the idea of Darwin Machines (and I think you
       | should be too).
       | 
       | I've been tinkering with the idea in python but I just don't have
       | enough ML experience.
       | 
       | If you, or anyone you know, is interested in Darwin Machines
       | please reach out!
        
         | turtleyacht wrote:
         | Thank-you. Was wondering about your thoughts on emotions. Are
         | they merely byproducts of the chemical evolution of the brain,
         | or are they emergent artifacts of intelligence?
         | 
         | A system cannot feel, but we can map neurochemistry as
         | mechanistically as any other process. It would be interesting
         | to discover whether a "pure brain" exists, or whether even its
         | inputs, when considered in whole, colored its nature.
        
           | calepayson wrote:
           | Honestly, no idea.
           | 
           | I could imagine a Darwin Machine being a "pure brain", as
           | you put it, while we have emotions because our brains built
           | a pure brain atop an existing and messy infrastructure.
           | 
           | Or, emotions could just be the subjective experience of
           | thoughts competing.
           | 
           | Calvin goes deeper into things like this in the book, but I
           | suspect emotions help intelligence to some extent insofar as
           | they provide environmental change. It's good for the long
           | term health of an ecosystem to shake things up so that
           | nothing gets too stagnant.
        
           | IAmGraydon wrote:
           | Aren't emotions just the brain's way of experiencing the
           | pain/pleasure feedback system for non-physical stimuli?
        
         | DrMiaow wrote:
         | It is the focus of my long-term crazy side project.
         | https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513
        
       | aldousd666 wrote:
       | This reminds me a little of Jeff Hawkins' book on the Thousand
       | Brains theory. His company Numenta has done this kind of
       | research and they have a mailing list. I'm not an expert but
       | I've read Jeff's book and noodled at the mailing list.
        
         | calepayson wrote:
         | I ordered the book and I'm checking out the website rn. Looks
         | awesome. Thanks a ton for sharing!
        
         | dstrbtd_evolute wrote:
         | Hawkins' proposal is missing the key innovation that Calvin
         | proposes, which is that learning takes place by evolutionary
         | means. But Hawkins' proposal does fit squarely within current
         | popular ideas around predictive coding.
         | 
         | The key structures in Hawkins' architecture are cortical
         | columns (CC). What his VP of Eng (Dileep George) did is to
         | analyze Hawkins' account of the functionality of a CC, and then
         | say that a CC is a module which must conform to a certain API,
         | and meet a certain contract. As long as a module obeys the API
         | and contract, we don't actually care how the CC module is
         | implemented. In particular, it's actually not important that
         | the module contain neurons. (Though the connections between the
         | CCs may still have to look like axons or whatever, I don't
         | know.)
         | 
         | Then Dileep George further figured out that there is an
         | off-the-shelf algorithm that works perfectly for the CC
         | module. He selected an algorithm which is based on
         | statistical learning theory (SLT).
         | 
         | SLT-based algorithms are an excellent choice for the CCs,
         | IMNSHO. They are fast, theoretically sound, etc. They are
         | also understood in great mathematical detail, so we can
         | characterize _exactly_ what they can and can't do. So there
         | is zero mystery about the capabilities of his system at the
         | CC level. Note that in Hawkins' case, the SLT-based
         | algorithms are used for pattern recognition.
         | 
         | Now Hawkins' proposal isn't just a single CC, it's a network of
         | CC's all connected together in a certain way. My memory is a
         | little hazy at this point, but as best I recall, his
         | architecture should have no problem identifying sequences of
         | patterns (as for sound), or spatial patterns across time (as
         | for vision). And I bought his argument that these could be
         | combined hierarchically, and that the same structure could also
         | be used for playing back (outputting) a learned pattern, and
         | for recognizing cross modal patterns (that is, across sensory
         | modalities).
         | 
         | But is all this enough?
         | 
         | I say no. My reading of evolutionary epistemology suggests to
         | me that pattern identification is insufficient for making and
         | refuting conjectures in the most general sense. And ultimately,
         | a system must be able to create and refute conjectures to
         | create knowledge. Hawkins has a very weak story about
         | creativity and claims that it can all be done with pattern
         | recognition and analogy, but I am not convinced. It was the
         | weakest part of the book. (pp 183-193)
         | 
         | I don't know if it's clear to you or not why pattern
         | recognition is insufficient for doing general conjectures and
         | refutations. If it's not clear, I should attempt to expand on
         | that ...
         | 
         | The idea is, that it is not always possible to arrive at a
         | theory by just abstracting a pattern from a data set. For
         | example:
         | 
         | What set of data could Newton have looked at to conclude that
         | an object in motion stays in motion? I suppose he knew of 7
         | planets that stayed in motion, but then he had millions of
         | counter examples all around him. For a pattern recognition
         | algorithm, if you feed it a million + 7 data points, it will
         | conclude that objects in motion always come to a stop except
         | for a few weird exceptions which are probably noise.
        
           | calepayson wrote:
           | This is an awesome write up. I especially love the Newton
           | analogy. Thanks.
        
         | exe34 wrote:
         | my impression of hawkins from a distance is that he can
         | reproduce the success of the current orthodoxy, but is always a
         | few years behind sota.
        
       | mprime1 wrote:
       | FYI Evolutionary Algorithms have been an active area of research
       | for decades.[1]
       | 
       | Among the many uses, they have been applied to 'evolving' neural
       | networks.
       | 
       | Famously a guy whose name I can't remember used to generate
       | programs and mutations of programs.
       | 
       | My recommendation if you want to get into AI: avoid anything
       | written in the last 10 years and explore some classics from the
       | 70s
       | 
       | [1] https://en.m.wikipedia.org/wiki/Evolutionary_algorithm
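       | 
       | For anyone who has never seen one: the core loop is tiny. A toy
       | sketch in plain Python (all names and numbers made up) that
       | evolves a bit string toward a target by mutation and selection:
       | 
       |     import random
       | 
       |     TARGET = [1] * 20
       | 
       |     def fitness(genome):
       |         # Count bits matching the target.
       |         return sum(g == t for g, t in zip(genome, TARGET))
       | 
       |     def mutate(genome, rate=0.05):
       |         # Flip each bit with a small probability.
       |         return [1 - g if random.random() < rate else g
       |                 for g in genome]
       | 
       |     pop = [[random.randint(0, 1) for _ in range(20)]
       |            for _ in range(50)]
       |     for gen in range(200):
       |         pop.sort(key=fitness, reverse=True)
       |         if fitness(pop[0]) == 20:
       |             break
       |         # Keep the best half, refill with mutants of survivors.
       |         elite = pop[:25]
       |         pop = elite + [mutate(random.choice(elite))
       |                        for _ in range(25)]
       |     print(gen, fitness(pop[0]))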
        
         | fancy_pantser wrote:
         | Perhaps it was John Koza?
         | 
         | http://www.genetic-programming.com/johnkoza.html
        
         | EvanAnderson wrote:
         | I'm sure it's not who you're thinking of, but I can't miss an
         | opportunity to mention Tom Ray and Tierra:
         | https://tomray.me/tierra/whatis.html
        
         | blixt wrote:
         | A friend of mine made this in-browser neural network engine
         | that could run millions of multi-layer NNs in a simulated world
         | at hundreds of updates per second and each network could
         | reproduce and evolve. It worked in the sense that the networks
         | exhibited useful and varied behaviors. However, it was clear
         | that larger networks were needed for more complex behaviors and
         | evolution just starts to take a lot longer.
         | 
         | https://youtu.be/-1s3Re49jfE?si=_G8pEVFoSb2J4vgS
        
         | petargyurov wrote:
         | > avoid anything written in the last 10 years
         | 
         | Why?
        
           | exe34 wrote:
           | presumably because it's saturated with a monoculture, and
           | the hope (rightly or wrongly) is that some of the other
           | roads might lead to some alternative breakthrough.
        
         | PinkMilkshake wrote:
         | In the _Creatures_ artificial life / virtual pet series, the
         | creatures have about 900 neurons (maybe more in later
         | versions). Each neuron is a little virtual machine that is
         | designed in such a way that programs remain valid even with
         | random mutation.
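         | 
         | That "any mutation is still a valid program" property is easy
         | to get by construction. A hypothetical sketch (toy instruction
         | set, nothing like the actual Creatures VM): decode each gene
         | byte modulo the instruction count, so a random byte flip
         | always yields a legal instruction:
         | 
         |     import random
         | 
         |     OPS = ["push1", "add", "dup", "neg", "nop"]
         | 
         |     def decode(genome):
         |         # Every byte maps to a legal op, so mutation
         |         # can never produce an invalid program.
         |         return [OPS[b % len(OPS)] for b in genome]
         | 
         |     def run(program):
         |         stack = [0]
         |         for op in program:
         |             if op == "push1":
         |                 stack.append(1)
         |             elif op == "add" and len(stack) >= 2:
         |                 stack.append(stack.pop() + stack.pop())
         |             elif op == "dup":
         |                 stack.append(stack[-1])
         |             elif op == "neg":
         |                 stack.append(-stack.pop())
         |             # "nop" and underflowing ops do nothing.
         |         return stack[-1]
         | 
         |     genome = [random.randrange(256) for _ in range(16)]
         |     flip = lambda b: b ^ (1 << random.randrange(8))
         |     mutant = [flip(b) if random.random() < 0.1 else b
         |               for b in genome]
         |     # Both the original and the mutant always run.
         |     print(run(decode(genome)), run(decode(mutant)))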
        
         | mandibeet wrote:
         | Your recommendation to explore the classics is a good one. You
         | can gain a deeper appreciation by studying these foundational
         | works
        
         | JoeDaDude wrote:
         | There is the case of Blondie24, a neural net evolved with a
         | genetic algorithm, which was able to develop a very strong
         | checkers-playing capability by self-play with no human
         | instruction. It was later extended to play other games.
         | 
         | https://en.wikipedia.org/wiki/Blondie24
        
         | northernman wrote:
         | I read this book:
         | <https://books.google.ca/books/about/Artificial_Intelligence_Through_Simulate.html?id=QMLaAAAAMAAJ>
         | in 1972. It was published in 1966.
        
       | jaimie wrote:
       | The domain of Artificial Life is highly related and has an
       | ongoing conference series and journal; it might be worth
       | mining for more inspiration:
       | 
       | https://en.wikipedia.org/wiki/Artificial_life
       | https://direct.mit.edu/artl https://alife.org
        
       | sdwr wrote:
       | Fantastic speculation here, explains a lot, and has testable
       | hypotheses.
       | 
       | For example, there should be a relationship between rate of
       | learning and the physical subcolumns - we should be able to
       | identify when a single column starts up / is fully trained / is
       | overused
       | 
       | Or use AI to try to mirror the learning process, creating an
       | external replica that makes the same decisions as the person
       | 
       | Marvin Minsky was spot on about the general idea 50 years ago,
       | seeing the brain as a collection of 1000s of atomic operators
       | (society of mind?)
        
         | calepayson wrote:
         | > Fantastic speculation here, explains a lot, and has testable
         | hypotheses.
         | 
         | Calvin is the man.
         | 
         | > For example, there should be a relationship between rate of
         | learning and the physical subcolumns - we should be able to
         | identify when a single column starts up / is fully trained / is
         | overused
         | 
         | This sounds super interesting. Could you break down what you're
         | thinking here?
         | 
         | > Marvin Minsky was spot on about the general idea 50 years
         | ago, seeing the brain as a collection of 1000s of atomic
         | operators (society of mind?)
         | 
         | I'm very much an amateur in this field and was under the
         | impression that Minsky was also trying to break intelligence
         | up, but that he tried to specify each of those operations.
         | What I find so enticing about Neural Darwinism is the lack of
         | specification needed. Ideally, once you get the underlying
         | process right, there's a cascade of emergent properties.
         | 
         | Using the example of a murmuration of starlings I picture
         | Minsky trying to describe phase transitions between every
         | possible murmuration state. On the other hand I see Neural
         | Darwinism as an attempt to describe the behavior of a single
         | starling which can then be scaled to thousands.
         | 
         | Let me know if that's super wrong. I've only read second hand
         | descriptions of Minsky's ideas, so feel free to send some
         | homework my way.
        
           | breck wrote:
           | > I've only read second hand descriptions of Minsky's ideas,
           | so feel free to send some homework my way.
           | 
           | Here you go: https://breckyunits.com/marvin-minsky.html
           | 
           | I think you are right in that Minsky was missing some
           | important details in the branches of the tree, particularly
           | around cortical columns, but he was old when Hawkins and
           | Numenta released their stuff.
           | 
           | In terms of the root idea of the mind being a huge number of
           | concurrent agents, I think he was close to the bullseye and
           | it very much aligns with what you wrote.
        
             | calepayson wrote:
             | Awesome post, thanks. I ordered society of mind.
             | 
             | Reminds me of when I took "The Philosophy of Cognitive
             | Science" in college. The entire class was on AI. When I
             | asked the professor why, she explained: "You don't
             | understand something unless you can build it".
             | 
             | It's cool to learn that quote might have been because she's
             | a fan of Minsky.
             | 
             | > In terms of the root idea of the mind being a huge number
             | of concurrent agents, I think he was close to the bullseye
             | and it very much aligns with what you wrote.
             | 
             | I think you're right here and I'd like to add a bit. One
             | common mistake people make when thinking of evolution is
             | misjudging where in the hierarchy it takes place. In
             | other words, they misidentify the agents by an order of
             | magnitude.
             | 
             | For example, in biology I commonly see it taught that the
             | individual is the subject of natural selection (or worse,
             | the population).
             | 
             | Really, it's the gene. The beauty of evolution is that it
             | can take an agent as simple as the gene and shape it into
             | the litany of complex forms and functions we see all around
             | us.
             | 
             | If evolution is at play in the brain, I suspect that
             | Minsky's agents are the individual firing patterns. Like
             | genes, the base of the hierarchy, the fundamental unit.
             | Also like genes, they slowly build increasingly complex
             | behaviors from the ground up. Starting before birth and
             | continuing for most of our lives.
        
               | breck wrote:
               | Right, _The Selfish Gene_ is one of the best books I
               | ever read.
               | 
               | There's also a paper I recently came across
               | (https://warpcast.com/breck/0xea2e1a35) which talks about
               | how causation is a two way street: low level nodes cause
               | things in higher level nodes, but higher level nodes in
               | turn cause things in lower level nodes.
               | 
               | In other words, just because genes have really been the
               | drivers and our bodies just the vehicles, doesn't mean
               | that's not cyclical (sometimes it could cycle to be the
               | higher level ideas driving the evolution in lower level
               | agents).
               | 
               | > I suspect that Minsky's agents are the individual
               | firing patterns.
               | 
               | I like this idea. The biggest open question in my mind in
               | regards to Minsky still is exactly on this: what
               | physically is an agent? How many are there? My margin of
               | error here is wild -- like 10 orders of magnitude.
        
         | jcynix wrote:
         | Regarding Minsky: the most interesting thoughts I've read
         | about theories of mind are in his books, namely _The Society
         | of Mind_ and _The Emotion Machine_, which should be more
         | widely known.
         | 
         | More of Minsky's ideas on "Matter, Mind, and Models" are
         | mentioned here:
         | https://www.newyorker.com/magazine/1981/12/14/a-i
         | 
         | And let's not forget Daniel Dennett: In "Consciousness
         | Explained," a 1991 best-seller, he described consciousness as
         | something like the product of multiple, layered computer
         | programs running on the hardware of the brain. [...]
         | 
         | Quoted from
         | https://www.newyorker.com/magazine/2017/03/27/daniel-dennett...
        
       | breck wrote:
       | I took more notes on this blog post than anything else I've read
       | this month.
        
         | calepayson wrote:
         | Man, this has me grinning like an idiot. Thanks.
        
       | jekude wrote:
       | I've been noodling on how to combine neural networks with
       | evolution for a while. I've always thought that to do this, you
       | need some sort of evolvable genetic/functional units, and have
       | had trouble fitting traditional artificial neurons w backprop
       | into that picture.
       | 
       | My current rabbit hole is using Combinatory Logic as the
       | genetic material, and I have been trying to evolve combinators,
       | etc. (there is some active research in this area).
       | 
       | Only slightly related to the author's idea, it's cool that
       | others are interested in this space as well.
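       | 
       | To make "combinators as genetic material" concrete, here's a
       | minimal sketch (my own toy framing, not a description of the
       | actual research): every random tree of S, K, and applications
       | is a well-formed individual, so mutation is closed over the
       | genotype space:
       | 
       |     import random
       | 
       |     # An expression is "S", "K", or a pair (f, x)
       |     # meaning the application f x.
       |     def random_expr(depth=3):
       |         if depth == 0 or random.random() < 0.3:
       |             return random.choice(["S", "K"])
       |         return (random_expr(depth - 1),
       |                 random_expr(depth - 1))
       | 
       |     def mutate(e, p=0.2):
       |         # Replace a subtree with a fresh random one; the
       |         # result is always a well-formed expression.
       |         if random.random() < p or not isinstance(e, tuple):
       |             return random_expr(2)
       |         f, x = e
       |         return (mutate(f, p), mutate(x, p))
       | 
       |     def step(e):
       |         # One reduction pass: (K a) b -> a and
       |         # ((S a) b) c -> (a c) (b c).
       |         if isinstance(e, tuple):
       |             if isinstance(e[0], tuple) and e[0][0] == "K":
       |                 return e[0][1]
       |             if (isinstance(e[0], tuple)
       |                     and isinstance(e[0][0], tuple)
       |                     and e[0][0][0] == "S"):
       |                 a, b, c = e[0][0][1], e[0][1], e[1]
       |                 return ((a, c), (b, c))
       |             return (step(e[0]), step(e[1]))
       |         return e
       | 
       |     e = random_expr()
       |     print(e, "->", step(mutate(e)))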
        
         | pyinstallwoes wrote:
         | Thermodynamic annealing over a density parameter space
        
         | Matumio wrote:
         | Then probably you know about NEAT (the genetic algorithm) by
         | now. I'm not sure what has been tried in directly using
         | combinatory logic instead of NNs (do Hopfield networks
         | count?) - any references?
         | 
         | I've tried to learn simple look-up tables (like, 9 bits of
         | input) using the Cross-Entropy Method (CEM); this worked
         | well. But it was a very small search space (way too large to
         | just try all solutions, but still, a tiny model). I haven't
         | seen the CEM used on larger problems, though there is a cool
         | paper about learning Tetris using the cross-entropy method
         | with a bit of feature engineering.
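         | 
         | For anyone who hasn't met the CEM: keep a distribution over
         | solutions, sample a batch, and refit the distribution to the
         | elite samples. A toy sketch for the look-up-table case (my
         | own numbers, one Bernoulli parameter per table entry):
         | 
         |     import random
         | 
         |     N = 512  # a 9-bit look-up table has 512 entries
         |     target = [random.randint(0, 1) for _ in range(N)]
         | 
         |     def score(table):
         |         return sum(a == b for a, b in zip(table, target))
         | 
         |     p = [0.5] * N  # one Bernoulli parameter per entry
         |     for it in range(60):
         |         batch = [[1 if random.random() < pi else 0
         |                   for pi in p] for _ in range(200)]
         |         batch.sort(key=score, reverse=True)
         |         elite = batch[:20]  # top 10%
         |         for i in range(N):
         |             # Refit each parameter to the elite
         |             # frequency, with a little smoothing.
         |             freq = sum(s[i] for s in elite) / len(elite)
         |             p[i] = 0.9 * freq + 0.1 * p[i]
         |     print(score(batch[0]), "of", N)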
        
           | daveguy wrote:
           | I am familiar with NEAT, it was very exciting when it came
           | out. But, NEAT does not use back propagation or single
           | network training at all. The genetic algorithm combines
           | static neural networks in an ingenious way.
           | 
           | Several years prior, in undergrad, I talked to a professor
           | about evolving network architectures with GA. He scoffed that
           | squishing two "mediocre" techniques together wouldn't make a
           | better algorithm. I still think he was wrong. Should have
           | sent him that paper.
           | 
           | IIRC NEAT wasn't SOTA when it came out, but it is still a
           | fascinating and effective way to evolve NN architecture using
           | genetic algorithms.
           | 
           | If OP (or anyone in ML) hasn't studied it, they should.
           | 
           | https://en.m.wikipedia.org/wiki/Neuroevolution_of_augmenting.
           | .. (and check the bibliography for the papers)
           | 
           | Edit: looking at the continuation of NEAT it looks like they
           | focused on control systems, which makes sense. The evolved
           | network structures are relatively simple.
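           | 
           | For the curious: the trick that makes NEAT's crossover work
           | is tagging every connection gene with a global innovation
           | number, so genes from different genomes can be aligned. A
           | stripped-down sketch of the genome and the add-node
           | mutation (toy code, not the reference implementation):
           | 
           |     import random
           | 
           |     innovation = 0
           |     def next_innov():
           |         global innovation
           |         innovation += 1
           |         return innovation
           | 
           |     # Connection gene:
           |     # (innovation, src, dst, weight, enabled)
           |     genome = [
           |         (next_innov(), "in0", "out", 0.7, True),
           |         (next_innov(), "in1", "out", -0.3, True),
           |     ]
           | 
           |     def add_node(genome):
           |         # Split a random enabled connection src->dst
           |         # into src->new and new->dst, disabling the
           |         # original (as in the NEAT paper).
           |         i = random.randrange(len(genome))
           |         innov, src, dst, w, enabled = genome[i]
           |         if not enabled:
           |             return genome
           |         new = "h%d" % innov
           |         out = genome.copy()
           |         out[i] = (innov, src, dst, w, False)
           |         out.append((next_innov(), src, new, 1.0, True))
           |         out.append((next_innov(), new, dst, w, True))
           |         return out
           | 
           |     print(add_node(genome))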
        
         | peheje wrote:
         | Maybe a key innovation would be to apply backpropagation to
         | optimize the crossover process itself. Instead of random
         | crossover, compute the gradient of the crossover operation.
         | 
         | For each potential combination, "learn" (via normal backprop)
         | how different ways of crossing over impact overall network
         | performance. Then use this to guide the selection of optimal
         | crossover points and methods.
         | 
         | This "gradient-optimized crossover" would be a search process
         | in itself, aiming to find the best way to combine specific
         | parts of networks to maximize improvement of the whole. It
         | could make "leaps", instead of small incremental steps, due to
         | the exploratory genetic algorithm.
         | 
         | Has anything like this been tried?
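         | 
         | The closest cheap version I can picture (a speculative
         | sketch, not an established method) is to make the crossover
         | mask soft and differentiable: child = a*p1 + (1-a)*p2 per
         | parameter, then learn the mask by gradient descent on the
         | child's loss and harden it afterwards:
         | 
         |     import torch
         | 
         |     torch.manual_seed(0)
         |     # Two "parent" parameter vectors, toy regression task.
         |     p1, p2 = torch.randn(8), torch.randn(8)
         |     x = torch.randn(64, 8)
         |     y = x @ torch.randn(8)
         | 
         |     # Soft per-parameter crossover mask in (0, 1).
         |     m = torch.zeros(8, requires_grad=True)
         |     opt = torch.optim.Adam([m], lr=0.1)
         |     for _ in range(100):
         |         a = torch.sigmoid(m)
         |         child = a * p1 + (1 - a) * p2  # differentiable
         |         loss = ((x @ child - y) ** 2).mean()
         |         opt.zero_grad()
         |         loss.backward()
         |         opt.step()
         | 
         |     # Harden the learned mask into a discrete crossover.
         |     print((torch.sigmoid(m) > 0.5).int().tolist())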
        
       | paraschopra wrote:
       | The book "Cerebral Code" is made available for free by the author
       | on his website: http://williamcalvin.com/bk9/
       | 
       | For a more modern treatment on the subject, read this paper: An
       | Attempt at a Unified Theory of the Neocortical Microcircuit in
       | Sensory Cortex
       | https://www.researchgate.net/publication/343269087_An_Attemp...
        
       | ViscountPenguin wrote:
       | This strongly reminds me of the algorithm used by swarming
       | honeybees (if anyone's interested I'd highly recommend reading
       | _Honeybee Democracy_). I reckon there's something to this.
       | 
       | I might have a go at implementing something along these lines.
        
       | DrMiaow wrote:
       | This project employs a Darwinian approach. Initially, it was an
       | experiment in traditional program and user interface generation
       | that incorporated evolutionary feedback into the mutation
       | process - a combination of PG and AL. It has achieved some
       | success with small programs and is now exploring potential
       | combinations with LLMs.
       | 
       | https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513
        
       | visarga wrote:
       | I don't think it matters so much how the brain is made; what
       | matters is the training data. And we obtain data by searching.
       | Search is a great concept: it covers evolution, intelligence
       | and creativity, and it's also social. Search is discrete,
       | recursive, combinatorial and based on some kind of language
       | (DNA, or words, or just math/code).
       | 
       | Searching the environment provides the data the brain is
       | trained on. I don't believe we can understand the brain in
       | isolation, without its data engine and the problem space where
       | it develops.
       | 
       | Neural nets showed that given a dataset, you can obtain similar
       | results with very different architectures, like transformer and
       | diffusion models, or transformer vs Mamba. The essential
       | ingredient is data, architecture only needs to pass some minimal
       | bar for learning.
       | 
       | Studying just the brain misses the essential - we are search
       | processes, our whole life is a search for optimal actions, and
       | evolution itself is a search for environmental fitness. These
       | search processes made us what we are.
        
         | advael wrote:
         | What in the world
         | 
         | Most "diffusion models" use similar VAE to transformer backbone
         | architectures. Diffusion isn't an architecture, it's a problem
         | framing
         | 
         | As for the rest of this, I'm torn between liking the poetry of
         | it and pointing out that this is kind of that thing where you
         | say something like it's supposed to be a mind-blowing insight
         | when it's well-known and pretty obvious. Most people familiar
         | with learning theory already understand learning algorithms of
         | any kind as a subset of probabilistic search algorithms with
         | properties that make them responsive to data. The idea that the
         | structure of the information processing system doesn't matter
         | and there's just this general factor of learning capacity a
         | thing has is... not well supported by the way in which research
         | has progressed in the entire period of time when this has been
         | relevant to most people? Sure, in theory any neural network is
         | a general function approximator and could in theory learn any
         | function it's complex enough to represent. Also, we can arrive
         | at the solution to any computable problem by representing it as
         | a number and guessing random numbers until we can verify a
         | solution. Learning algorithms can almost be defined as attempts
         | to do better search via structured empiricism than can be done
         | with the assumption that structure doesn't matter. Like,
         | sometimes multiple things work, sure. That doesn't mean it's
         | arbitrary
         | 
         | TL;DR: Of course learning is a kind of search, but discovering
         | structures that are good at learning is the whole game
        
           | Xcelerate wrote:
           | Yeah, I really don't understand this recently popular
           | viewpoint that the algorithm doesn't matter, just how much
           | data you throw at it. It doesn't seem to be based on anything
           | more than wishful thinking.
           | 
           | One can apply Hutter search to solve just about any problem
           | conceivable given the data and guess what--you'll approach
           | the optimal solution! The only downside is that this process
           | will take more time than available in our physical universe.
           | 
           | I think people forget the time factor and how the entire
           | field of computational complexity theory arose because the
           | meta problem is not that we can't solve the problem--it's
           | that we can't solve it quickly enough on a timescale that
           | matters to humans.
           | 
           | Current NN architectures are missing something very
           | fundamental related to the efficiency of problem solving, and
           | I really don't see how throwing more data at them is going to
           | magically convert an EXPTIME algorithm into a PTIME one. (I'm
           | not saying NNs are EXPTIME; I'm saying that they are
           | incapable of solving entire classes of problems that have
           | both PTIME and EXPTIME solutions, as the NN architecture is
           | not able to "discover" PTIME solutions, thus rendering them
           | incapable of solving those classes of problems in any
           | practical sense).
        
             | advael wrote:
             | Also, one of the major classes of problem that gets solved
             | and we view as "progress" in machine learning is framing
             | problems. Like we couldn't define "draw a good picture" in
             | a way we could actually assess well, GANs and Diffusion
             | turn out to be good ways to phrase problems like that. In
             | the former case, it creates a way to _define_ the problem
             | as  "make something indistinguishable from an example
             | pulled from this dataset" and in the latter case, "I've
             | randomized some of these pixels, undo that based on the
             | description"
             | 
             | The idea of "efficiency" and "progress" is this post-hoc
             | rationalization that people who never understood the
             | problem, pay people to understand the problem, apply to
             | problems once they have a working solution in hand. It's a
             | model that is inherently as dumb as a model can be, and the
             | assumption it makes is that there is some general factor of
             | progress on hard problems that can be dialed up and down.
             | Sure, you can pay more scientists and probabilistically
             | increase the rate at which problems are solved, but you
             | can't predict how long it will take, how much failure it
             | will involve, whether a particular scientist will solve a
             | particular problem at all, whether that problem is even
             | solvable in principle sometimes. Businesspeople and
             | investors like models where you put in money and you get
             | out growth at a predictable rate with a controllable
             | timescale, and if this doesn't work you just kick it
             | harder, and this ill fits most frontier research. Hell, it
             | ill suits a lot of regular work.
        
               | visarga wrote:
               | > Sure, you can pay more scientists and probabilistically
               | increase the rate at which problems are solved, but you
               | can't predict how long it will take, how much failure it
               | will involve, whether a particular scientist will solve a
               | particular problem at all, whether that problem is even
               | solvable in principle sometimes.
               | 
               | Fully agree, search is hard, unpredictable and expensive.
               | Also a matter of luck, being at the right place and time,
               | and observing something novel. That is why I put the
               | emphasis on AI doing search, not just imitating humans.
        
               | advael wrote:
               | Okay, but what does that mean? AI is a search process. Do
               | you mean you want the AI to formulate queries? Test
               | hypotheses? Okay. How? What does that mean? What we know
               | how to do is to mathematically define objective functions
               | and tune against them. What objective function describes
               | the behavior you want? Is there some structural paradigm
               | we can use for this other than tuning the parameters on a
               | neural network through optimization toward an objective
               | function? If so, what is it?
               | 
               | I'm sorry to be a little testy but what you've basically
               | said is "We should go solve the hard problems in AI
               | research". Dope. As an AI researcher I fully agree.
               | Thanks. Am I supposed to clap or something?
        
             | visarga wrote:
             | Not "throwing more data at them" but having the AI discover
             | things by searching. AI needs to contribute to the search
             | process to graduate the parrot label.
        
           | visarga wrote:
           | > Of course learning is a kind of search, but discovering
           | structures that are good at learning is the whole game
           | 
           | No, you missed the essential. I mentioned search in the
           | context of discovery, or in other words expanding knowledge.
           | 
           | Training neural nets is also a search for the best
           | parameters that fit the data, but it's secondary. Many
           | architectures work: there have been a thousand variations
           | on the transformer architecture and plenty of RNN-like
           | approaches since 2017, when the transformer was invented,
           | and none of them is better than the current one, or
           | significantly worse.
           | 
           | Also, across the human population, the number of neurons in
           | the brain, the synapses and the wiring are very different
           | at the micro level from person to person, yet we all learn.
           | The difference between the top 5% and bottom 5% of humans
           | is small compared with other species, for example. What
           | makes a big difference between people is education - in
           | other words experiences, or training data.
           | 
           | To return to the original idea - AI that simply learns to
           | imitate human text is capable only of remixing ideas. But an
           | AI that actively explores can discover novel ideas, like
           | AlphaZero and AlphaTensor. In both these cases search played
           | a major role.
           | 
           | So I was generalizing the concept of "search" across many
           | levels of optimization, from protein folding to DNA and human
           | intelligence. Search is essential for progress across the
           | stack. Even network architecture evolves by search - with
           | human researchers.
        
         | calepayson wrote:
         | >I don't think it matters so much how the brain is made, what
         | matters is the training data.
         | 
         | I agree that training data is hugely important but I think it
         | does matter how the brain is made. Structures in the brain
         | are remarkably well preserved between species, despite the
         | fact that evolution loves to try different methods if it can
         | get away with it.
         | 
         | > Searching the environment provides the data brain is trained
         | on. I don't believe we can understand the brain in isolation
         | without its data engine and the problem space where it
         | develops.
         | 
         | I completely agree and suspect we might be on the same page.
         | What I find most compelling about the idea of Darwin Machines
         | is the fact that it relies on evolution. In my opinion, true
         | Dawkinsian evolution is the most efficient search algorithm.
         | 
         | I'd love to hear you go deeper on what you mean by data engine
         | and problem space. To (possibly) abuse those terms, I think
         | evolution is the data engine. The problem space is fun and I
         | love David Eagleman's description of the brain as sitting in a
         | warm bath in a dark room trying to figure out what to do with
         | all these electric shocks.
         | 
         | > Neural nets showed that given a dataset, you can obtain
         | similar results with very different architectures, like
         | transformer and diffusion models, or transformer vs Mamba. The
         | essential ingredient is data, architecture only needs to pass
         | some minimal bar for learning.
         | 
         | My understanding of neural nets, and please correct me if I'm
         | wrong, is that they solve system-one thinking, intuition. As of
         | yet, they haven't been able to do much more than produce an
         | average of their training data (which is incredible). With a
         | brute force approach they can innovate in constrained
         | environments, e.g. move 37 (or so I'm told, I haven't played go
         | :)). I haven't seen evidence that they might be able to
         | innovate in open-ended environments. In other words, there's no
         | suggestion they can do system-two thinking where time spent on
         | a problem correlates with the quality of the answer.
         | 
         | > Studying just the brain misses the essential - we are search
         | processes, the whole life is search for optimal actions, and
         | evolution itself is search for environment fitness.
         | 
         | I completely agree. I even suspect that, in a few years, we'll
         | see "life" and "intelligence" as synonymous concepts, just
         | implemented in different mediums. At the same time, studying
         | those mediums can be a blast.
        
       | nikolayasdf123 wrote:
       | > This layering forms the dominant narrative of how intelligence
       | may work and is the basis for deep neural nets. The idea is,
       | stimulus is piped into the "top" layer and filters down to the
       | bottom layer, with each layer picking up on more and more
       | abstract concepts.
       | 
       | popular deep artificial neural networks (lstms, llms, etc.) are
       | highly recurrent, in the sense that they simulate not deep
       | networks but shallow networks that process information in loops
       | many times.
       | 
       | > columns.. and that's about it.
       | 
       | recommend not to oversimplify structure here. what you're
       | describing is only the high-level structure of a single part of
       | the brain (neocortex).
       | 
       | 1. the brain has many other structures inside: basal ganglia,
       | cerebellum, midbrain, etc., each with different characteristic
       | micro-circuits.
       | 
       | 2. brain networks are highly interconnected over long ranges.
       | neurons project (as in send signals) to very distant parts of
       | the brain. similarly they get projections from other distant
       | parts of the brain too.
       | 
       | 3. the temporal dimension is important. your article is very
       | ML-like, focusing on information processing devoid of a
       | temporal dimension. if you want to draw parallels to real
       | neurons in the brain, you need to explain how it fits into
       | temporal dynamics (oscillations in neurons and circuits).
       | 
       | 4. is this competition in the realm of abeyant (what you can
       | think in principle) or current (what you think now)
       | representations? what are the timescales and the neurological
       | basis for this?
       | 
       | overall, my take is it's a bit ML-like talk. if it is to
       | describe real neurological networks it has got to be on closer
       | and stronger neurological footing.
       | 
       | here is some good material, if you want to dive into
       | neuroscience. "Principles of Neurobiology", Liqun Luo, 2020 and
       | "Fundamental Neuroscience", McGraw Hill.
       | 
       | more resources can be found here:
       | 
       | http://neuroscience-landscape.com/
        
         | calepayson wrote:
         | > popular deep artificial neural networks (lstms, llms, etc.)
         | are highly recurrent, in which they are simulating not deep
         | networks, but shallow networks that process information in
         | loops many times.
         | 
         | Thanks for the info. Is there anything you would recommend to
         | dive deeper into this? Books/papers/courses/etc.
         | 
         | > recommend not to oversimplify structure here. what you
         | describing is only high-level structure of single part of brain
         | (neocortex).
         | 
         | Nice suggestion. I added a bit to make it clear that I'm
         | talking about the neocortex.
         | 
         | > 1 & 2
         | 
         | Totally. I don't think AI is a simple as building a Darwin
         | Machine, much like it's not as simple as building a neural net.
         | But I think the concept of a Darwin Machine is an interesting,
         | and possibly important, component.
         | 
         | My goal with this post was to introduce folks who hadn't
         | heard of this concept and, hopefully, get in contact with
         | folks who had. I left out the rest so I could try to focus on
         | what matters.
         | 
         | > temporal dimension is important. your article is very ML-like
         | focusing on information processing devoid of temporal
         | dimension. if you want to draw parallels to real neurons in
         | brain, need to explain how it fits into temporal dynamics
         | (oscillations in neurons and circuits).
         | 
         | Correct me if I misunderstand, but I believe I did. The spatio-
         | temporal firing patterns of minicolumns contain the temporal
         | dimension. I touched on the song analogy but we can go deeper
         | here.
         | 
         | Let's imagine the firing pattern of a minicolumn as a melody
         | that fits within the period of some internal clock (I doubt
         | there's actually a clock but I think it's a useful analogy).
         | Each minicolumn starts "singing" its melody over and over, in
         | time with the clock. Each clock cycle, every minicolumn is
         | influenced by its neighbors within the network and they begin
         | to sync up. Eventually they're all harmonizing to the same
         | melody.
         | 
         | A network might propagate a bunch of different melodies at
         | once. When they meet, the melodies "compete". Each tries to
         | propagate to a new minicolumn and fitness is judged by other
         | inputs to that minicolumn (think sensory) and the tendencies of
         | that minicolumn (think memory).
         | 
         | I think evolution is an incredible algorithm precisely
         | because it relies as much as it does on time.
         | 
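         | As a loose illustration of the "sync up to the clock"
         | intuition (a toy Kuramoto-style model of my own, nothing from
         | the book): each unit nudges its phase toward its neighbors
         | every cycle until the whole ring sings in time:
         | 
         |     import cmath, math, random
         | 
         |     N, K = 32, 0.5  # units on a ring, coupling strength
         |     phase = [random.uniform(0, 2 * math.pi)
         |              for _ in range(N)]
         |     for cycle in range(200):
         |         nxt = []
         |         for i in range(N):
         |             left = phase[i - 1]
         |             right = phase[(i + 1) % N]
         |             # Pull each phase toward its two neighbors.
         |             pull = (math.sin(left - phase[i]) +
         |                     math.sin(right - phase[i]))
         |             nxt.append(phase[i] + K * pull)
         |         phase = nxt
         | 
         |     # Order parameter: 1.0 means perfect synchrony.
         |     r = abs(sum(cmath.exp(1j * p) for p in phase)) / N
         |     print(round(r, 3))
         | 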
         | > is this competition in realm of abeyant (what you can think
         | in principle) or current (what you think now) representations?
         | what's the timescales and neurological basis for this?
         | 
         | I'm not familiar with these ideas but let me give it a shot.
         | Feel free to jump in with more questions to help clarify.
         | 
         | Neural Darwinism points to structures - minicolumns, cortical
         | columns, and interesting features of their connections - and
         | describes one possibility for how those structures might lead
         | to thought. In your words, I think the structures are the realm
         | of abeyant representations while the theory describes current
         | representations.
         | 
         | The neurological basis for this - the description of the
         | abeyant representation (hope I'm getting that right) - is
         | Calvin's observations of the structure of the brain,
         | observations based on his and others' research.
         | 
         | To a large extent, neuroscience doesn't have a great through-
         | line story of how the brain works. For example the idea of
         | regions of the brain responsible for specific functions - like
         | the hippocampus for memory - doesn't exactly play nice with
         | Karl Lashley's experimental work on memory.
         | 
         | What I liked most about this book is how Calvin tried to relate
         | his theory to both structure and experimental results.
         | 
         | > overall, my take it is a bit ML-like talk. if it describes
         | real neurological networks it got to be closer and stronger
         | neurological footing.
         | 
         | If, by ML-like talk, you mean a bit woo-woo and hand-wavy:
         | ya, I agree. Ideally I'd be a better writer. But I'm not, so
         | I highly recommend the book.
         | 
         | It's written by an incredible neuroscientist and, so far,
         | none of the neuroscience researchers I've given it to have
         | expressed anything other than excitement about it. And I
         | explicitly told them to keep an eye out for places they might
         | disagree. One of them is currently reading it a second time
         | with the goal of verifying everything. If it all checks out,
         | he plans on presenting the ideas to his lab. I'll update the
         | post if he, or anyone in his lab, finds something that
         | doesn't check out.
         | 
         | > here is some good material, if you want to dive into
         | neuroscience. "Principles of Neurobiology", Liqun Luo, 2020 and
         | "Fundamental Neuroscience", McGraw Hill.
         | 
         | Why these two textbooks? I got my B.S. in neuroscience so I
         | feel good about the foundations. Happy to check these out if
         | you believe they add something that many other textbooks are
         | missing.
        
       | lachlan_gray wrote:
       | Oh dude this is so cool. I think you're dead right.
       | 
       | If you'll pardon some woo, another argument I see in favour of
       | message passing/consensus, is that it "fits" the self similar
       | nature of life patterns.
       | 
       | Valid behaviours that replicate and persist, for only the reason
       | that they do.
       | 
       | Culture, religion, politics, pop songs, memes... "Egregore" comes
       | to mind. In some ways "recombination" could be seen as
       | "cooperation", even at the level of minicolumns.
       | 
       | (Edit: what I mean to say is that it kinda makes sense that the
       | group dynamics between constituent units of one brain would be
       | similar in some way to the group dynamics you get from a bunch of
       | brains)
        
       | pshc wrote:
       | > _These connections result in a triangular array of connected
       | minicolumns with large gaps of unconnected minicolumns in
       | between. Well, not really unconnected, each of these are
       | connected to their own triangular array._
       | 
       | > _Looking down on the brain again, we can imagine projecting a
       | pattern of equilateral triangles - like a fishing net - over the
       | surface. Each vertex in the net will land on a minicolumn within
       | the same network, leaving holes over minicolumns that don 't
       | belong to that network. If we were to project nets over the
       | network until every minicolumn was covered by a vertex we would
       | project 50-100 nets._
       | 
       | Around this part I had a difficult time visualizing the intent
       | here. Are there any accompanying diagrams or texts? Thanks for
       | the interesting read!
        
         | calepayson wrote:
         | http://williamcalvin.com/bk9/index.htm
         | 
         | I'd recommend just banging out chapters 1-4 of the book (~60
         | pages). Lots of diagrams, and I think you'll get the meat of
         | the idea.
         | 
         | Thanks for the feedback!
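         | 
         | In the meantime, here's a crude toy of how I picture it (my
         | own sketch, not Calvin's diagrams): put minicolumns on a
         | staggered grid and label each one by the "net" it belongs to.
         | Same-numbered vertices form their own sparse, evenly spaced
         | array - a rough stand-in for Calvin's triangular nets - and
         | with M = 8 you get 64 interleaved nets covering every
         | minicolumn:
         | 
         |     M = 8  # sublattice scale: M*M interleaved nets
         |     for r in range(12):
         |         offset = "  " * (r % 2)  # stagger hex rows
         |         print(offset + " ".join(
         |             "%2d" % ((q % M) * M + r % M)
         |             for q in range(12)))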
        
       | slow_typist wrote:
       | The title of the referenced book by Erwin Schrödinger is "What
       | is Life?", I believe.
       | 
       | https://archive.org/details/whatislife0000erwi
        
         | calepayson wrote:
         | Thanks for pointing this out. I'll change it when I'm at my
         | computer.
        
       | auraai wrote:
       | There's lots of room for cross-pollination between bio/life
       | sciences and ML/AI. One key insight is the importance of what you
       | pick as your primary representation of data (is everything a
       | number, a symbolic structure, a probability distribution, etc). I
       | believe a lot of these bio-inspired approaches over-emphasize the
       | embodied nature of intelligence and how much it needs to be
       | situated in space and time, which downplays all the sub-problems
       | that need to be solved in other "spaces" with less obvious
       | "spatiotemporal" structure. I believe space and time are
       | emergent, at least for the purposes of defining intelligence, and
       | there are representations where both space and time arise as
       | dimensions of their structure and evolution.
        
       | wdwvt1 wrote:
       | This post analogizes between a specific theory of human
       | intelligence and a badly caricatured theory of evolution. It
       | feels like better versions of the arguments for Darwin Machines
       | exist that: a) don't require an unsupportable neuron-centric
       | view of evolution, and b) don't view evolution through the
       | programmer's lens.
       | 
       | > Essentially, biology uses evolution because it is the best way
       | to solve the problem of prediction (survival/reproduction) in a
       | complex world.
       | 
       | 1. This is anthropocentric in a way that meaningfully distorts
       | the conclusion. The vast majority of life on earth, whether you
       | count by raw number, number of species, weight, etc. do not have
       | neurons. These organisms are of course, microbes (viruses and
       | prokaryotes) and plants. Bacteria and viruses do not 'predict' in
       | the way this post speaks of. Survival strategies that bacteria
       | use (that we know about and understand) are hedging-based. For
       | example, some portion of a population will stochastically switch
       | certain survival genes on (e.g. sporulation, certain efflux pumps
       | = antibiotic resistance genes) that have a cost benefit ratio
       | that changes depending on the condition. This could be construed
       | as a prediction in some sense: the genome that has enough
       | plasticity to allow certain changes like this will, on average,
       | produce copies in a large enough population that enable survival
       | through a tremendous range of conditions. But that's a very
       | different type of prediction than what the rest of the post talks
       | about. In short, prediction is something neurons are good at, but
       | it's not clear it's a 'favored' outcome in our biosphere.
       | 
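       | To see why that hedging strategy wins without any 'prediction',
       | here's a toy simulation (my own made-up numbers): a lineage
       | that keeps a costly resistant fraction outgrows an unhedged one
       | as soon as rare stress events occur:
       | 
       |     import random
       | 
       |     def simulate(resist_frac, gens=200, p_stress=0.05):
       |         pop = 1.0
       |         for _ in range(gens):
       |             normal = pop * (1 - resist_frac)
       |             resistant = pop * resist_frac
       |             if random.random() < p_stress:
       |                 pop = resistant * 1.0  # only they survive
       |             else:
       |                 # Resistance costs growth in good times.
       |                 pop = normal * 2.0 + resistant * 1.2
       |         return pop
       | 
       |     random.seed(1)
       |     print("no hedging :", simulate(0.0))
       |     print("10% hedged :", simulate(0.1))
       | 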
       | > It relies on the same insight that produced biology: That
       | evolution is the best algorithm for predicting valid "solutions"
       | within a near infinite problem space.
       | 
       | 2. This gets the teleology reversed. Biology doesn't use
       | anything, it's not trying to solve anything, and evolution isn't
       | an algorithm because it doesn't have an end goal or a teleology
       | (and it's not predicting anything). Evolution is what you observe
       | over time in a population of organisms that reproduce without
       | perfect fidelity copy mechanisms. All we need say is that things
       | that reproduce are more likely to be observed. We don't have to
       | anthropomorphize the evolutionary process to get an explanation
       | of the distribution of reproducing entities that we observe or
       | the fact that they solve challenges to reproduction.
       | 
       | > Many people believe that, in biology, point mutations lead to
       | the change necessary to drive novelty in evolution. This is
       | rarely the case. Point mutations are usually disastrous and every
       | organism I know of does everything in its power to minimize them.
       | Think, for every one beneficial point mutation, there are
       | thousands that don't have any effect, and hundreds that cause
       | something awful like cancer. If you're building a skyscraper,
       | having one in a hundred bricks be laid with some variation is not
       | a good thing. Instead Biology relies on recombination. Swap one
       | beneficial trait for another and there's a much smaller chance
       | you'll end up with something harmful and a much higher chance
       | you'll end up with something useful. Recombination is the key to
       | the creativity of evolution, and Darwin Machines harness it.
       | 
       | 3. An anthropocentric reading of evidence that distorts the
       | conclusion. The fidelity (number of errors per cycle per base
       | pair) of the polymerases varies by maybe 8 orders of magnitude
       | across the tree of life. For a review, see figure 3 in ref [1]. I
       | don't know about Darwin Machines, but the view that
       | 'recombination' is the key to evolution is a conclusion you would
       | draw if you examined only a part of the tree of life. We can
       | quibble about viruses being alive or not, but they are certainly
       | the most abundant reproducing thing on earth by orders of
       | magnitude. Recombination doesn't seem as important for adaptation
       | in them as it does in organisms with chromosomes.
       | 
       | 4. There are arguments that seem interesting (and maybe not
       | incompatible with some version of the post) that life should be
       | abundant because it actually helps dissipate energy gradients.
       | See the Quanta article on this [0].
       | 
       | [0] https://www.quantamagazine.org/a-new-thermodynamics-
       | theory-o... [1] Sniegowski, P. D., Gerrish, P. J., Johnson, T., &
       | Shaver, A. (2000). The evolution of mutation rates: separating
       | causes from consequences. BioEssays, 22(12), 1057-1066.
       | doi:10.1002/1521-1878(200012)22:12<1057::aid-bies3>3.0.co;2-w
        
       | fedeb95 wrote:
       | isn't this the same as genetic algorithms?
        
       | nirvael wrote:
       | I think this is over-simplified and possibly misunderstood. I
       | haven't read the book this article references but if I am
       | understanding the main proposal correctly then it can be
       | summarised as "cortical activity produces spatial patterns which
       | somehow 'compete' and the 'winner' is chosen which is then
       | reinforced through a 'reward'".
       | 
       | 'Compete', 'winner', and 'reward' are all left undefined in the
       | article. Even given that, the theory is not new information and
       | seems incredibly analogous to Hebbian learning which is a long-
       | standing theory in neuroscience. Additionally, the metaphor of
       | evolution within the brain does not seem apt. Essentially what is
       | said is that given a sensory input, we will see patterns emerge
       | that correspond to a behaviour deemed successful. Other brain
       | patterns may arise but are ignored or not reinforced by a reward.
       | This is almost tautological, and the 'evolutionary process'
       | (input -> brain activity -> behaviour -> reward) lacks
       | explanatory power. This is exactly what we would expect to see.
       | If we observe a behaviour that has been reinforced in some way,
       | it would obviously correlate with the brain producing a specific
       | activity pattern. I don't see any evidence that the brain will
       | always produce several candidate activity patterns before judging
       | a winner based on consensus. The tangent of cortical columns
       | ignores key deep brain structures and is also almost irrelevant,
       | the brain could use the proposed 'evolutionary' process with any
       | architecture.
        
         | mandibeet wrote:
         | While it does build on established concepts like Hebbian
         | learning, I think the theory offers a potentially insightful
         | way of thinking about brain function.
        
         | calepayson wrote:
         | > I think this is over-simplified and possibly misunderstood.
         | 
         | I'm with you here. I wrote this because I wanted to drive
         | people towards the book. It's incredible and I did it little
         | justice.
         | 
         | > "cortical activity produces spatial patterns which somehow
         | 'compete' and the 'winner' is chosen which is then reinforced
         | through a 'reward'"
         | 
         | A slight modification: spatio-temporal patterns*. Otherwise
         | you're dead on.
         | 
         | > 'Compete', 'winner', and 'reward' are all left undefined in
         | the article.
         | 
         | You're right. I left these undefined because I don't believe I
         | have a firm understanding of how they work. Here's some
         | speculation that might help clarify.
         | 
         | Compete - The field of minicolumns is an environment. A spatio-
         | temporal pattern "survives" when a minicolumn is firing in that
         | pattern. It's "fit" if it's able to effectively spread to other
         | minicolumns. Eventually, as different firing patterns spread
         | across the surface area of the neocortex, a border will form
         | between two distinct firing patterns. They "Compete" insofar as
         | each firing pattern tries to "convert" minicolumns to fire in
         | their specific pattern instead of another.
         | 
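         | To make that concrete, here's a toy of the picture in my head
         | (my invention, not Calvin's model): patterns spread over a
         | grid of minicolumns, and a cell converts to a neighboring
         | pattern with probability weighted by how well that pattern
         | matches a "sensory" bias:
         | 
         |     import random
         | 
         |     SIZE, PATTERNS = 12, 3
         |     SENSORY = 0  # the pattern favored by sensory input
         |     grid = [[random.randrange(PATTERNS)
         |              for _ in range(SIZE)] for _ in range(SIZE)]
         | 
         |     def fitness(p):
         |         # Conversion is biased toward the pattern that
         |         # matches the input.
         |         return 2.0 if p == SENSORY else 1.0
         | 
         |     for step in range(2000):
         |         r = random.randrange(SIZE)
         |         c = random.randrange(SIZE)
         |         nbrs = [grid[(r + dr) % SIZE][(c + dc) % SIZE]
         |                 for dr, dc in ((1, 0), (-1, 0),
         |                                (0, 1), (0, -1))]
         |         # The cell is "converted" by one of its
         |         # neighbors; fitter patterns win more often.
         |         grid[r][c] = random.choices(
         |             nbrs, weights=[fitness(p) for p in nbrs])[0]
         | 
         |     counts = [sum(row.count(p) for row in grid)
         |               for p in range(PATTERNS)]
         |     print("territory per pattern:", counts)
         | 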
         | Winner - This has two levels. First, an individual firing
         | pattern could "win" the competition by spreading to a new
         | minicolumn. Second, amalgamations of firing patterns, the
         | overall firing pattern of a cortical column, could match
         | reality better than others. This is a very hand-wavy answer,
         | because I have no intuition for how this might happen. At a
         | high level, the winning thought is likely the one that best
         | matches perception. How this works seems like a bit of a
         | paradox as these thoughts are perception. I suspect this is
         | done through prediction. E.g. "If that person is my
         | grandmother, she'll probably smile and call my name". Again,
         | super hand-wavy; questions like this are why I posted this,
         | hoping to get in touch with people who have spent more time
         | studying it.
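         | 
         | Continuing the toy sketch above, the first level of "winning"
         | might just be a surface-area tally (again, my guess at an
         | illustration, not Calvin's actual mechanism):
         | 
         |   from collections import Counter
         | 
         |   def winner(grid):
         |       # The pattern controlling the most minicolumns wins.
         |       counts = Counter(p for row in grid for p in row)
         |       return counts.most_common(1)[0][0]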
         | 
         | Reward - I'm an interested amateur when it comes to ML, and
         | folks have been great about pointing out areas where I should
         | go deeper. I have only a basic understanding of how reward
         | functions work. I imagine the minicolumns as small neural
         | networks and alluded to "reward" in the same sense. I have no
         | idea what that reward algorithm is or if NNs are even a good
         | analogy. Again, I really recommend the book if you're
         | interested in a deeper explanation of this.
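         | 
         | To make the hand-waving a bit concrete: here's the kind of
         | thing I imagine "reward" doing in the toy model above, nudging
         | each minicolumn's bias toward reproducing the winning pattern
         | later (a Hebbian-flavored update I invented for illustration):
         | 
         |   def reward(bias, winning_pattern, lr=0.1):
         |       # Shift each pattern's bias toward 1 if it won, 0 if not.
         |       for p in bias:
         |           target = 1.0 if p == winning_pattern else 0.0
         |           bias[p] += lr * (target - bias[p])
         |       return bias
         | 
         |   bias = {p: 1 / len(PATTERNS) for p in PATTERNS}  # uniform prior
         |   bias = reward(bias, winner(grid))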
         | 
         | > the theory is not new information and seems incredibly
         | analogous to Hebbian learning which is a long-standing theory
         | in neuroscience.
         | 
         | I disagree with you here. Hebbian learning is very much a
         | component of this theory, but not the whole. The last two
         | constraints were inspired by it and, in hindsight, I should
         | have been more explicit about that. But Hebbian learning
         | describes a tendency to average: "cells that fire together wire
         | together". Please feel free to push back here, but the concept
         | of Darwin Machines fits the constraints of Hebbian learning
         | while still offering a seemingly valid description of how
         | creative thought might occur. Something that, if I'm not
         | misunderstanding, is undoubtedly new information.
         | 
         | > I don't see any evidence that the brain will always produce
         | several candidate activity patterns before judging a winner
         | based on consensus.
         | 
         | That's probably my fault in the retelling; check out the book:
         | http://williamcalvin.com/bk9/index.htm
         | 
         | I think if you read Chapters 1-4 (about 60 pages and with
         | plenty of awesome diagrams) you'd have a sense for why Calvin
         | believes this (whether you agree or not would be a fun
         | conversation).
         | 
         | > The tangent of cortical columns ignores key deep brain
         | structures and is also almost irrelevant; the brain could use
         | the proposed 'evolutionary' process with any architecture.
         | 
         | I disagree here. A common mistake I think we tend to make is
         | assuming evolution and natural selection are equivalent. Some
         | examples of natural selection: A diversified portfolio, or a
         | beach with large grains of sand due to some intricacy of the
         | currents. Dawkinsian evolution is much much rarer. I can only
         | think of three examples of architectures that have pulled it
         | off. Genes, and their architecture, are one. Memes (imitated
         | behavior) are another. Many animals imitate, but only one
         | species has been able to build architecture to allow those
         | behaviors to undergo an evolutionary process. Humans. And
         | finally, if this theory is right, spatiotemporal patterns and
         | the columnar architecture of the brain are the third.
         | 
         | Ignoring Darwin Machines, there are only two architectures that
         | have led to an evolutionary process. Saying we could use "any
         | architecture" seems a bit optimistic.
         | 
         | I appreciate the thoughtful response.
        
       | gushogg-blake wrote:
       | The image of the flattened out brain could use some
       | illustrations, or more specific instructions on what we should be
       | visualising.
       | 
       | > First, if you look at a cross-section of the brain (eye-level
       | with the table)
       | 
       | I thought it was flat on the table? Surely if we look at it side-
       | on we just see the edge?
       | 
       | Without a clear idea of how to picture this, the other aspect
       | (columns) doesn't make sense either.
        
       | mandibeet wrote:
       | I think in some ways by considering the brain as a Darwin
       | Machine, we can explore new dimensions of how our minds work
        
       | FrustratedMonky wrote:
       | There is a lot of quibbling over details, but this is a 1-2 page
       | high-level elevator pitch, so it will have some things glossed
       | over. To that end, it seems to offer some interesting concepts
       | for further exploration.
        
       | specialist wrote:
       | I read the followup:
       | 
       | Lingua ex Machina: Reconciling Darwin and Chomsky with the Human
       | Brain [2000]
       | 
       | https://www.amazon.com/Lingua-Machina-Reconciling-Darwin-Cho...
       | 
       | Completely changed my worldview. Evolutionary processes every
       | where.
       | 
       | My (turrible) recollection:
       | 
       | Darwinian processes for comprehending speech, the process of
       | translating sounds into phonemes (?).
       | 
       | There's something like a brain song, where a harmony signal
       | echoes back and forth.
       | 
       | Competition between and among hexagonal processing units (what
       | Jeff Hawkins & Numenta are studying). My paraphrasing: a meme
       | PvP free-for-all battlefield where "winning" means converting
       | your neighbor to your faction.
       | 
       | Speculation about how the human brain leaped from proto-language
       | (noun-verb) to Chomsky language (recursively composable noun-
       | verb-object predicates). Further speculation about how that
       | might be encoded in our brains.
       | 
       | Etc.
        
       | cs702 wrote:
       | Big-picture, the idea is that different modalities of sensory
       | data (visual, olfactory, etc.) are processed by different
       | minicolumns in the brain, i.e., different subnetworks, each
       | outputting a different firing pattern. These firing patterns
       | propagate across the surface area of the brain, competing with
       | conflicting messages. And then, to quote the OP, "after some
       | period of time a winner is chosen, likely the message that
       | controls the greatest surface area, the greatest number of
       | minicolumns. When this happens, the winning minicolumns are
       | rewarded, likely prompting them to encode a tendency for that
       | firing pattern into their structure." And this happens in
       | multiple layers of the brain.
       | 
       | In other words, there's some kind of iterative mechanism for
       | higher-level layers to find which lower-level subnetworks are
       | most in agreement about the input data, inducing learning.
       | 
       | Capsule-routing algorithms, proposed by Hinton and others, seek
       | to implement precisely this idea, typically with some kind of
       | expectation-maximization (EM) process.
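       | 
       | For a flavor of the idea, here's a minimal numpy sketch of the
       | simpler dot-product variant of routing-by-agreement (after
       | Sabour et al. 2017; the EM version replaces the softmax update
       | with E- and M-steps; function names and shapes here are my own):
       | 
       |   import numpy as np
       | 
       |   def squash(v):
       |       # Shrink vectors so their length lies in [0, 1).
       |       n2 = (v ** 2).sum(-1, keepdims=True)
       |       return (n2 / (1 + n2)) * v / np.sqrt(n2 + 1e-9)
       | 
       |   def route(votes, iters=3):
       |       # votes: (n_lower, n_upper, dim) predictions ("votes")
       |       # from lower-level capsules for each upper-level capsule.
       |       logits = np.zeros(votes.shape[:2])
       |       for _ in range(iters):
       |           c = np.exp(logits)
       |           c /= c.sum(axis=1, keepdims=True)       # softmax over upper caps
       |           s = (c[..., None] * votes).sum(axis=0)  # weighted consensus
       |           out = squash(s)
       |           logits += (votes * out[None]).sum(-1)   # reward agreement
       |       return out
       | 
       |   votes = np.random.randn(8, 4, 16)  # 8 lower caps, 4 upper caps
       |   print(route(votes).shape)          # -> (4, 16)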
       | 
       | There are quite a few implementations available on github:
       | 
       | https://github.com/topics/capsules
       | 
       | https://github.com/topics/em-routing
       | 
       | https://github.com/topics/routing-algorithm
        
         | calepayson wrote:
         | Great summary. Thanks for the links. These are awesome.
        
         | abeppu wrote:
         | I haven't heard of anyone talk about Hinton's capsule network
         | concepts for some time. In 2017-18 it seemed exciting both
         | because of Hinton and because the pose/transformation story
         | sounded pretty reasonable. I don't know what would count as
         | "explanation", but I'd be curious to hear any thoughts about
         | why it seems they didn't really pan out. (Are there any tasks
         | for which capsule methods are the best?)
        
           | cs702 wrote:
           | The short answer as to why capsule networks have "fallen out
           | of fashion" is... _Transformers_.
           | 
           | Transformers came out at roughly the same time[a] and have
           | proven to be great at... pretty much everything. _They just
           | work._ Since then, most AI research money, effort, and
           | compute has been invested to study and improve Transformers
           | and related models, at the expense of almost everything else.
           | 
           | Many promising ideas, including routing, won't be seriously
           | re-explored until and unless progress towards AGI seems to
           | stall.
           | 
           | ---
           | 
           | [a] https://arxiv.org/abs/1706.03762
        
             | abeppu wrote:
             | I think this is a non-answer in some sense. Yes,
             | transformers have been clearly very successful across a
             | very wide range of tasks. But what _about_ the approach
             | taken in capsules is comparatively deficient?
             | 
             | Some kinds of explanations which I think are at least
             | plausible (but IDK if any evidence exists for them):
             | 
              | - The attention structure in transformers allows the model
              | to learn that any chunk can be important for any other
              | chunk. And
             | pretty quickly people tended towards these being pretty
             | deep. By comparison, the capsule + routing structure (IIUC)
             | came with a built-in kind of sparsity (from capsules at a
             | level in the hierarchy being independent), and because the
             | hierarchy was meant to align with composition, it often (I
             | think) didn't have a _huge_ number of levels? Maybe this
             | flexibility + depth are key?
             | 
             | - Related to capsules being independent, an initial design
             | feature in capsule networks seems to have been smaller
             | model sizes. Perhaps this was at some level just a bad
             | thing to reach for? I think at the time, "smaller models
             | means optimization searches over a smaller space, which is
             | faster to converge and requires less data" was still sort
             | of in people's heads, and I think this view is pretty much
             | dead.
             | 
             | - I've heard some people argue that one of the core
             | strengths of transformers is that they support training in
              | a way that allows for maxing out available GPUs. I think
              | this is mostly in comparison to previous language models
              | which were
             | explicitly sequential. But are capsule networks less
             | conducive to efficient training?
        
               | cs702 wrote:
               | It's hard to make a fair comparison, because there hasn't
               | been anywhere near as much money, effort, or compute
               | invested in trying to scale up routing methods.
               | 
               | Promising approaches are often ignored for reasons that
               | have little to do with their merits. For example, Hinton,
               | Bengio, and Lecun spent much of the 1990's and all of the
                | 2000's on the fringes of academia, unable to get much
               | funding, because few others were interested in or saw any
               | promise in deep learning! Similarly, Katalin Kariko lost
               | her job and spent almost two decades in obscurity because
               | few others were interested or saw any promise in RNA
               | vaccines!
               | 
               | Now, I'm not saying routing methods will become more
               | popular in the future. I mean, who the heck knows?
               | 
               | What I'm saying is that promising approaches can fall out
               | of favor for reasons that are not intrinsic to them.
        
           | MAXPOOL wrote:
            | If you take a bird's-eye view, fundamental breakthroughs
            | don't happen that often. The "Attention Is All You Need"
            | paper also came out in 2017. It has now been 7 years without
            | a breakthrough at the same level as transformers. Breakthrough
           | ideas can take decades before they are ready. There are many
           | false starts and dead ends.
           | 
           | Money and popularity are orthogonal to pathfinding that leads
           | to breakthroughs.
        
             | calepayson wrote:
             | Well said
        
       | superqd wrote:
       | Nitpick: lots of text descriptions of visual patterns - this
       | article could use at least 5 visual aid images.
        
         | calepayson wrote:
         | The book provides a ton. I'll write another version that
         | follows the book more closely and uses them. Thanks for the
         | feedback.
        
       | osmarks wrote:
       | I don't think this is true as stated. Evolutionary algorithms are
       | not the most efficient way to do most things because they,
       | handwavily, search randomly in all directions. Gradient descent
       | and other gradient-based optimizers are way way faster where we
       | can apply them: the brain probably can't do proper backprop for
       | architectural reasons but I am confident it uses something much
       | smarter than blind evolutionary search.
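       | 
       | A toy illustration of the gap (made up for this comment, not
       | from the article): minimize f(x) = ||x||^2 with an exact
       | gradient step versus keep-the-best random perturbation.
       | 
       |   import numpy as np
       | 
       |   rng = np.random.default_rng(0)
       |   dim, lr = 100, 0.1
       |   x_gd = x_es = rng.standard_normal(dim)
       | 
       |   for _ in range(100):
       |       x_gd = x_gd - lr * 2 * x_gd  # gradient of ||x||^2 is 2x
       |       trial = x_es + 0.1 * rng.standard_normal(dim)
       |       if (trial ** 2).sum() < (x_es ** 2).sum():
       |           x_es = trial             # keep only improvements
       | 
       |   print((x_gd ** 2).sum(), (x_es ** 2).sum())
       | 
       | The gradient version collapses toward zero geometrically while
       | the random-search version crawls, and the gap widens with
       | dimension.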
        
         | cs702 wrote:
         | The OP is _not_ about evolutionary algorithms in the usual
         | sense (random mutation and selection over many generations).
         | 
         | It's about mechanisms in the brain that plausibly evolved over
         | time.
        
           | osmarks wrote:
           | > A Darwin Machine uses evolution to produce intelligence. It
           | relies on the same insight that produced biology: That
           | evolution is the best algorithm for predicting valid
           | "solutions" within a near infinite problem space.
           | 
           | It seems to be suggesting that neuron firing patterns (or
           | something like that?) are selected by internal evolutionary
           | processes.
        
         | calepayson wrote:
         | > Evolutionary algorithms are not the most efficient way to do
         | most things because they, handwavily, search randomly in all
         | directions.
         | 
         | I think we agree and would love to dive a bit deeper with you
         | here. My background is in biology and I'm very much an
         | enthusiastic amateur when it comes to CS.
         | 
         | When I first read about Darwin Machines, I looked up
         | "evolutionary algorithms in AI", thought to myself "Oh hell ya,
         | these CS folks are on it" and then was shocked to learn that
         | "evolutionary algorithms" seemed to be based on an old school
         | conception of evolution.
         | 
         | First, evolution is on your team: it hates random search. In
         | biology, point mutations are the equivalent of random search,
         | and organisms do everything in their power to minimize them.
         | 
         | As I said in the article, if we were building a skyscraper and
         | someone told us they wanted to place some bricks at random
         | angles "so that we might accidentally stumble upon a better
         | design" we would call them crazy. And rightfully so.
         | 
         | Evolution still needs variation though, and it gets it through
         | recombination. Recombination is when we take traits that we
         | know work, and shuffle them to get something new. It provides
         | much more variation with a much smaller chance of producing
         | something that decreases fitness.
         | 
         | It took me a while to grok how recombination produces anything
         | novel: if we're shuffling existing traits, how do we get a new
         | trait? I still don't have a "silver-bullet" answer for this,
         | but I find that I usually visualize these concepts too far up
         | the hierarchy. When I think of traits I think of eye color or
         | hair color (and I suspect you do too). A trait is really just a
         | protein (sometimes not even that) and those examples are the
         | outliers where a single protein is responsible.
         | 
         | It might be better to think of cancer suppression systems,
         | which can be made up of thousands of proteins and pathways.
         | They're like a large code base that proofreads the genome.
         | Imagine this
         | code base has tons of different functions for different
         | scenarios.
         | 
         | Point mutations, what evolution hates, are like going into that
         | code base and randomizing some individual characters. You're
         | probably going to break the relevant function.
         | 
         | Recombination, what evolution loves, is like going in and
         | swapping two functions that take the same input, produce the
         | same output, but are implemented differently. You can see how
         | this blind shuffling might lead to improvements.
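         | 
         | In toy python terms (my analogy, not from the book; the
         | genomes and "functions" here are invented for illustration):
         | 
         |   import random
         | 
         |   def point_mutate(genome: str) -> str:
         |       # Randomize one character: usually breaks something.
         |       i = random.randrange(len(genome))
         |       return genome[:i] + random.choice("acgt") + genome[i + 1:]
         | 
         |   def recombine(a: list, b: list) -> list:
         |       # Swap whole known-working "functions" at a crossover
         |       # point: lots of variation, far less breakage.
         |       cut = random.randrange(1, len(a))
         |       return a[:cut] + b[cut:]
         | 
         |   mutant = point_mutate("acgtacgt")
         |   child = recombine(["repairA", "senseA", "moveA"],
         |                     ["repairB", "senseB", "moveB"])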
         | 
         | How evolution creates new functions is a much more difficult
         | topic. If you're interested, I recommend "The Selfish Gene".
         | It's the best book I've ever read.
         | 
         | >Gradient descent and other gradient-based optimizers are way
         | way faster where we can apply them
         | 
         | The second point is based on my (limited) understanding of non-
         | biology things. Please point me in the right direction if you
         | see me making a mistake.
         | 
         | Gradient descent etc. are way faster when we can apply them.
         | But I don't think we can apply them to these problems.
         | 
         | My understanding of modern machine learning is that it can be
         | creative in constrained environments. I hear Move 37 is a great
         | example but I don't know enough about go to feel any sort of
         | way about it. My sense is: if you limit the problem space,
         | gradient descent can find creative solutions.
         | 
         | But intelligence like yours or mine operates in an unconstrained
         | problem space. I don't think you can apply gradient descent
         | there because, well, how the heck could you possibly score a
         | behavior?
         | 
         | This is where evolution excels as an algorithm. It can take an
         | infinite problem space and consistently come up with "valid"
         | solutions to it.
         | 
         | >the brain probably can't do proper backprop for architectural
         | reasons but I am confident it uses something much smarter than
         | blind evolutionary search.
         | 
         | I think Darwin Machines might be able to explain "animal
         | intelligence". But human intelligence is a whole other deal.
         | There's some incredible research on it that is (as far as I can
         | tell) largely undiscovered by AI engineers, which I can share
         | if you're interested.
        
       ___________________________________________________________________
       (page generated 2024-07-17 23:09 UTC)