[HN Gopher] On Device Learning
       ___________________________________________________________________
        
       On Device Learning
        
       Author : abhaynayar
       Score  : 77 points
       Date   : 2022-09-04 15:12 UTC (7 hours ago)
        
 (HTM) web link (geohot.github.io)
 (TXT) w3m dump (geohot.github.io)
        
       | diego wrote:
       | > It must fit in a similar space to a human. Like it or not, it's
       | what the world is designed for.
       | 
       | This doesn't follow. The machine can be distributed; only the
       | peripherals need to fit in human space, and they don't need to
       | be smart.
        
       | RivieraKid wrote:
       | I was just thinking that the next class of advances in AI could
       | come from the ability to learn in a similar way humans do.
       | 
       | For example, a model could read a book and then answer questions
       | about it after a period of thinking about and processing the
       | information.
       | 
       | Basically, go beyond the paradigm of a single forward pass.
       | 
       | The current models do something like the intuitive, instant
       | thinking humans do. But they can't do the type of thinking that
       | takes a long period of time.
        
         | overspeed wrote:
         | Taking long-term considerations into account does not need to
         | happen over a long period of time. The advantage of machines
         | is their superior ability to process in parallel efficiently,
         | so you'd expect answering questions to happen pretty quickly,
         | or even as the model parses the book.
        
       | smnscu wrote:
       | > What is the human reward function?
       | 
       | https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs
       | 
       | Also, I don't know the right keywords to look for it, but it
       | sounds like OP's (fascinating) question must have been tackled
       | in the scientific literature already. E.g., I found this in 10
       | seconds:
       | https://arxiv.org/abs/2103.04289
        
       | kumarvvr wrote:
       | George Hotz has come a long way from the days of the first iPhone
       | jailbreak.
       | 
       | I would love to hear a podcast on his life and career so far.
       | 
       | I fondly remember the details of the first iPhone jailbreak and
       | the intricate, detailed explanation he put out at the time.
        
         | gryn wrote:
         | He has a YouTube channel and also streams on Twitch, I think.
        
           | kklisura wrote:
           | He's streaming right at this moment. [1]
           | 
           | [1] https://www.twitch.tv/georgehotz
        
         | merryje wrote:
         | He was on the Lex Fridman podcast twice. He's definitely an
         | interesting guy. I've been following comma.ai for some time
         | now; it's really cool seeing how much they've accomplished.
         | 
         | August 2019 - https://www.youtube.com/watch?v=iwcYp-XT7UI
         | 
         | October 2020 - https://www.youtube.com/watch?v=_L3gNaAVjQ4
        
         | [deleted]
        
         | RyJones wrote:
         | I talked with him in the olden days[0]. Great guy. Incisive
         | questions, got to the heart of things in very few words.
         | 
         | [0]: https://en.wikipedia.org/wiki/AllJoyn
        
         | keepquestioning wrote:
         | The first iPhone jailbreak was simply exposing a debug port.
         | 
         | The current iPhone jailbreak requires the effort of nation
         | states.
         | 
         | I know who's winning.
        
       | sinenomine wrote:
       | I like the direction of this post - intrinsic reward via some
       | form of artificial curiosity is an understudied topic, especially
       | at scale. Still, I can't help but see the obvious compute and
       | experience-rate limitations inherent to real robots and their
       | on-device computers.
       | 
       | When training a large NN requires 100x the compute needed to run
       | one instance of it, and when we know how much scale helps these
       | things, it follows that we really have to put that constraint
       | into the foundation of the system design.
        
       | Geee wrote:
       | The high-level human reward function might be quite simple: just
       | try to copy other humans. This works, because all living people
       | are repositories of successful survival strategies, and
       | unsuccessful behavior is weeded out by natural selection.
       | 
       | The reward function doesn't need to understand that sticking its
       | arm in a fire pit is dangerous; it just doesn't do it, because it
       | has never seen anyone do it. It can also learn this by asking
       | someone or reading it from somewhere, but it's the same thing.
       | 
       | It gets a job, because everyone else does too. It will get
       | married, buy a house, get a dog etc. It'll talk about weather
       | with its neighbor. No complex reward function needed; just copy.
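       | 
       | One way to read that as a concrete reward (a toy sketch with
       | made-up numbers, not a real system): score the agent by how
       | closely its action matches what a demonstrator did in the same
       | situation.
       | 
       |   import numpy as np
       | 
       |   def copy_reward(agent_action, human_action):
       |       a = np.asarray(agent_action, dtype=float)
       |       h = np.asarray(human_action, dtype=float)
       |       return -float(np.sum((a - h) ** 2))  # 0 = perfect copy
       | 
       |   print(copy_reward([0.9, 0.1], [1.0, 0.0]))   # close copy
       |   print(copy_reward([5.0, -3.0], [1.0, 0.0]))  # far from demo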
        
       | Macuyiko wrote:
       | Interesting thoughts. I was having a very similar chat with a
       | friend about this recently, including exactly the question of
       | "what should the reward function be", or at least the most
       | minimal one.
       | 
       | Some aspects we thought of:
       | 
       | - pain (minimal reward): this should probably be hardwired
       | straight into the artificial brain, though it can't be enough, as
       | otherwise no activity would take place. The agent would learn
       | that the best course of action is to sit still.
       | 
       | - so we also came up with curiosity being a necessity:
       | encountering an unseen or hard-to-predict state leads to a
       | positive reward (a rough sketch follows this list).
       | 
       | - although I am not sure whether pain isn't sufficient by itself.
       | E.g. in nature, action is still necessary; otherwise, other pain
       | signals (hunger and thirst) start showing up.
       | 
       | - what is tricky to figure out is how this works for more
       | complicated intelligences such as humans. Let's be honest, most
       | babies are fed whenever necessary by their caretakers. What
       | causes them to learn? What causes grown-ups to learn?
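       | 
       | A rough sketch of the curiosity idea above (everything here is a
       | made-up toy, not a real agent): keep a small predictor of the
       | next state and pay the agent in proportion to how badly it
       | predicts, so unfamiliar or hard-to-predict states are rewarding.
       | 
       |   import numpy as np
       | 
       |   rng = np.random.default_rng(0)
       |   W = np.zeros((4, 4))  # toy next-state predictor
       | 
       |   def curiosity_reward(state, next_state, lr=0.1):
       |       global W
       |       pred = W @ state                 # predicted next state
       |       err = next_state - pred          # surprise
       |       W += lr * np.outer(err, state)   # update predictor
       |       return float(err @ err)          # reward = error
       | 
       |   s = rng.normal(size=4)
       |   for t in range(5):
       |       s2 = rng.normal(size=4)          # stand-in for the world
       |       print(round(curiosity_reward(s, s2), 3))
       |       s = s2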
       | 
       | So something important that we'll have to figure out is what
       | needs to be hardwired versus what emerges from second-order
       | things such as chemistry, hormones, gut bacteria, upbringing, the
       | role of parents, etc., and whether there's a difference between
       | non-conscious or simple intelligences versus complex ones w.r.t.
       | the necessity of these aspects. E.g. you might talk about aspects
       | such as "love" (towards your partner, children, born or not,
       | future generations) but it is much more unclear how necessary
       | this is, and how to quantify it.
       | 
       | Perhaps indeed this emerges from the basic reward function but
       | only after a meta-simulation:
       | 
       | > I don't think there's a way to learn it aside from millennia of
       | multi agent survival competition.
        
         | PartiallyTyped wrote:
         | Curiosity is not a necessity. While curiosity is an integral
         | part of what makes a human, well, human, it doesn't have to be
         | hardwired. In us it is hardwired onto dopamine circuits (cf.
         | The Molecule of More, a great read). However, I'd argue that in
         | us it is simply a form of inductive bias, i.e. an existing part
         | of our hardware that makes it cheap.
         | 
         | The keyword here is cheap. Any sufficiently powerful maximizer
         | with an infinite horizon __has__ to develop curiosity otherwise
         | they will not be able to maximise their reward function.
         | 
         | In fact, I'd argue that this is true for most mammalian
         | functions such as taking care of our pack, exhibiting pro-
         | social behaviour and so on, but there is a caveat. For this to
         | happen, there needs to be an actual benefit in the behaviour.
         | 
         | Due to evolution operating mostly linearly, with small changes
         | through genetic and epigenetic information passing, there seems
         | to be relatively little variation between generations. This
         | then implies that it is difficult for some candidates to
         | overwhelm everyone else in a winner-takes-all fashion; hence
         | maximization of replication will eventually result in
         | cooperation, simply because that allows genes in support of it
         | to continue replicating, effectively self-selecting for itself.
         | 
         | We saw this in the OpenAI video where agents eventually learned
         | to cooperate in what was effectively a prisoner's dilemma. In
         | the video, there were two teams, the hiders and the seekers, in
         | an environment that could be manipulated. Eventually the two
         | teams learned strategies. From the perspective of the seekers,
         | their utility function involved observing the hiders. For the
         | hiders, their utility function was to minimize their exposure
         | to the seekers. For all intents and purposes, and for each
         | team, the other agents were part of the environment. So given
         | two hiders, one could hide behind the other to minimize their
         | exposure to the seekers. This is effectively a defection.
         | Eventually, however, the hiders learned to cooperate and
         | instead cooperatively manipulated the environment.
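         | 
         | To put rough numbers on that (a toy payoff table I'm making up,
         | not anything from the OpenAI paper): defecting wins any single
         | round, but once the game is repeated, a pair that keeps
         | cooperating ends up far ahead of a pair that keeps defecting.
         | 
         |   # toy iterated prisoner's dilemma, per-round payoffs:
         |   # both cooperate = 3 each, both defect = 1 each,
         |   # lone defector = 5, lone cooperator = 0
         |   PAYOFF = {("C", "C"): (3, 3), ("D", "D"): (1, 1),
         |             ("C", "D"): (0, 5), ("D", "C"): (5, 0)}
         | 
         |   def total(ms_a, ms_b):
         |       pts = [PAYOFF[m] for m in zip(ms_a, ms_b)]
         |       return (sum(p[0] for p in pts),
         |               sum(p[1] for p in pts))
         | 
         |   print(total(["C"] * 100, ["C"] * 100))  # (300, 300)
         |   print(total(["D"] * 100, ["D"] * 100))  # (100, 100)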
         | 
         | ---
         | 
         | Sorry if I got a little bit off topic there.
         | 
         | Regardless, what causes us to learn is neurotransmitters
         | getting released when certain circuits activate; the
         | neurotransmitters charge a neuron, which causes it to fire.
         | Connections between neurons get strengthened if they are
         | frequently used, which makes them cheaper, and that inevitably
         | reinforces certain patterns of behaviour.
         | 
         | ---
         | 
         | What I propose is that we should instead look into analogies as
         | a means of learning. Humans seem to be great at using
         | analogies. Mathematically speaking, an analogy is a functor
         | between categories. A category is a collection of objects and
         | morphisms (directed relationships) between the objects; this is
         | as abstract and simple as it gets. A functor between categories
         | essentially maps the objects and the morphisms of one category
         | to the respective objects and morphisms of the other. When we
         | use an analogy, we do the same.
         | 
         | > A is to B as X is to Y
         | 
         | This then allows us to learn the morphism in the category with
         | {A,B} using just known relationships.
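         | 
         | As a toy illustration (the two categories and all the names
         | below are mine, just to make the mapping concrete):
         | 
         |   # two tiny "categories": objects plus named morphisms
         |   solar = {"objects": {"sun", "planet"},
         |            "morphisms": {("planet", "sun"): "orbits"}}
         |   atom = {"objects": {"nucleus", "electron"},
         |           "morphisms": {("electron", "nucleus"): "orbits"}}
         | 
         |   # the functor maps objects of one category to objects
         |   # of the other...
         |   F = {"sun": "nucleus", "planet": "electron"}
         | 
         |   # ...and carries the morphisms along with it
         |   def transport(cat, functor):
         |       return {(functor[a], functor[b]): name
         |               for (a, b), name in cat["morphisms"].items()}
         | 
         |   # "planet is to sun as electron is to nucleus"
         |   assert transport(solar, F) == atom["morphisms"]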
         | 
         | I think I got off topic again, but this is something that I
         | have been recycling in my head for a while and needed to
         | eventually get out.
        
           | Macuyiko wrote:
           | > Sorry if I got a little bit off topic there.
           | 
           | Not at all. Though forgive me for not agreeing with some key
           | points you raise.
           | 
           | > Curiosity is not a necessity. While curiosity is an
           | integral part of what makes a human, well, human, it doesn't
           | have to be hardwired.
           | 
           | > The keyword here is cheap. Any sufficiently powerful
           | maximizer with an infinite horizon __has__ to develop
           | curiosity otherwise they will not be able to maximise their
           | reward function.
           | 
           | With that I do in fact agree. I think curiosity was a quick
           | solution to fix some immediate problems I was seeing from the
           | pain-slash-survival angle, perhaps from a belief there should
           | be more to humanity. I also feel it is rather emergent and
           | should (must!) emerge pretty fast, even, in order to survive.
           | 
           | Actually, typing out the last sentence made me realize
           | another meta-meta-level of intelligence. Whereas the basic
           | reward function is level 0, the chemistry surrounding and
           | interacting with our body might be level 1 (might be, because
           | they are probably emergent as well), and evolution is
           | definitely level 2 (or 1) - the multi-simulations of agents.
           | On top of that, there's the fact that _initialisation_ is
           | cheap: meaning that even if some emergent properties are
           | highly necessary on top of the basic reward function, and
           | might lead to very complex aspects later on, a designer (and
           | this is a very badly chosen word, mayhaps) would be prepared
           | to deal with those given the fact that there are many
           | chances. Many one-cell organisms striving to do better in the
           | "soup". The more I think about this, the more I become
           | convinced that computational biology should have been a
           | serious field (and many are saying this).
           | 
           | > In fact, I'd argue that this is true for most mammalian
           | functions such as taking care of our pack, exhibiting pro-
           | social behaviour and so on, but there is a caveat. For this
           | to happen, there needs to be an actual benefit in the
           | behaviour.
           | 
           | See, this is where I respectfully disagree. The benefit can
           | emerge from a longer-term simulation rather than immediately.
           | You might say: sure, but what are the chances of this
           | happening? Well, how many intelligent species like ours have
           | we encountered so far? On this planet, in this universe?
           | 
           | > Due to evolution operating mostly linearly, with small
           | changes through genetic and epigenetic information passing,
           | there seems to be relatively little variation between
           | generations. This then implies that it is difficult for some
           | candidates to overwhelm everyone else in a winner-takes-all
           | fashion; hence maximization of replication will eventually
           | result in cooperation, simply because that allows genes in
           | support of it to continue replicating, effectively self-
           | selecting for itself.
           | 
           | Yes and no. From a gut feeling I agree though I also think
           | small changes tend to take over very rapidly in a population
           | pool once they show up. The waiting is mainly for the showing
           | up part.
           | 
           | > We saw this in the OpenAI video where agents eventually
           | learned to cooperate in what was effectively a prisoner's
           | dilemma. In the video, there were two teams, the hiders and
           | the seekers, in an environment that could be manipulated.
           | 
           | I saw that as well, and although I believe at some point this
           | will be possible, the main reason I agree with Hotz is that
           | our simulations suck and always allow for exploitation.
           | Unless they don't, of course. The on-device part, hence, is
           | for me not that necessary. But it means we should have a very
           | robust simulation (which so far we don't have in any area of
           | RL and associated topics; digital twins are a joke; the
           | people making them care more about dataviz; and so on).
           | 
           | > Regardless, what causes us to learn is neurotransmitters
           | getting released when certain circuits activate; the
           | neurotransmitters charge a neuron, which causes it to fire.
           | Connections between neurons get strengthened if they are
           | frequently used, which makes them cheaper, and that
           | inevitably reinforces certain patterns of behaviour.
           | 
           | Too tired to go into this, but you touch upon some key
           | differences between gradient descent and biological neuron
           | learning, though we are getting closer to that: spiking
           | neurons, memory cells, even real-neuron cell chips. I am not
           | sure electronics wouldn't be able to emulate it correctly in
           | the end. If, after all, P <> NP, then what difference does a
           | "real" computational time step make?
           | 
           | > What I propose is that we should instead look into
           | analogies as a means of learning. Humans seem to be great at
           | using analogies. Mathematically speaking, an analogy is a
           | functor between categories.
           | 
           | Agree, and surprising that this is still such an open
           | question in AI even given the one-shot and zero-shot learning
           | research. Though it seems like this has been put on the
           | backburner yet again. It amazes me even today how young
           | humans are so good at that. Like someone said: show a cartoon
           | tomato and a real tomato to a toddler. Next time show a
           | cartoon of an elephant and wait until they see a real
           | elephant. They will shout: elephant. Though on the other
           | hand, the solution for this might be very close to us. A
           | small architectural or multi-modal change. I was more
           | pessimistic about this a few years ago, but am less sure
           | today.
           | 
           | I think the main missing piece of the puzzle is stepping away
           | from supervised and self-supervised learning, and going for
           | continuous self-supervised reinforcement learning, where
           | predictions for t+1 are continuously matched with reality,
           | like a human brain does. The only problem is that you need to
           | have a continuous reality. But we have that.
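           | 
           | A bare sketch of that loop (the environment and the tiny
           | model here are stand-ins I made up): predict the observation
           | at t+1, compare it with what actually arrives, and learn from
           | the error, forever.
           | 
           |   import torch, torch.nn as nn
           | 
           |   obs_dim, act_dim = 8, 2
           |   model = nn.Linear(obs_dim + act_dim, obs_dim)
           |   opt = torch.optim.SGD(model.parameters(), lr=1e-3)
           | 
           |   obs = torch.zeros(obs_dim)     # stand-in for reset()
           |   for t in range(10_000):        # in principle: forever
           |       act = torch.randn(act_dim) # whatever the policy says
           |       pred = model(torch.cat([obs, act]))  # guess for t+1
           |       nxt = torch.randn(obs_dim) # stand-in for step(act)
           |       loss = ((pred - nxt) ** 2).mean()    # match reality
           |       opt.zero_grad(); loss.backward(); opt.step()
           |       # the same error can double as an intrinsic reward
           |       obs = nxt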
        
           | mistermann wrote:
           | Could you possibly post a link to the video you refer to
           | above?
        
         | simon_000666 wrote:
         | I've been thinking deeply about this over the last couple of
         | weeks. So sure, the obvious answer is that the overriding human
         | reward function is survival, the propagation of our DNA. Except
         | I think it's more complex than that: take war, for example,
         | where people sacrifice themselves for an idea of nationality or
         | culture; that goes against the 'selfish gene' theory. I think
         | the answer is that there are multiple reward functions that
         | compete for dominance. Perhaps each of the different 'brains'
         | has its own reward function.
        
           | aaaaaaaaaaab wrote:
           | >take war, for example, where people sacrifice themselves for
           | an idea of nationality or culture; that goes against the
           | 'selfish gene' theory
           | 
           | People sacrifice themselves in wars, because they think it's
           | going to improve the chances of survival for their offspring,
           | their extended family, their tribe, their kin, their nation,
           | etc.
        
             | Macuyiko wrote:
             | Exactly. This is the crux of the question: what is
             | necessary as a starting point. What is not. And given what
             | is: what emerges over time (like nationalism).
        
           | jlpom wrote:
           | > that goes against the 'selfish gene' theory
           | 
           | On the contrary, it confirms it: it means some individuals
           | will sacrifice themselves for the overall replication success
           | of the genes they are constituted of.
           | 
           | In the book that popularised this theory, the example of bees
           | stinging like kamikazes is given.
        
       | mrkramer wrote:
       | Life came before intelligence, and intelligence emerged as a tool
       | for advancing life; in other words, intelligence helps you
       | increase your survival rate and helps you evolve. Those two
       | things are interconnected.
       | 
       | I was researching Artificial Life a little bit and I came across
       | this paper: Philosophical Aspects of Artificial Life, Mark A.
       | Bedau.
       | 
       | The paper states[0] that processes characteristic of living
       | systems are:
       | 
       | - self-organization, spontaneous generation of order, and
       | cooperation
       | 
       | - self-reproduction and metabolization
       | 
       | - learning, adaptation, purposiveness, and evolution
       | 
       | So artificial general intelligence would need to exhibit some of
       | these characteristics, if not all, in order to be called
       | "general". What I'm trying to say is that AGI would need to
       | self-organize, cooperate, reproduce, learn, adapt and evolve in
       | either a digital or a physical environment in order to be called
       | Artificial General Intelligence, and then it would sort of become
       | Artificial Life, capable of surviving on its own without anyone
       | telling it what to do next.
       | 
       | [0]
       | https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138...
        
       | naillo wrote:
       | Makes me think of Carmack's remark (from the Lex Fridman
       | interview) that he would 'almost count anyone out' who tries to
       | make AGI happen through real-life humanoid-ish machines. (Meaning
       | that simulated worlds which can run at e.g. 1000x real time are
       | much more efficient.)
        
         | raghavtoshniwal wrote:
         | Geohot has some bias toward believing AGI would have to exist
         | in meatspace; he has invested years into cracking self-driving
         | cars and dealing with the messy real world.
         | 
         | He does plug openpilot at the end.
        
           | krick wrote:
           | I'm not gonna take anyone's side here, just to point out that
           | "bias" goes both ways. Carmack is a life-long game-developer
           | (always more focused on "technical" aspects of gamedev, like
           | graphics), and now basically a rich celebrity, who can
           | indulge himself by something as grand and abstract as
           | "working on AGI" (since 2019).
           | 
           | While Hotz is probably best known for his exploits, he still
           | is a legit machine learning professional, and comma.ai is a
           | legit business actually focused on "AI-kinda stuff".
           | 
           | Now, AGI is that thing from the joke about "teenage sex".
           | Everyone has some opinion, but we don't really take it too
           | seriously when LeCun, or Karpathy, or Hinton express their
           | opinions on that, because everyone (except for Musk fans,
           | obviously) knows that _no one_ knows. And as I've said, I'm
           | not gonna take anyone's side here, but unlike Hotz, Carmack
           | has almost as much clue about that stuff as some random FE
           | developer out there. So you could just as well replace "bias"
           | with "experience" here, and it would be as true, but the
           | whole meaning would be quite different.
        
             | cma wrote:
             | > Carmack is a life-long game-developer (always more
             | focused on "technical" aspects of gamedev, like graphics),
             | and now basically a rich celebrity, who can indulge himself
             | by something as grand and abstract as "working on AGI"
             | (since 2019).
             | 
             | He also started a rocket company and was a big part of
             | making VR workable with reprojection techniques and other
             | innovations (some of them may have had precedent, but so
             | did Doom; he and others got it all working in usable form
             | this time, just like Doom - Ultima Underworld was a bit
             | slow and had to be run in a much smaller fraction of the
             | screen).
             | 
             | Beyond just pure rendering, he also created the first
             | usable megatexturing/virtual texturing, now used just about
             | everywhere, which is almost more of a data management
             | problem. It maybe wasn't that great until SSDs, but he knew
             | that and put out an early mind-blowing iPhone version using
             | the solid-state storage and integrated CPU/GPU memory as
             | well.
        
         | mattmee wrote:
         | George talks about "A banana peel? Did you put all those things
         | in your simulator?" -- it doesn't matter if it's in the
         | simulation yet; the generalization of your algorithm is what
         | matters... once it's ready to act with new objects by itself,
         | put it in there to see if it can generalize around it. Humans
         | don't know what a banana peel is until they've been exposed to
         | it, and they work with it.
         | 
         | Smart of Carmack to say "almost anyone"; clearly smart people
         | are working via the humanoid aspect, and I'd guess they'll have
         | some discoveries before the simulation people, but not
         | necessarily because they're coming from the humanoid approach.
        
           | liuliu wrote:
           | At a certain point, the training/inference distinction will
           | blur. Is GPT-3's prompt some kind of "learning", or just a
           | different way to poke the model?
           | 
           | For RL, consider things like domain randomization, and
           | particularly the way a "teacher-student" network is trained
           | (like in the MIT Cheetah running paper): at inference time,
           | it recovers the physical parameters of the body. Does that
           | count as "learning"?
           | 
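           | Roughly what that looks like (the shapes and names below are
           | mine, not the paper's code): the teacher gets the privileged
           | physics parameters in simulation, and the student learns to
           | recover the teacher's encoding of them from recent
           | observation history alone.
           | 
           |   import torch, torch.nn as nn
           | 
           |   priv, obs, hist, lat = 4, 16, 32, 8
           |   teacher = nn.Linear(priv, lat)        # sees true params
           |   student = nn.Linear(obs * hist, lat)  # sees only history
           |   opt = torch.optim.Adam(student.parameters(), lr=1e-3)
           | 
           |   for step in range(1000):
           |       params = torch.randn(priv)          # friction, mass...
           |       history = torch.randn(obs * hist)   # recent obs
           |       with torch.no_grad():
           |           target = teacher(params)   # teacher's encoding
           |       loss = ((student(history) - target) ** 2).mean()
           |       opt.zero_grad(); loss.backward(); opt.step()
           |   # the deployed policy conditions on student(history):
           |   # the "recovered" physical parameters
           | 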
           | The simulation may not have a "banana peel", but for a multi-
           | modality model it is not hard to imagine that it has
           | encountered an object with similar physical parameters
           | before, and that "other models" in the system, after it falls
           | over, can recover such parameters so it won't be tricked
           | again. Does that count as "learning"?
        
             | mattmee wrote:
             | It all depends on how you define "learning".
             | 
             | I was just saying that I think George is wrong to say "The
             | only way forward is on device learning." Especially his use
             | of the word "only". I think in order to interact in the
             | real world, you will have to learn in the real world to
             | some extent, but like pilots do in training,
             | something/someone can learn a lot from simulations.
             | 
             | Something to consider is that living creatures in general
             | have had hundreds of millions (billions?) of years and
             | (insert very, very large number) of variations of trial and
             | error in the real world. Intelligence and learning came way
             | before humans.
        
               | liuliu wrote:
               | Oh, I am totally in the camp of "simulation is enough".
               | My reply is a convoluted way to argue that once it has
               | learned enough in simulation, adaptation in the real
               | world should just work (if you treat adaptation as a
               | form of learning).
        
       | cloogshicer wrote:
       | > What is the human reward function?
       | 
       | Sentences like this scare me.
       | 
       | The thought that something as nuanced, complex, diverse, and
       | ever-changing as human desires can be captured in a single,
       | static "reward function" sounds ridiculous to me.
       | 
       | And yet, people will try, and they will probably get pretty
       | close. And then those whose desires do not fit neatly into that
       | "reward function" will suffer.
       | 
       | I can already see lots of suffering caused by this type of
       | thinking, today.
        
         | jdeaton wrote:
         | Have you heard of the molecule dopamine?
        
           | harperlee wrote:
           | That's the reward mechanism, not the function, isn't it? You
           | can release it by jumping off a cliff or reading a novel.
        
       | ot wrote:
       | This is very similar to the idea of embodied cognition:
       | 
       | > Embodied cognition is the theory that many features of
       | cognition, whether human or otherwise, are shaped by aspects of
       | an organism's entire body. Sensory and motor systems are seen as
       | fundamentally integrated with cognitive processing. The cognitive
       | features include high-level mental constructs (such as concepts
       | and categories) and performance on various cognitive tasks (such
       | as reasoning or judgment). The bodily aspects involve the motor
       | system, the perceptual system, the bodily interactions with the
       | environment (situatedness), and the assumptions about the world
       | built into the organism's functional structure.
       | 
       | https://en.wikipedia.org/wiki/Embodied_cognition
       | 
       | I believe this is a pretty mainstream concept in AI, also see the
       | "AI and robotics" section in the page.
        
       | carom wrote:
       | I think his definition of AGI is off. It does not need human
       | characteristics as it will be machine intelligence, not human
       | intelligence. I think seeking AGI from a human-centric view will
       | keep us searching in the wrong locations for things that feel
       | human but are not generally intelligent.
       | 
       | For General Machine Intelligence I would like to see research in
       | the following areas -
       | 
       | 1) Understanding of the hardware and software on which it runs.
       | This will allow it to introspect and self-improve at the lowest
       | level. I think there are a lot of opportunities for research in
       | applying our large language models to machine languages.
       | 
       | 2) An internal representation of facts. While there is no
       | guarantee that it would outwardly represent the truth, an
       | intelligence must be able to discern between reality and
       | hallucinations.
       | 
       | 3) A probabilistic reasoning engine. Based on its priors about X
       | and Y, find the probability of X -> Y. This could also aid in
       | forgetting, as it could then generalize and discard individual
       | facts.
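       | 
       | A toy version of that last item (the numbers are made up): given
       | a prior over Y and how likely X is under Y, Bayes' rule gives the
       | updated probability of Y after seeing X.
       | 
       |   p_y = 0.01           # prior belief that Y holds
       |   p_x_given_y = 0.9    # P(X | Y)
       |   p_x_given_not_y = 0.05
       | 
       |   p_x = (p_x_given_y * p_y
       |          + p_x_given_not_y * (1 - p_y))
       |   p_y_given_x = p_x_given_y * p_y / p_x
       |   print(round(p_y_given_x, 3))  # 0.154: X raises belief in Y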
       | 
       | I really believe we should pivot to an idea of Machine
       | Intelligence. Otherwise we will continue chasing metrics that
       | make probabilistic models feel human, but not necessarily bring
       | them to life.
        
       ___________________________________________________________________
       (page generated 2022-09-04 23:01 UTC)