[HN Gopher] On Device Learning
___________________________________________________________________
On Device Learning
Author : abhaynayar
Score : 77 points
Date : 2022-09-04 15:12 UTC (7 hours ago)
(HTM) web link (geohot.github.io)
(TXT) w3m dump (geohot.github.io)
| diego wrote:
| > It must fit in a similar space to a human. Like it or not, it's
| what the world is designed for.
|
| This doesn't follow. The machine can be distributed; only the
| peripherals need to fit in human space, and they don't need to
| be smart.
| RivieraKid wrote:
| I was just thinking that the next class of advances in AI could
| come from the ability to learn in a way similar to how humans do.
|
| For example, a model could read a book and then answer questions
| about it after a period of thinking about and processing the
| information.
|
| Basically, go beyond the paradigm of a single forward pass.
|
| The current models do something like the intuitive, instant
| thinking humans do, but they can't do the type of thinking that
| takes a long period of time.
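|
| As a toy illustration of what going beyond a single forward pass
| could look like (everything here, the encoder, the state update
| and the retrieval, is invented for the sketch and is not any real
| model): re-read the text several times while accumulating a
| persistent state, then answer from that state.
|
|     import numpy as np
|
|     def encode(text, dim=64):
|         """Cheap stand-in for a learned encoder: hashed bag of words."""
|         v = np.zeros(dim)
|         for w in text.lower().split():
|             v[hash(w) % dim] += 1.0
|         n = np.linalg.norm(v)
|         return v / n if n else v
|
|     def read_then_answer(chunks, question, passes=3):
|         state = np.zeros(64)
|         for _ in range(passes):              # the "period of thinking"
|             for c in chunks:
|                 state = 0.9 * state + 0.1 * encode(c)
|         q = encode(question) + 0.1 * state   # question biased by what was read
|         scores = [float(q @ encode(c)) for c in chunks]
|         return chunks[int(np.argmax(scores))]   # best-matching passage
|
|     book = ["the dog chased the ball", "the cat slept all day"]
|     print(read_then_answer(book, "what did the dog do"))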
| overspeed wrote:
| Taking long-term considerations into account does not need to
| happen over a long period of time. The advantage of machines is
| their superior ability to process in parallel efficiently, so
| you'd expect answering questions to happen pretty quickly, or
| even as the model parses the book.
| smnscu wrote:
| > What is the human reward function?
|
| https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs
|
| Also, I don't know the right keywords to look for it, but it
| sounds like OP's (fascinating) question should have been tackled
| in the scientific literature. E.g. I found this in 10 seconds:
| https://arxiv.org/abs/2103.04289
| kumarvvr wrote:
| George Hotz has come a long way from the days of the first
| iPhone jailbreak.
|
| I would love to hear a podcast on his life and career so far.
|
| I fondly remember the details of the first iPhone jailbreak and
| the intricate, detailed explanation he put out at the time.
| gryn wrote:
| He has a YouTube channel and also streams on Twitch, I think.
| kklisura wrote:
| He's streaming right at this moment. [1]
|
| [1] https://www.twitch.tv/georgehotz
| merryje wrote:
| He was on the Lex Fridman podcast twice. He's definitely an
| interesting guy. I've been following comma.ai for some time now;
| it's really cool seeing how much they've accomplished.
|
| August 2019 - https://www.youtube.com/watch?v=iwcYp-XT7UI
|
| October 2020 - https://www.youtube.com/watch?v=_L3gNaAVjQ4
| [deleted]
| RyJones wrote:
| I talked with him in the olden days[0]. Great guy. Incisive
| questions, got to the heart of things in very few words.
|
| [0]: https://en.wikipedia.org/wiki/AllJoyn
| keepquestioning wrote:
| The first iPhone jailbreak was simply exposing a debug port.
|
| The current iPhone jailbreak requires the effort of nation
| states.
|
| I know who's winning.
| sinenomine wrote:
| I like the direction of this post - intrinsic reward via some
| form of artificial curiosity is an understudied topic, especially
| at scale. Still, I can't help but see the obvious compute and
| experience-rate limitations inherent to real robots and their
| on-device computers.
|
| When training a large NN requires 100x the compute of running one
| instance of it, and when we know how much scale helps these
| things, it follows that we really have to put this into the
| foundation of the system design.
| Geee wrote:
| The high-level human reward function might be quite simple: just
| try to copy other humans. This works because all living people
| are repositories of successful survival strategies, and
| unsuccessful behavior is weeded out by natural selection.
|
| The reward function doesn't need to understand that sticking its
| arm in a fire pit is dangerous; it just doesn't do it because it
| has never seen anyone do it. It can also learn this by asking
| someone or reading it from somewhere, but it's the same thing.
|
| It gets a job because everyone else does too. It will get
| married, buy a house, get a dog, etc. It'll talk about the
| weather with its neighbor. No complex reward function needed;
| just copy.
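|
| This is roughly what imitation learning / behavioural cloning
| does: fit a policy to (state, action) pairs demonstrated by
| others, with no hand-written reward at all. A tiny sketch, with
| all features and numbers invented:
|
|     import numpy as np
|
|     # Demonstrations: (situation features, action a person took).
|     # Nobody in the data ever reaches toward the fire, so the
|     # imitator never will either; no "fire is dangerous" rule needed.
|     demo_states  = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]])
|     demo_actions = np.array([0, 0, 1, 1])   # 0 = keep away, 1 = reach out
|
|     def imitate(state):
|         """Copy whatever the most similar demonstrator did (1-nearest neighbour)."""
|         dists = np.linalg.norm(demo_states - state, axis=1)
|         return demo_actions[int(np.argmin(dists))]
|
|     print(imitate(np.array([0.95, 0.2])))   # -> 0: do what everyone else does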
| Macuyiko wrote:
| Interesting thoughts. I was having a very similar chat with a
| friend about this recently, including exactly the question of
| "what should the reward function be", or at least the most
| minimal one.
|
| Some aspects we thought of:
|
| - pain (minimal reward): this should probably be hardwired
| straight into the artificial brain, though it can't be enough, as
| otherwise no activity would take place. The agent would learn
| that the best course of action is to sit still.
|
| - so we also came up with curiosity being a necessity:
| encountering an unseen or hard-to-predict state leads to positive
| reward (a toy sketch of this idea follows after this list).
|
| - although I am not sure pain alone wouldn't be sufficient. E.g.
| in nature, actions are still necessary: otherwise, other pain
| signals (hunger and thirst) start showing up.
|
| - what is tricky to figure out is how this works for more
| complicated intelligences such as humans. Let's be honest, most
| babies are fed whenever necessary by their caretakers. What
| causes them to learn? What causes grown-ups to learn?
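|
| A toy version of the curiosity bullet above, in the spirit of
| prediction-error-based intrinsic motivation (only a sketch: the
| forward model, the fake dynamics and the coefficients are all
| invented): keep a small model of what comes next and pay the
| agent its own prediction error as a bonus.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     W = rng.normal(scale=0.1, size=(4, 4))   # tiny linear forward model
|
|     def curiosity_bonus(state, next_state, lr=0.01):
|         """Prediction error of the forward model, used as intrinsic reward."""
|         global W
|         pred = W @ state                      # what the agent expected to see
|         err = next_state - pred
|         W += lr * np.outer(err, state)        # the model slowly gets less surprised
|         return float(err @ err)               # surprise = positive reward
|
|     s = rng.normal(size=4)
|     for t in range(100):
|         s_next = np.tanh(s + rng.normal(scale=0.1, size=4))   # stand-in dynamics
|         reward = 0.0 + 0.1 * curiosity_bonus(s, s_next)       # external + intrinsic
|         s = s_next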
|
| So something important that we'll have to figure out is what
| needs to be hardwired versus what emerges from second-order
| things such as chemistry, hormones, gut bacteria, upbringing, the
| role of parents, etc., and whether there's a difference between
| non-conscious or simple intelligences versus complex ones w.r.t.
| the necessity of these aspects. E.g. you might talk about aspects
| such as "love" (towards your partner, children, born or not,
| future generations) but it is much less clear how necessary this
| is, and how to quantify it.
|
| Perhaps indeed this emerges from the basic reward function but
| only after a meta-simulation:
|
| > I don't think there's a way to learn it aside from millennia of
| multi agent survival competition.
| PartiallyTyped wrote:
| Curiosity is not a necessity. While curiosity is an integral
| part of what makes a human, well, human, it doesn't have to be
| hardwired. In us it is hardwired onto dopamine circuits (cf. The
| Molecule of More, a great read). However, I'd argue that in us it
| is simply a form of inductive bias, i.e. an existing part of our
| hardware that makes it cheap.
|
| The keyword here is cheap. Any sufficiently powerful maximizer
| with an infinite horizon __has__ to develop curiosity, otherwise
| it will not be able to maximise its reward function.
|
| In fact, I'd argue that this is true for most mammalian
| functions, such as taking care of our pack, exhibiting pro-social
| behaviour and so on, but there is a caveat: for this to happen,
| there needs to be an actual benefit in the behaviour.
|
| Because evolution operates mostly linearly, with small changes
| passed through genetic and epigenetic information, there seems to
| be relatively little variation between generations. This implies
| that it is difficult for any one candidate to overwhelm everyone
| else in a winner-takes-all fashion; hence maximization of
| replication will eventually result in cooperation, simply because
| cooperation allows the genes that support it to continue
| replicating, effectively self-selecting for itself.
|
| We saw this in the OpenAI hide-and-seek video, where agents
| eventually learned to cooperate in what was effectively a
| prisoner's dilemma. In the video there were two teams, the hiders
| and the seekers, in an environment that could be manipulated, and
| the two teams eventually learned strategies. From the perspective
| of the seekers, their utility function involved observing the
| hiders. For the hiders, their utility function was to minimize
| their exposure to the seekers. For all intents and purposes, and
| for each team, the other agents were part of the environment. So
| given two hiders, one could hide behind the other to minimize its
| own exposure to the seekers: this is effectively a defection.
| Eventually, however, the hiders learned to cooperate and instead
| cooperatively manipulated the environment through strategies.
|
| ---
|
| Sorry if I got a little bit off topic there.
|
| Regardless, what causes us to learn is neurotransmitters getting
| released when certain circuits activate; the neurotransmitters
| charge a neuron, which causes it to fire. Connections between
| neurons get reinforced if they are frequently used, which makes
| them cheaper, and that inevitably reinforces certain patterns of
| behaviour.
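|
| That "frequently used connections get reinforced" part is
| essentially Hebbian plasticity ("fire together, wire together").
| A purely illustrative sketch, not a claim about real neurons, and
| all the numbers are invented:
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|     w = rng.normal(scale=0.01, size=(3, 3))  # synaptic weights among 3 neurons
|
|     def hebbian_step(pre, w, lr=0.05, decay=0.01):
|         post = np.tanh(w @ pre)                       # post-synaptic activity
|         w = w + lr * np.outer(post, pre) - decay * w  # Hebb plus mild forgetting
|         return post, w
|
|     pattern = np.array([1.0, 0.0, 1.0])      # a frequently repeated input
|     for _ in range(200):
|         _, w = hebbian_step(pattern, w)
|     # Connections driven by the repeated pattern end up the strongest,
|     # i.e. the "cheapest" to use.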
|
| ---
|
| What I propose is that we should instead look into analogies as
| a means of learning. Humans seem to be great at using analogies.
| Mathematically speaking, an analogy is a functor between
| categories. A category is a collection of objects and morphisms
| (directed relationships) between the objects; this is as abstract
| and simple as it gets. A functor between categories essentially
| maps the objects and the morphisms of one category to the
| respective objects and morphisms of the other. When we use an
| analogy, we do the same.
|
| > A is to B as X is to Y
|
| This then allows us to learn the morphism in the category
| containing {A, B} using just the relationships we already know.
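|
| A very small rendering of that idea (the categories, the object
| map and the labels are all invented for the sketch): carry each
| known arrow A -> B over to F(A) -> F(B), which is the "A is to B
| as X is to Y" move.
|
|     # Morphisms of a toy source category, and an object map F.
|     source_morphisms = {("puppy", "dog"): "grows_into",
|                         ("dog", "pack"): "member_of"}
|     F = {"puppy": "kitten", "dog": "cat", "pack": "clowder"}
|
|     def transport(morphisms, obj_map):
|         """Map each morphism (a -> b) to (F(a) -> F(b)), keeping its label."""
|         return {(obj_map[a], obj_map[b]): rel
|                 for (a, b), rel in morphisms.items()}
|
|     target_morphisms = transport(source_morphisms, F)
|     # "puppy is to dog as kitten is to cat": the arrow carries over.
|     print(target_morphisms[("kitten", "cat")])   # -> grows_into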
|
| I think I got off topic again, but this is something that I
| have been recycling in my head for a while and needed to
| eventually get out.
| Macuyiko wrote:
| > Sorry if I got a little bit off topic there.
|
| Not at all. Though forgive me for not agreeing with some key
| points you raise.
|
| > Curiosity is not a necessity. While curiosity is an integral
| part of what makes a human, well, human, it doesn't have to be
| hardwired.
|
| > The keyword here is cheap. Any sufficiently powerful maximizer
| with an infinite horizon __has__ to develop curiosity, otherwise
| it will not be able to maximise its reward function.
|
| With that I do in fact agree. I think curiosity was a quick
| solution to fix some immediate problems I was seeing from the
| pain-slash-survival angle, perhaps from a belief there should
| be more to humanity. I also feel it is rather emergent and
| should (must!) emerge pretty fast, even, in order to survive.
|
| Actually typing out the last sentence made me realize another
| meta-meta-level of intelligence. Whereas the basic reward
| function is level 0, the chemistry surrounding, and interactions
| with, our body might be level 1 (might be, because they are
| probably emergent as well), and evolution is definitely level 2
| (or 1) - the multi-simulations of agents.
| On top of that, there's the fact that _initialisation_ is
| cheap: meaning that even if some emergent properties are
| highly necessary on top of the basic reward function, and
| might lead to very complex aspects later on, a designer (and
| this is a very badly chosen word mayhaps) would be prepared
| to deal with those given the fact that there are many
| chances. Many one-cell organisms striving to do better in the
| "soup". The more I think about this, the more I start
| becoming convinced that computational biology should have
| been a serious field (and many are saying this).
|
| > In fact, I'd argue that this is true for most mammalian
| functions, such as taking care of our pack, exhibiting pro-social
| behaviour and so on, but there is a caveat: for this to happen,
| there needs to be an actual benefit in the behaviour.
|
| See, this is where I respectfully disagree. The benefit can
| emerge from a longer-term simulation rather than immediately. You
| might say: sure, but what are the chances of this happening?
| Well, how many intelligent species like ours have we encountered
| so far? On this planet, in this universe?
|
| > Because evolution operates mostly linearly, with small changes
| passed through genetic and epigenetic information, there seems to
| be relatively little variation between generations. This implies
| that it is difficult for any one candidate to overwhelm everyone
| else in a winner-takes-all fashion; hence maximization of
| replication will eventually result in cooperation, simply because
| cooperation allows the genes that support it to continue
| replicating, effectively self-selecting for itself.
|
| Yes and no. From a gut feeling I agree, though I also think
| small changes tend to take over very rapidly in a population pool
| once they show up. The waiting is mainly for the showing-up part.
|
| > We saw this in the OpenAI hide-and-seek video, where agents
| eventually learned to cooperate in what was effectively a
| prisoner's dilemma. In the video there were two teams, the hiders
| and the seekers, in an environment that could be manipulated.
|
| I saw that as well, and although I believe at some point this
| will be possible, the main reason I agree with Hotz is that our
| simulations suck and always allow for exploitation. Unless they
| don't, of course. The on-device part, hence, is not that
| necessary for me. But it means we would need a very robust
| simulation (which so far we don't have in any area of RL and
| associated topics; digital twins are a joke; the people making
| them care more about dataviz; and so on).
|
| > Regardless, what causes us to learn is neurotransmitters
| getting released when certain circuits activate; the
| neurotransmitters charge a neuron, which causes it to fire.
| Connections between neurons get reinforced if they are frequently
| used, which makes them cheaper, and that inevitably reinforces
| certain patterns of behaviour.
|
| Too tired to go into this, but you touch upon some key
| differences between gradient descent and biological neuron
| learning, though we are getting closer to that: spiking neurons,
| memory cells, even real-neuron cell chips. I am not sure
| electronics wouldn't be able to emulate it correctly in the end.
| If, after all, P <> NP, then what "real" difference does a
| computational time step make?
|
| > What I propose is that we should instead look into analogies
| as a means of learning. Humans seem to be great at using
| analogies. Mathematically speaking, an analogy is a functor
| between categories.
|
| Agree, and it is surprising that this is still such an open
| question in AI, even given the one-shot and zero-shot learning
| research, though it seems like this has been put on the back
| burner yet again. It amazes me even today how good young humans
| are at that. Like someone said: show a cartoon tomato and a real
| tomato to a toddler. Next time show a cartoon of an elephant and
| wait until they see a real elephant. They will shout: elephant.
| Though on the other hand, the solution for this might be very
| close to us: a small architectural or multi-modal change. I was
| more pessimistic about this a few years ago, but less sure today.
|
| I think the main missing piece of the puzzle is stepping away
| from supervised learning and self-supervised learning, and going
| for continuous self-supervised reinforcement learning, where
| predictions for t+1 are continuously matched with reality, like a
| human brain does. The only problem is that you need to have a
| continuous reality. But we have that.
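|
| Roughly a loop like this (the linear predictor and the synthetic
| stream are stand-ins invented for the sketch; in the proposal the
| stream would be the agent's own sensors):
|
|     import numpy as np
|
|     rng = np.random.default_rng(2)
|     W = np.zeros((4, 4))               # predictor of the next observation
|
|     def reality(obs):
|         """Stand-in for the world delivering the next observation."""
|         return np.roll(obs, 1) + rng.normal(scale=0.01, size=4)
|
|     obs = rng.normal(size=4)
|     for t in range(1000):
|         pred = W @ obs                 # the guess about t+1
|         nxt = reality(obs)             # what reality actually delivers
|         err = nxt - pred               # continuously matched with reality
|         W += 0.01 * np.outer(err, obs) # learn online: no dataset, no epochs
|         obs = nxt
|     print(float(err @ err))            # prediction error keeps shrinking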
| mistermann wrote:
| Could you possibly post a link to the video you refer to
| above?
| simon_000666 wrote:
| I've been thinking deeply about this over the last couple of
| weeks. So sure, the obvious answer is that the overriding human
| reward function is survival, the propagation of our DNA. Except I
| think it's more complex than that: take war, for example, where
| people sacrifice themselves for an idea of nationality or
| culture; that goes against the 'selfish gene' theory. I think the
| answer is that there are multiple reward functions competing for
| dominance. Perhaps each of the different 'brains' has its own
| reward function.
| aaaaaaaaaaab wrote:
| > take war, for example, where people sacrifice themselves for
| an idea of nationality or culture; that goes against the
| 'selfish gene' theory
|
| People sacrifice themselves in wars because they think it's
| going to improve the chances of survival for their offspring,
| their extended family, their tribe, their kin, their nation,
| etc.
| Macuyiko wrote:
| Exactly. This is the crux of the question: what is
| necessary as a starting point. What is not. And given what
| is: what emerges over time (like nationalism).
| jlpom wrote:
| > that goes against the 'selfish gene' theory
|
| On the contrary, it confirms it: it means some individuals will
| sacrifice themselves for the overall replication success of the
| genes they are constituted of.
|
| In the book from which this theory was popularised, the example
| of bees stinging like kamikazes is given.
| mrkramer wrote:
| Life came before intelligence, and intelligence emerged as a
| tool for advancing life; in other words, intelligence helps you
| increase your survival rate and helps you evolve. Those two
| things are interconnected.
|
| I was researching Artificial Life a little bit and I came across
| this paper: Philosophical Aspects of Artificial Life, Mark A.
| Bedau.
|
| The paper states[0] that processes characteristic of living
| systems are:
|
| self-organization, spontaneous generation of order, and
| cooperation
|
| self-reproduction and metabolization
|
| learning, adaptation, purposiveness, and evolution
|
| So artificial general intelligence would need to exhibit some of
| these characteristics, if not all, in order to be called
| "general". What I'm trying to say is that AGI would need to
| self-organize, cooperate, reproduce, learn, adapt and evolve,
| either in a digital or a physical environment, in order to be
| called Artificial General Intelligence, and then it would sort of
| become Artificial Life, capable of surviving on its own without
| anyone telling it what to do next.
|
| [0]
| https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138...
| naillo wrote:
| Makes me think of Carmack's remark (from the Lex Fridman
| interview) that he would 'almost count anyone out' who tries to
| make AGI happen through real-life humanoid-ish machines. (Meaning
| that simulated worlds which can run at e.g. 1000x real time are
| much more efficient.)
| raghavtoshniwal wrote:
| Geohot has some bias to believe AGI would have to exist in
| meatspace; he has invested years into cracking self-driving cars
| and dealing with the messy real world.
|
| He does plug openpilot at the end.
| krick wrote:
| I'm not gonna take anyone's side here; I just want to point out
| that "bias" goes both ways. Carmack is a life-long game developer
| (always more focused on the "technical" aspects of gamedev, like
| graphics), and now basically a rich celebrity who can indulge
| himself in something as grand and abstract as "working on AGI"
| (since 2019).
|
| While Hotz is probably best known for his exploits, he still is
| a legit machine learning professional, and comma.ai is a legit
| business actually focused on "AI-kinda-stuff".
|
| Now, AGI is that thing from the joke about "teenage sex".
| Everyone has some opinion, but we don't really take it too
| seriously when LeCun, or Karpathy, or Hinton express their
| opinions on it, because everyone (except for Musk fans,
| obviously) knows that _no one_ knows. And as I've said, I'm not
| gonna take anyone's side here, but unlike Hotz, Carmack has
| almost as much clue about that stuff as some random FE developer
| out there. So you could just as well replace "bias" with
| "experience" here, and it would be just as true, but the whole
| meaning would be quite different.
| cma wrote:
| > Carmack is a life-long game developer (always more focused on
| the "technical" aspects of gamedev, like graphics), and now
| basically a rich celebrity who can indulge himself in something
| as grand and abstract as "working on AGI" (since 2019).
|
| He also started a rocket company and was a big part of making VR
| workable with reprojection techniques and other innovations (some
| of them may have had precedent, but so did Doom; he and others
| got it all working in usable form this time, like Doom (Ultima
| Underworld was a bit slow and had to be run in a much smaller
| fraction of the screen)).
|
| Beyond just pure rendering, he also created the first usable
| megatexturing/virtual texturing, now used just about everywhere,
| which is almost more of a data-management problem. It maybe
| wasn't that great until SSDs, but he knew that and put out an
| early mind-blowing iPhone version using the solid-state storage
| and integrated CPU/GPU memory as well.
| mattmee wrote:
| George talks about "A banana peel? Did you put all those things
| in your simulator?" -- it doesn't matter if it's in the
| simulation yet; the generalization of your algorithm is what
| matters... once it's ready to act with new objects by itself, put
| it in there to see if it can generalize around them. Humans don't
| know what a banana peel is until they've been exposed to it, and
| they work with it.
|
| Smart of Carmack to say "almost anyone"; clearly smart people
| are working via the humanoid angle, and I'd guess they'll have
| some discoveries before the simulation people, but not
| necessarily because they're coming from the humanoid approach.
| liuliu wrote:
| At a certain point, the training / inference distinction will
| blur. Is GPT-3's prompt some kind of "learning", or just a
| different way to poke the model?
|
| For RL, take things like domain randomization, and particularly
| the way "teacher-student" networks are trained (like in the MIT
| Cheetah running paper): at inference time the network recovers
| physical parameters of the body. Does that count as "learning"?
|
| The simulation may not have a "banana peel", but for a multi-
| modality model it is not hard to imagine that it has encountered
| an object with similar physical parameters before, and that
| "other models" in the system, after the "fall over", can recover
| such parameters and won't be tricked again. Does that count as
| "learning"?
| mattmee wrote:
| It all depends on how you define "learning".
|
| I was just saying that I think George is wrong to say "The
| only way forward is on device learning." Especially his use
| of the word "only". I think in order to interact in the
| real world, you will have to learn in the real world to
| some extent, but like pilots do in training,
| something/someone can learn a lot from simulations.
|
| Something to consider is that living creatures in general have
| had hundreds of millions (billions?) of years and (insert a very,
| very large number) of variations of trial and error in the real
| world. Intelligence and learning came way before humans.
| liuliu wrote:
| Oh, I am totally in the camp of "simulation is enough". My reply
| is a convoluted way to argue that once it has learned enough in
| simulation, adaptation in the real world should just work (if you
| treat adaptation as a way of learning).
| cloogshicer wrote:
| > What is the human reward function?
|
| Sentences like this scare me.
|
| The thought that something as nuanced, complex, diverse, and
| ever-changing as human desires can be captured in a single,
| static "reward function" sounds ridiculous to me.
|
| And yet, people will try, and they will probably get pretty
| close. And then those whose desires do not fit neatly into that
| "reward function" will suffer.
|
| I can already see lots of suffering caused by this type of
| thinking, today.
| jdeaton wrote:
| Have you heard of the molecule dopamine?
| harperlee wrote:
| That's the reward mechanism, not the function, isn't it? You
| can release it by jumping off a cliff or reading a novel.
| ot wrote:
| This is very similar to the idea of embodied cognition:
|
| > Embodied cognition is the theory that many features of
| cognition, whether human or otherwise, are shaped by aspects of
| an organism's entire body. Sensory and motor systems are seen as
| fundamentally integrated with cognitive processing. The cognitive
| features include high-level mental constructs (such as concepts
| and categories) and performance on various cognitive tasks (such
| as reasoning or judgment). The bodily aspects involve the motor
| system, the perceptual system, the bodily interactions with the
| environment (situatedness), and the assumptions about the world
| built into the organism's functional structure.
|
| https://en.wikipedia.org/wiki/Embodied_cognition
|
| I believe this is a pretty mainstream concept in AI; see also
| the "AI and robotics" section of that page.
| carom wrote:
| I think his definition of AGI is off. It does not need human
| characteristics as it will be machine intelligence, not human
| intelligence. I think seeking AGI from a human-centric view will
| keep us searching in the wrong locations for things that feel
| human but are not generally intelligent.
|
| For General Machine Intelligence I would like to see research in
| the following areas -
|
| 1) Understanding of the hardware and software on which it runs.
| This will allow it to introspect and self-improve at the lowest
| level. I think there are a lot of opportunities for research in
| applying our large language models to machine languages.
|
| 2) An internal representation of facts. While there is no
| guarantee that it would outwardly represent the truth, an
| intelligence must be able to discern between reality and
| hallucinations.
|
| 3) A probabilistic reasoning engine. Based on its priors about X
| and Y, find the probability of X -> Y. This could also aid in
| forgetting, as it could then generalize and discard individual
| facts (a toy sketch of this follows below).
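|
| One toy reading of item 3 (the names and numbers are invented
| for the sketch): keep only summary counts of how often Y followed
| X, put a Beta prior over P(Y|X), and answer queries from the
| posterior. The individual observations can then be discarded,
| which is the "generalize and forget the individual facts" part.
|
|     from collections import Counter
|
|     counts = Counter()      # x -> times Y was observed together with X
|     totals = Counter()      # x -> times X was observed at all
|
|     def observe(x, y_happened):
|         totals[x] += 1
|         if y_happened:
|             counts[x] += 1  # the raw observation can be thrown away now
|
|     def p_y_given_x(x, alpha=1.0, beta=1.0):
|         """Posterior mean of a Beta(alpha, beta) prior updated by the counts."""
|         return (counts[x] + alpha) / (totals[x] + alpha + beta)
|
|     for _ in range(8):
|         observe("rain", True)   # eight times rain came with wet ground
|     observe("rain", False)      # once it did not
|     print(p_y_given_x("rain"))  # ~0.82, the engine's P(wet ground | rain)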
|
| I really believe we should pivot to an idea of Machine
| Intelligence. Otherwise we will continue chasing metrics that
| make probabilistic models feel human but do not necessarily bring
| them to life.
___________________________________________________________________
(page generated 2022-09-04 23:01 UTC)