[HN Gopher] Why machine learning struggles with causality
___________________________________________________________________
Why machine learning struggles with causality
Author : bschne
Score : 131 points
Date : 2021-04-02 10:23 UTC (12 hours ago)
(HTM) web link (bdtechtalks.com)
(TXT) w3m dump (bdtechtalks.com)
| digikata wrote:
| I wonder if there are intermediary steps to getting better at
| causality in ML. Causality is an abstraction over a whole set of
| problems at many different levels.
|
| In terms of concrete problems earlier than causality: e.g.
| toddlers, I think, get object permanence before causality, and I
| think ML might struggle with that too.
|
| Edit: then the next interesting thing after permanence is maybe
| object path prediction; then you have an interesting basis for
| some level of causal inference, because you have a prediction and
| some set of conditions that might disrupt the prediction.
| max_ wrote:
| I think that would be to venture into complex systems & knowing
| of phenomena like "causal opacity"
| BayezLyfe wrote:
| Better overview article on this topic: "AI Needs More Why"
|
| https://www.forbes.com/sites/alexanderlavin/2019/05/06/ai-ne...
| BayezLyfe wrote:
| and the follow-up, "Healthcare Needs AI, AI Needs Causality"
|
| https://www.forbes.com/sites/alexanderlavin/2019/08/13/healt...
| 6gvONxR4sf7o wrote:
| Here's my usual rant about ML not doing causality because we
| don't give it the right data. You can't learn causation from a
| table of floats. You _can_ learn causation from a sufficiently
| annotated table of floats.
|
| If I tell you that here's a column of people's weights and a
| column of their diets, no model can learn the causal connection
| between the two. If I tell you that here's a column of people's
| weights and here's a column of their diets _which were randomly
| assigned under intervention_ , then suddenly a model can do it.
|
| All causal inference with observational data requires assumptions
| about conditional independence structure. It's so crucial that we
| always explain it all in prose in any writeup of any given causal
| investigation. We put none of that in tables themselves, despite
| it being entirely crucial. If we started making postgres tables
| that stopped looking like "height: float, weight: float" and
| started looking like "height: float, weight: do(float)" (as in
| Pearl's do(...)) then we could start to automate causal inference
| much much more easily.
|
| Not to say the types would be nearly so simple. You'd need a full
| DAG for your database, and even then, it's not that simple: our
| AB testing platform (v3.1) intervened here according to this 1k
| line python script (git commit 191284794) that took in these
| columns and employed a model trained on this other entire table
| as of date X, before we migrated the db. Also this one column's
| meaning changed in november when we removed a button from the
| home page.
|
| But without some structured encoding of the structure that an
| analyst is going to need (structure they're absolutely going to
| be encoding in natural language in their writeup), we're trying
| to do it with one arm tied behind our backs.
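|
| To make that concrete, here's a minimal sketch (simulated data,
| made-up numbers and column names) of why the annotation matters:
| the same naive estimator is roughly unbiased on a do() column
| and badly biased on the observational one.
|
|     import numpy as np
|
|     # toy simulated data; all numbers are made up
|     rng = np.random.default_rng(0)
|     n = 100_000
|     # unobserved confounder: health-consciousness
|     hc = rng.binomial(1, 0.5, n)
|
|     # observational column: diet depends on the confounder
|     diet_obs = rng.binomial(1, 0.2 + 0.6 * hc)
|     # do(diet) column: assigned by coin flip, ignoring hc
|     diet_rct = rng.binomial(1, 0.5, n)
|
|     def weight(diet):
|         # true causal effect of diet on weight is -2 kg
|         return 80 - 2 * diet - 5 * hc + rng.normal(0, 1, n)
|
|     def naive(w, d):
|         # plain difference in means, no adjustment
|         return w[d == 1].mean() - w[d == 0].mean()
|
|     print(naive(weight(diet_obs), diet_obs))  # ~ -5, biased
|     print(naive(weight(diet_rct), diet_rct))  # ~ -2, causal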
| PeterisP wrote:
| The article makes a big unfounded assertion in one of the first
| few sentences "We learn them at a very early age, without being
| explicitly instructed by anyone and just by observing the world."
| - we do _not_ learn these inferences "just by observing the
| world", we learn these inferences by _acting_ on the world and
| observing the results, and arguably they are simply not learnable
| by pure observation alone. There's some theoretical grounding
| for that (IIRC Judea Pearl's book Causality might be a good
| source for that) and some experimental grounding for that (e.g.
| the two kitten experiment for which I can't quickly find the
| actual paper but it's described at https://io9.gizmodo.com/the-
| seriously-creepy-two-kitten-expe... ), also other fields like
| medicine have a good understanding of limitations of what
| knowledge can be gained from purely observational studies versus
| experimental interventions. So it seems a bit misleading to write
| a whole article about "why machine learning struggles with
| causality" without even addressing the key difference of learning
| from observations vs learning from interaction, which IMHO is a
| much more fundamental obstacle than everything the article
| mentions.
| Krabbyos wrote:
| citation for the two kitten experiment:
|
| Held, R., & Hein, A. (1963). Movement-produced stimulation in
| the development of visually guided behavior. Journal of
| Comparative and Physiological Psychology, 56(5), 872-876.
| https://doi.org/10.1037/h0040546
| skybrian wrote:
| Both the article and the paper mention interventions and refer
| to Pearl's work, so I think we can assume this is just a matter
| of poor phrasing and a child's observations of its own
| interventions aren't meant to be excluded.
|
| But for the purposes of machine learning, it seems like it
| should be possible to learn from observations of someone else's
| interventions?
| musingsole wrote:
| > we do not learn these inferences "just by observing the
| world"
|
| We most certainly can, just not as well or as strongly as when
| we're able to influence the system under observation. You're
| speaking way too strongly and simplifying a complex mechanism
| down past anyone's expertise.
| Jeff_Brown wrote:
| _After_ interacting with the world, we can learn from passive
| observation. No human has ever learned anything about
| causation without first spending a while experimenting in the
| world.
| dumbfoundded wrote:
| An interesting related idea is inverse reinforcement
| learning. We watch how other people interact with a system to
| estimate a reward function and then we later test it out
| ourselves. Either directly with an environment or inside our
| own mental model of the environment. Simply "observing the
| world" can give us data about how to learn in an environment
| we've never interacted with before, purely by watching others
| interact with it.
| still_grokking wrote:
| That's true. But the key seems to remain the "interaction
| with the world".
| dumbfoundded wrote:
| Maybe it's semantics but interacting with a simulation of
| the world is actually more important. In a pure sense,
| this doesn't require any actual real world interaction.
| This concept is usually referred to as offline and off-
| policy reinforcement learning.
|
| If you're saying that interaction in any sense is
| important, I'd very much agree that unsupervised learning
| and supervised learning aren't equipped to handle
| reinforcement learning problems. Correct framing of a
| problem is necessary to achieve a desired property like
| causality.
| jcims wrote:
| I think we have a hard time disconnecting our personal
| experience from the observations we make. When we look at a
| photograph of a person riding a bicycle down a path, even if
| we've never ridden a bicycle we've likely been outdoors,
| stood on a path, felt wind when we moved, etc. We may not be
| able to accurately simulate the experience in our mind but we
| can get close.
|
| On the other hand, the starting point for an ML system
| interpreting that same image is essentially a stream of
| scalar values that tend to demonstrate multiple layers of
| periodicity (3-4 byte intervals for RGBA and then another
| layer per line of rasterized image data and yet another per
| frame if it's a video).
|
| Here's a quick experiment. Let the video linked below play
| for a five count (sound is essential but a bit intense so
| maybe moderate the volume first) so you have some confidence
| it's not just me playing a rude trick, then close your eyes
| for a five count. There's going to be a major change in the
| sound when you get near 'five', now try to imagine how the
| scene changed before opening your eyes again:
|
| https://youtu.be/qnL40CbuodU?t=25
|
| I think without any reference from an embodied perspective,
| we're asking ML systems to understand the sounds (which are
| also streams of scalar values that demonstrate periodicity)
| the same way we interpret the representation visually.
|
| (Also if you enjoyed the example above check out these two
| channels, some of them are mindblowing)
|
| https://www.youtube.com/user/jerobeamfenderson1
|
| https://www.youtube.com/c/ChrisAllenMusic
|
| And a fun video explaining it all -
| https://www.youtube.com/watch?v=4gibcRfp4zA
| [deleted]
| fractionalhare wrote:
| I don't think they're speaking too strongly. I think a lot of
| the time when we correctly infer causality without
| empirically interacting with the system, it's because we have
| built up significant categorical experience about more atomic
| systems we were able to interact with.
|
| In my view, a lot of things that are noninteractively
| inferred are compositions of more fundamental things that
| required empirical experience. When you've had the causality
| of gravity thoroughly beaten into you at a young age, a lot
| of other things seem intuitive that would otherwise
| completely fall outside a framework for being unempirically
| learned.
|
| Do you have a specific counterexample of causality you can
| infer without interaction or empirical experience of
| something related?
|
| Caveats: I'm not a neurologist or psychologist, so this is
| mostly philosophical speculation on my part.
| visarga wrote:
| We think we're so smart because we causally understand the
| world, but it took us a very long time to collectively
| discover these principles. A human alone would not be so
| smart.
| still_grokking wrote:
| I think this confuses (fact) knowledge and the ability to
| recognize causal relations.
|
| A human can never be smarter than that individual human
| on their own: our brains are not connected and we can't
| share capacity with others.
|
| So, discovering causality is always an individual
| experience. And that happens likely by "playing" with
| "the world".
|
| I think it's noticeable that smarter animals are more
| playful. Which is also a hint that points to the
| fundamental importance of interaction with the world as a
| prerequisite for "smartness". Additionally the
| capabilities of the "sensors" and "actors" that make
| interaction with the world possible in the first place
| seem to be crucial to develop "smart behavior".
|
| The part about the "sensors" seems quite obvious. I think
| one can gain a better general understanding of some thing
| if one can "experience" it in more than one "dimension".
|
| And the "actors" allow one to perform "experiments" with
| the things around one, and find out this way how that
| thing "works" or is supposed to be "used".
|
| That's actually the behavior that can be observed in
| children of all kinds of "smarter" species. So it seems
| to be at least linked somehow to "smartness".
| musingsole wrote:
| Empirical evidence is a special case of observation. If you
| observe the whole universe in its entirety, you could
| separate moments that effectively followed whatever
| conditions you might set in a lab. You can't act on the
| system, but you can forever tune your models to match the
| infinite observations you could record. At that point the
| separation between observation derived knowledge versus
| experiential knowledge is meaningless (it's just hard to
| imagine a universal model manifesting without having used
| experiential knowledge along the way).
|
| Whether you swing the bat or just watch it hit the ball
| into the sky, you have the prerequisites needed to reason
| about the interaction.
|
| A more entertaining question is how a system comes to
| _believe_ causality (i.e. comes to believe that things _can
| and must_ have causes)
| still_grokking wrote:
| > (it's just hard to imagine a universal model
| manifesting without having used experiential knowledge
| along the way)
|
| That's actually the point!
| shkkmo wrote:
| > We most certainly can
|
| That is a very, very strong statement that requires some
| proof to go with that level of certainty.
|
| While we can learn about the ball's change of movement just
| fine without swinging the bat, we can only do that because we
| are generalizing from a large body of knowledge that we
| developed by experimenting on the world using our body.
|
| I am not aware of a single piece of evidence that an agent
| can use purely observational learning to ever acquire the
| causal knowledge of the real world to a sufficient level to
| make those sorts of inferences with any sort of reasonable
| accuracy.
| musingsole wrote:
| I'd link studies about children aping things they have only
| seen through a window, but I suspect you'll quickly argue
| out of the bounds of those studies and so I won't.
| galaxyLogic wrote:
| Right. "World" includes us. When we perform actions we
| observe ourselves doing them and then we observe the
| consequences.
|
| That is the only way we can learn anything, by observing. And
| what else is there to observe than the "world"? We can
| observe our own thinking process but that is part of the
| world too, I would say. It is definitely not "out of this
| world" :-)
| visarga wrote:
| yes, we learn causal reasoning by observing consequences,
| we can't learn by simply watching without acting
| [deleted]
| jfengel wrote:
| I suspect that we have some of it built in, even before we
| observe or act. Something in us starts us on the road of the
| hypothesis that time is a thing we can reason about.
|
| The illusion of causality is incredibly strong -- so much so
| that it's really hard to get people to give it up, even when
| faced with paradoxes (First Mover, quantum mechanics, multiple
| causation, etc.)
|
| We don't come into the world with a fully formed sense of
| causality, and it appears over the course of months. But at
| least some of it may be wired into the hardware, like the
| language instinct. It's just that the wiring isn't done the day
| we leave the womb, and is influenced by what comes after.
| t0mas88 wrote:
| Indeed, but it comes really early in development. A 2-month-old
| baby understands that if you smile at other people they
| will interact with you. And they'll try to use this to get
| more interaction / playing (in a very simple way), which also
| makes sense evolutionarily because more attention from adults
| means a higher chance of survival.
| JulianMorrison wrote:
| TBH the idea that humans learn _anything_ on a blank slate pure
| learning architecture is messed up. Our brains are literally
| evolved to interpret causality into the world. It isn't
| randomly just "learned" any more than walking is.
| andyxor wrote:
| exactly, besides ignoring the innate structures, heuristics
| and biases hardcoded via evolution, the whole notion of
| "learning" became highly intertwined with reinforcement kind
| of learning, i.e trial & error, stimulus and response
| behaviorist terms popularized by Pavlov and Skinner a century
| ago, which is just one type in a large repertoire of
| adaptation mechanisms.
|
| Memory in these models is used as an afterthought, or some side
| utility for complex iterative routines based on the calculus of
| function optimization. In living organisms, by contrast, memory
| and its "hardcoded" shortcuts allow cutting through the search
| space quickly, as with a large database index.
|
| Speaking in database terms we have something like
| "materialized views" on acquired and genetically inherited
| knowledge, built from compressed and hierarchically organized
| sensory data and prior related actions and associations,
| including causal links. Causality is just a way to associate
| items in the memory graph.
|
| Error correction doesn't play as much of a role in storing and
| retrieving information and pattern recognition as current
| machine learning models may lead you to believe.
|
| Instead, something akin to self-organized clustering is going
| on, with new info embedded in the existing "concept" graph
| via associations and generalizations, through simple LINK and
| JOIN mechanisms on a massive scale.[1] The formation of this
| graph in long term memory is tightly coupled with sleep
| cycles and memory consolidation, while short term memory
| serves as a kind of cache.
|
| Knowledge is organized hierarchically starting from principal
| components [2] of sensory data from e.g. visual receptive
| fields, and increasing in level of abstraction via
| "chunking", connecting objects A and B to form a new object C
| via JOIN mechanism, or associating objects A and B via LINK
| mechanism. Both LINK and JOIN outputs are "persisted" to
| memory via Hebbian plasticity.
|
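| To make the LINK / JOIN idea concrete, a toy sketch (loosely
| inspired by Valiant's neuroidal model; names and structure are
| illustrative only, not the actual model):
|
|     import networkx as nx
|
|     # toy concept graph; purely illustrative
|     g = nx.Graph()
|     g.add_nodes_from(["bat", "ball"])
|
|     def link(graph, a, b):
|         # LINK: associate two existing items
|         graph.add_edge(a, b)
|
|     def join(graph, a, b):
|         # JOIN: form a new chunk C from A and B
|         c = f"({a}+{b})"
|         graph.add_edge(a, c)
|         graph.add_edge(b, c)
|         return c
|
|     swing = join(g, "bat", "ball")     # new composite concept
|     link(g, swing, "ball flies away")  # associate with outcome
|     print(list(g.edges))
|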
| All knowledge, including causal links, is expressed via this
| simple mechanism. Generating a prediction given a new sensory
| signal is just LINKing the signal with an existing cluster by
| similarity.
|
| Navigation in this abstract space is facilitated via a
| coordinate system similar or perhaps identical to the one
| hippocampal place & grid cells provide for spatial navigation.
| Similarity between objects is determined as similarity
| between their "embeddings" in this abstract concept space.
|
| It's possible that innate structures are genetically pre-
| wired in this graph which represent high level "schemas",
| such as innate language grammar which distinguishes e.g. verb
| from noun, visual object grammar which distinguishes "up"
| from "down", etc. It is also possible these are embodied,
| i.e. connected to some representation of motor and sensory
| embeddings. And serve to bootstrap the graph structure for
| subsequent knowledge acquisition. I.e. no blank slate.
|
| The information is passed, stored and retrieved via several
| (analogue) means, both in point-to-point and broadcast
| communication, with electromagnetic oscillations playing a
| primary role in synchronization within neural assemblies,
| facilitating e.g. speech segmentation (or boundary detection
| in general) and coupling an input signal's "embedding" to
| existing knowledge embeddings in short-term memory; while
| neural plasticity/LTP/STDP serve as storage mechanisms at the
| single-neuron level.
|
| [1] See Leslie Valiant "neuroidal" model and his book
| https://www.amazon.com/Circuits-Mind-Leslie-G-
| Valiant/dp/019...
|
| [2] See Oja Rule
| http://www.scholarpedia.org/article/Oja_learning_rule
|
| and Olshausen & Field classic work on sparse coding
| http://www.scholarpedia.org/article/Sparse_coding
| danans wrote:
| > we learn these inferences by acting on the world and
| observing the results, and arguably they are simply not
| learnable by pure observation alone.
|
| Tangentially, wouldn't sampling based ML methods like particle
| filters / Kalman filters or other randomized state space
| exploration algorithms be analogous to the person learning by
| acting on the world? In this case, the "action" would be
| bouncing the radar off the object being tracked.
|
| Of course these models are far more limited than a child in the
| way they can act on the world, and in the number of aspects of
| reality they can model.
|
| And furthermore, they have no concept of causality, and
| represent only the current state of knowledge they are
| modeling.
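|
| For a concrete picture, a minimal 1-D Kalman filter sketch
| (made-up noise parameters), where each radar return plays the
| "acting on the world" role the analogy suggests:
|
|     import numpy as np
|
|     # toy 1-D tracking example; parameters are made up
|     rng = np.random.default_rng(2)
|     q, r = 0.01, 1.0         # process / measurement noise
|     x_true, v = 0.0, 1.0     # object moves at constant speed
|     x_est, p_est = 0.0, 1.0  # position estimate and variance
|
|     for _ in range(20):
|         x_true += v
|         z = x_true + rng.normal(0, r ** 0.5)  # radar return
|         # predict: roll the model forward, grow uncertainty
|         x_pred, p_pred = x_est + v, p_est + q
|         # update: blend prediction and measurement
|         k = p_pred / (p_pred + r)
|         x_est = x_pred + k * (z - x_pred)
|         p_est = (1 - k) * p_pred
|
|     print(x_true, x_est, p_est)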
| mcguire wrote:
| " _...we do not learn these inferences "just by observing the
| world", we learn these inferences by acting on the world and
| observing the results..._"
|
| The article: " _"Machine learning often disregards information
| that animals use heavily: interventions in the world, domain
| shifts, temporal structure -- by and large, we consider these
| factors a nuisance and try to engineer them away," write the
| authors of the causal representation learning paper. "In
| accordance with this, the majority of current successes of
| machine learning boil down to large scale pattern recognition
| on suitably collected independent and identically distributed
| (i.i.d.) data."_ "
|
| The key words are "interventions in the world." The article
| goes on to say, " _"Generalizing well outside the i.i.d.
| setting requires learning not mere statistical associations
| between variables, but an underlying causal model," the AI
| researchers write._ " The point being that, whether or not
| acting in the world is an essential condition for learning
| causality, current machine learning approaches are not even
| trying for causality.
| darksaints wrote:
| This is something I discussed with a friend recently. I talked
| about how the key to unlocking more potential for AI was to stop
| sandboxing it away from the world and let the AI start
| interacting with it. And the immediate pushback to that is that
| it would cause chaos. Can you imagine AI-driven
| cars that out of nowhere decide to brake-check just to see what
| happens?
|
| Kids are often an unpleasant annoyance in restaurants, and many
| people that don't like that annoyance try to convince
| restaurants and lawmakers to ban them from restaurants. The
| problem with those ideas is that by banning kids from
| restaurants, you are just going to create annoying adults in
| restaurants over time. Kids are annoying in restaurants, but
| they are also learning how to interact with the world. If you
| don't find a way to let them explore boundaries, they never
| learn, and they'll become obnoxious restaurant patrons even as
| fully grown adults.
|
| Which kind of goes back to ethical AI. You can't unleash
| unbounded AI on the world, or else you'll cause chaos. And you
| can't sandbox AI, or it will never truly learn. What are you
| supposed to do then? I don't know, but the answer isn't firing
| the ethical AI department because you don't want them
| criticizing your ad empire ;)
| Radim wrote:
| Makes you wonder how life operated near the beginning, before
| "how to behave sustainably" evolved. Before individual death,
| before sex, before speciation, before children, maybe even
| before genes.
|
| The world must have seen some wild, explosive action in its
| day.
|
| Immortal organisms still exist (two-headed planaria!), but on
| the whole the ecosystem seems pretty well calibrated by now.
| bserge wrote:
| Death upon death upon death, it seems.
| klyrs wrote:
| Only, AI doesn't even have object permanence, something that
| babies pick up in a few months, years before they can make
| causal inferences longer than a few seconds. We don't let
| babies drive cars, we give them toys that they can't hurt
| themselves with. We only allow that when they're smart enough
| to learn from verbal/written instruction, have the fine motor
| skills to operate a vehicle, and the situational awareness to
| operate a car safely. People making self driving cars are
| basically putting infants in the drivers seat.
| bserge wrote:
| There are a lot of simulators out there. It would not be out
| of the realm of possibility to set up a learning AI to play them
| over and over again and record the results.
|
| I think besides the sheer amount of work put in to program
| something like that, the main limitation would be processing
| power, that sort of thing would take an immense amount of it.
|
| Now that I think about it, isn't this exactly what Tesla and
| other self-driving companies are trying to do?
|
| So, it's not like it's sandboxed, it's just very hard to make
| this "kid" play with things.
| cpleppert wrote:
| Except AI isn't being 'sandboxed away' in any discernible
| way. A child being an unpleasant annoyance isn't comparable
| to an AI that can't drive, in either risks of failure or
| capability because the child is far more capable of
| moderating his behavior to conform to social norms than an
| AI.
| beaconstudios wrote:
| The broader theoretical basis behind this idea is called
| embodied cognition.
| 6gvONxR4sf7o wrote:
| > we learn these inferences by acting on the world and
| observing the results, and arguably they are simply not
| learnable by pure observation alone.
|
| Also crucially, we learn these inferences by acting on the
| world and knowing something about why we acted. "I was playing
| with it" is a conditional independence statement that we use
| all the time while learning how things work, we just usually
| don't use the mathy language to describe it. We're running
| randomized controlled trials constantly, but implicitly.
|
| Coincidentally, it's a common anecdote that people who seem to
| learn things quickly and deeply have this curiosity and will
| play with a thing/twiddle the knobs while they're learning how
| it works. When you're playing a video game and someone says
| "hang on, let me figure out the controls for a sec" they're
| changing the conditional independence structure of their
| observations and running an RCT.
| [deleted]
| _0ffh wrote:
| Another thing which is often forgotten: we have evolutionarily
| adapted biases built into our learning machinery. That helps us
| to learn some things that tended to be essential for survival
| (much) more quickly, but it can also hinder us in learning some
| other things that we are not adapted for.
| jonplackett wrote:
| I saw a study a while back that kids only a few months old will
| preferentially go to a person who helped their mum open a jar
| rather than one who refused.
|
| There's a hell of a lot of observation happening in kids' minds
| _very_ early on.
|
| No doubt it's easier once you can pick up a baseball bat
| yourself, but I have no doubt a young kid would understand
| basic objects connected together without ever having used them.
| MeteorMarc wrote:
| Of course it makes a difference whether you observe someone
| acting or you are the actor yourself, but why could you not
| learn from someone else acting and observing the results?
| karpierz wrote:
| You can think of "acting on the world" as resolving questions
| about the world, like "what happens when I do X"?
|
| You could observe the results of others acting, but it means
| that the questions you're getting answers to are outside of
| your control. So if you need to know the answer to a
| particular question, you either need to test it yourself, or
| hope that whoever you're watching will test it for you.
| t0mas88 wrote:
| It's possible to learn from someone else acting and observing
| the results only if you understand what the observed person
| is doing and could map it to yourself. Which babies for
| example cannot do. They like you making crazy faces but when
| very young they don't learn how to do the same thing from
| that. I think for two reasons, first they don't know what you
| did to make that face (which in the "experimenting yourself"
| case they would have known even if it was a random action)
| and second they can't map what they see to actions with their
| own body. Both need to be learned first before you can learn
| from observation. While learning from your own (random)
| interactions with the world is possible from day 1.
| jasonwatkinspdx wrote:
| Yeah, Judea Pearl's book is the definitive work. His formalism
| also has an aspect that you could say is similar to imagining
| an intervention in a counterfactual way, and computing its
| likelihood given the data. His paper "Why I'm only half
| Bayesian" lays out how he sees the epistemology of this.
| mlthoughts2018 wrote:
| This has been discussed widely in computer vision for many
| decades
|
| - https://www.routledge.com/The-Ecological-Approach-to-
| Visual-...
|
| - http://fmdb.cs.ucla.edu/Treports/soatto_extended_v18.pdf
| Der_Einzige wrote:
| I disagree here. We DO "learn them at a very early age, without
| being explicitly instructed by anyone and just by observing the
| world" the majority of information that a human ingests doesn't
| include direct actions upon the world (in any more than a
| philosophical sense anyway). We may feel as though actions are
| more memorable, but I claim that we observe, then take actions,
| not the other way around. We learn things all the time that are
| not "consciously learned". Addition (e.g. 2 apples are more
| than one apple) is learned by most humans in an unsupervised
| manner far before labels are added later. I claim that early
| childhood development and thus much of our "foundation" is
| primarily rooted in having no explicit information AND not
| enough power to directly label data ourselves (through
| experiments, play, etc)
|
| This is important because "learning without explicit
| instructions" in ML speak is _Unsupervised learning_
| (clustering and dimensionality reduction). There are no labels
| except the ones that you decide upon yourself (cluster
| membership). Unsupervised Learning is still in its infancy,
| far behind supervised systems in effectiveness, and it's no
| surprise that its algorithms are generally extremely easy to
| implement from scratch (e.g. K-means or DBSCAN) compared to
| relatively difficult work like automatic-differentiation in
| neural networks.
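|
| (As a quick illustration of how simple: a from-scratch K-means
| sketch, with made-up toy data, fits in a dozen lines.)
|
|     import numpy as np
|
|     def kmeans(x, k, iters=50, seed=0):
|         # minimal K-means: assign to nearest centroid, recompute
|         rng = np.random.default_rng(seed)
|         c = x[rng.choice(len(x), k, replace=False)]
|         for _ in range(iters):
|             d = np.linalg.norm(x[:, None] - c[None], axis=-1)
|             labels = d.argmin(axis=1)
|             c = np.array([x[labels == j].mean(axis=0)
|                           for j in range(k)])
|         return labels, c
|
|     # toy 2-D data: two loose blobs
|     pts = np.random.default_rng(1).normal(size=(200, 2))
|     pts[100:] += 5
|     labels, centers = kmeans(pts, 2)
|     print(centers)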
|
| Learning by reading information in a book or by direct didactic
| teaching would be supervised learning. Learning through a
| dialectical format would be reinforcement learning. Self-
| supervised learning would be equivalent to autodidactic
| learning and the creative act upon the world. (maybe the
| distinction between self-supervised and reinforcement is
| arbitrary)
|
| The point is that we want to learn as much as we can given the
| information available to us. We should not rule out the role
| that the biological analogue to unsupervised learning plays in
| human development.
|
| It is for all of these reasons that I become far more excited
| when a new clustering or dimensionality reduction algorithm
| comes out than I am when a new neural network architecture
| becomes state of the art.
| shkkmo wrote:
| Children do not learn completely "unsupervised"; they receive
| frequent feedback from the agents around them. I would argue
| that a significant amount of childhood development (especially
| around "labels") is due to our hardwired attachment to human
| faces.
|
| I have always felt that the significance of the "social
| software of human culture" in our general intelligence and
| learning capacity was underestimated by the AGI community.
|
| So personally, I see more potential in communities of
| learning agents than any developments in the underpinnings.
| meroes wrote:
| Not saying you are wrong but I learned to ride a bike by
| watching my classmates be taught in preschool.
|
| Maybe direct acting is a quicker learning method. Or maybe
| seeing others' learning processes and instructions allows one
| to leapfrog ahead by not duplicating mistakes.
|
| Maybe always learn off a good dataset first, if one exists?
| didibus wrote:
| > I learned to ride a bike by watching my classmates be
| taught in preschool
|
| I'm confused, you're saying you watched others learn and then
| managed to get on a bike for the first time and properly ride
| it like an experienced rider with no practice whatsoever?
| beaconstudios wrote:
| First, you make a mental model. Then, you test it.
|
| If you don't test it then you don't know if it works, and you
| can't improve on it.
|
| Knowledge is derived from experience.
| musingsole wrote:
| Scoring a model's predictions is a means of testing the
| model without having it coupled to the system under
| observation.
|
| Observation is a type of experience.
| Jtsummers wrote:
| And you can gain experience/understanding without needing
| to either observe or directly experience something,
| merely by thinking about it or being told about it. Not
| every kid has to be burned, or see a burn caused by touching
| a hot stove, to learn it's a bad idea to try it.
|
| If you require actual experience or direct observation to
| learn, then you're not using your brain to its full
| potential.
| shkkmo wrote:
| You can certainly generalize previously gained knowledge
| without direct experience in that particular instance.
|
| Would a child who has never experienced the human body's
| pain response be able to infer the causal connection
| between the heat of a stove and the response after
| touching it?
|
| Arguably, language is a tool that allows us to generalize
| the direct experience of other agents. It is unclear if
| it is possible to remove direct interaction from a
| learning system and still reach the same level of
| understanding.
| beaconstudios wrote:
| I'm not sure what you mean by "without having it coupled
| to the system under observation" - could you clarify?
|
| I do agree that observation is a type of experience, but
| a model that is meant to guide action (basically any
| useful model) needs to be tested in action. I can't learn
| to juggle only by watching other people juggle. I can
| only develop a hypothesis about how one juggles, but to
| test (and refine) it is to try the hypothesis out.
| musingsole wrote:
| A model being coupled to a system == a model that can
| influence the system's state through _a means_
|
| > a model that is meant to guide action (basically any
| useful model) needs to be tested in action
|
| No, it doesn't. For example, the vast majority of work on
| modeling the stock market is done on machines completely
| sandboxed from any ability to make trades, and owned
| by companies who will never make a trade themselves but
| instead return an API response with a yes/no. Whether
| that is fed directly into some sort of automated action
| is largely irrelevant as the ability for an individual
| trade to cause a measurable impact on the market is
| negligible until it isn't. So, these systems are built
| separate from the system they model and learn entirely
| through observation.
|
| tl;dr: weather forecasting models don't have an action to
| take and also can't influence their system. And yet they
| learn and grow more accurate.
| beaconstudios wrote:
| OK that's a fair criticism. Then perhaps we can divide
| models into those that influence the system they observe
| (regulatory systems) versus those that only measure, or
| whose influence is negligible. Models that aim to
| influence a system do indeed need to be used to test
| their efficacy.
| shkkmo wrote:
| It's not just about testing their efficacy; it's about the
| theoretical limits of pure observation when doing causal
| reasoning. We know that we are better served by avoiding
| causal certainty when using purely observational studies.
| It seems like the base assumption should be that similar
| epistemic constraints apply to machine learning.
| beaconstudios wrote:
| Yes the best way to understand a system is to interact
| with it. But there are scenarios where that simply isn't
| possible and yet we can still model causality, like the
| weather example musingsole gave.
| cpleppert wrote:
| > First, you make a mental model. Then, you test it.
|
| Except that isn't how that really works right? A mental
| model explains part of the reasoning that led to an outcome
| but never all. After all, the map is not the territory. A
| mental model is grounded in trivial assumptions.
| Ultimately, your brain produces inferences in a way that is
| incredibly hard to couple to specific logical processes.
| There has been a lot of research on how experts think and
| reason and none of it is compatible with having a mental
| model whatsoever.
| beaconstudios wrote:
| A mental model is fundamentally a predictive tool. It
| doesn't describe reality, it just allows us to make
| educated guesses about what will happen.
|
| I'm a constructivist so I'm under no assumption that we
| have access to the territory at all except modulated by
| our subjective perception.
| BenoitEssiambre wrote:
| I've read Pearl's book too and it wasn't clear if it was
| possible to systematically "observe interventions" instead of
| doing them yourself and what the rules would be for that.
|
| I mean there was some stuff about instrumental variables but it
| seems the theory is a bit incomplete in that area.
|
| What distinguishes an intervention vs just normal observable
| randomness? Does it have to do with the complexity of the
| entity performing the "intervention", with the fact that this
| entity can observe and act on knowledge? I guess it's kind of
| the debate about where determinism ends and free will begins.
| Are there mathematical bounds to help us sort it out though?
| Maybe there is something information theoretic? It's very
| unclear in my mind.
|
| This is all about fitting generative models that are robust to
| counterfactual changes, that remain predictive even if you run
| your models with data you've never observed, beyond simple
| interpolation/extrapolation. Are there priors in model
| structure that tend to naturally make models much more robust
| to counterfactual changes and that make models work well beyond
| the data? Do these priors get more effective when they include
| some latent variables that distinguish between observations and
| interventions? How do you train these latent variables?
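|
| On the instrumental-variables point, a toy sketch (simulated
| data, assumed exclusion restriction) of how someone else's
| randomness can stand in for your own intervention:
|
|     import numpy as np
|
|     # simulated toy data; coefficients are made up
|     rng = np.random.default_rng(4)
|     n = 200_000
|     u = rng.normal(size=n)    # unobserved confounder
|     z = rng.normal(size=n)    # instrument
|     x = 0.8 * z + u + rng.normal(size=n)
|     y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true effect: 2
|
|     ols = np.cov(x, y)[0, 1] / np.cov(x, y)[0, 0]  # ~3.1, biased
|     iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # ~2.0
|     print(ols, iv)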
| GistNoesis wrote:
| Formally known as "do-calculus", which helps solve the
| "correlation is not causation" issue that usually affects
| statistical methods (like ML).
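|
| The workhorse special case is the backdoor adjustment,
| P(y|do(x)) = sum_z P(y|x,z) P(z). A minimal sketch on simulated
| data (variable names and numbers are made up):
|
|     import numpy as np
|
|     # simulated toy data; numbers made up
|     rng = np.random.default_rng(1)
|     n = 200_000
|     z = rng.binomial(1, 0.5, n)            # confounder
|     x = rng.binomial(1, 0.2 + 0.6 * z)     # treatment
|     y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)
|
|     def p_y_do_x(xv):
|         # adjust for z instead of conditioning on x alone
|         return sum(y[(x == xv) & (z == zv)].mean()
|                    * (z == zv).mean() for zv in (0, 1))
|
|     print(y[x == 1].mean() - y[x == 0].mean())  # ~0.54, naive
|     print(p_y_do_x(1) - p_y_do_x(0))            # ~0.30, causal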
| galaxyLogic wrote:
| Isn't it simple? There is no single root cause, ever. What causes
| something is elementary particles moving in certain ways together
| always affecting each other. There is never a single "root
| cause".
|
| A separate question is "Who is guilty?" or "Who deserves credit?"
| pas wrote:
| Highly recommended related essay/paper:
| https://www.gwern.net/Everything (Very terse and information
| dense though.)
| nafizh wrote:
| Arguably one reason causality research in the machine learning
| community hasn't boomed is that there is no framework/ease of
| access to quickly code up the current heuristics/patterns/graph
| models like you can with a deep learning idea using pytorch. Deep
| learning has reached today's stage because of early frameworks
| like Theano and Caffe. Accessibility for beginners in a field is
| crucial for SOTA development, which, although it feels a little
| counter-intuitive, is nevertheless true. If you search for
| causality you get a bunch of books and papers from Pearl and
| Scholkopf which are fun to read, but how do I do something
| actionable with them quickly?
| Der_Einzige wrote:
| This is the correct answer. The masses will not tolerate buggy
| research code.
| joe_the_user wrote:
| _Arguably one reason causality research in the machine learning
| community hasn't boomed is there is no framework/ease of
| access to quickly code up the current heuristics/patterns/graph
| models like you can do with a deep learning idea using
| pytorch._
|
| Ironically enough, it seems like you're confusing cause and
| effect here. The reason that there's little causation based
| reasoning isn't because there's no automation for it. Rather,
| the reason there's no automation for it is because it hasn't
| boomed.
|
| The reason deep learning based ML for image recognition boomed
| is that you could take a fairly large database of images and
| categorizations and produce an impressive and testable system
| using straightforward if challenging optimization procedures.
| Because this approach has boomed, huge amounts of money have
| flowed to it, lots of people have been hired, and it's been semi-
| automated, so you have a combination of data and frameworks
| that let you do things quickly. Some high percentage of all the
| achievement of current ML is leveraging the original static
| ability to sort images (or sort buckets of bits) into different
| areas (AlphaGo sorts moves into "good" and "bad" and adds
| tree pruning, etc). Which isn't to discount it; it's the first
| sort of system that can seem "as good as human" in certain areas.
|
| But when there's no similarly easy and impressive procedure for
| taking, say, a time series, and getting the next result better
| than human or traditional statistics can predict, there's no
| boom, no gathering of public data sets, no easy automation of
| the standard procedures and so-forth.
| nafizh wrote:
| Theano started around 2007, long before DL got popular or the
| ImageNet competition where DL outperformed traditional
| methods by a wide margin.
| p1esk wrote:
| The 2012 Imagenet results which jumpstarted DL did not
| benefit from Theano or Torch frameworks. Alex Krizhevsky
| had developed his own GPU accelerated framework (cuda-
| convnet), and it remained quite popular for a couple of
| years after the competition, until Theano and Torch caught
| up with it.
| ZephyrBlu wrote:
| Funnily enough I tried to look into Causal Analysis because I
| thought it might be applicable to something I'm working on.
|
| What I found was exactly like you said, a bunch of theory which
| was kind of interesting but didn't seem very practical at all.
|
| Lots of DAG manipulation without actually explaining how to
| gather data and model a DAG yourself.
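|
| FWIW, "modeling a DAG yourself" can be as mundane as writing the
| edges down by hand (they are domain assumptions, not something
| mined from the data; the names below are made up) and reading an
| adjustment set off the graph:
|
|     import networkx as nx
|
|     # hypothetical domain assumptions encoded as edges
|     g = nx.DiGraph([
|         ("age", "diet"), ("age", "exercise"),
|         ("age", "weight"), ("diet", "weight"),
|         ("exercise", "weight"),
|     ])
|     assert nx.is_directed_acyclic_graph(g)
|
|     # the observed parents of the treatment always block
|     # every backdoor path into it
|     treatment, outcome = "diet", "weight"
|     print(set(g.predecessors(treatment)))  # {'age'}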
| ironmantissa wrote:
| Judea Pearl also wrote a great layman's book called "The Book of
| Why" that I highly recommend.
| zipotm wrote:
| Because programmers or project engineers are idiots. That's it.
| darksaints wrote:
| Humans struggle with causality. Even those whose professions are
| dedicated to understanding causality struggle with it. That's the
| reason we have heuristics like the "5 whys" that only
| occasionally work. Consider the following philosophical problem:
|
| An inattentive headphone-laden jaywalker wearing black crossed a
| road at night and was killed by a drunk driver speeding in a
| large SUV. What was the root cause?
|
| The amusing thing about this question is that if you actually
| have an answer, it reveals more about your biases than it does
| about the situation and potential solution(s). Various people
| will chime in about the latest thing that annoys them...whether
| it is inattentive pedestrians, jaywalkers, pedestrians wearing
| black at night, drunk drivers, speeders, or people driving cars
| that are too big and dangerous. But all of them are wrong because
| there is no discernible root cause.
|
| The problem is that we don't have multiple realities in which we
| can control each factor involved. And even if we did, it is
| entirely possible that if you can isolate and control each factor
| individually, that the accident still wouldn't have happened.
| Sometimes it takes a confluence of factors for an event to
| actually occur. And people's obsession with finding a single
| cause for complex phenomena hinders their ability to actually
| find fixable solutions.
| User23 wrote:
| > The amusing thing about this question is that if you actually
| have an answer, it reveals more about your biases than it does
| about the situation and potential solution(s). Various people
| will chime in about the latest thing that annoys them...whether
| it is inattentive pedestrians, jaywalkers, pedestrians wearing
| black at night, drunk drivers, speeders, or people driving cars
| that are too big and dangerous. But all of them are wrong
| because there is no discernible root cause.
|
| Causality is usually complex and often complicated. Let's
| consider another case:
|
| An attentive person without any sensory impairment crossed at a
| crosswalk in broad daylight and was killed by a sober bicyclist
| going ten miles an hour[1]. What was the root cause?
|
| All of the same objections apply. If you want an ultimate root
| cause you need to turn to theology, in which case as a
| Christian I can say that all death is ultimately caused by sin.
| While I personally find that to be a philosophically sound
| position, it isn't especially useful for answering the
| pragmatic question of how to reduce preventable evils like
| (some?) pedestrian deaths. My personal bias in this case is to
| take a pragmatic approach. That means identifying factors that
| can be changed with a high rate of compliance at a cost that is
| less than that of the evil being prevented.
|
| Mature safety systems take this into account. As an example,
| take basic firearms safety[2]: 1) Treat all guns as if they are
| loaded. 2) Never point a gun at anything you don't want to
| destroy. 3) Keep your finger off the trigger until ready to
| shoot. 4) Be sure of your target and what's behind it. In order
| to negligently discharge a firearm and cause harm, all four of
| these rules must be disregarded. In this case, even though any
| number of events from mechanical failure to muscle spasm could
| have caused the discharge, if someone causes unintentional harm
| I can say the local root cause was failing to observe the
| safety rules and hold said person morally responsible[3]. Many
| other examples of safety systems should come to mind, including
| ones relevant to the pedestrian safety scenarios.
|
| [1] One way that this could plausibly happen is that the
| pedestrian is knocked over and hits his head. Falling and
| hitting one's head is a surprisingly common way to die.
|
| [2] There are variants, but they all aim to achieve safety in
| much the same way.
|
| [3] This holds regardless of one's position on civilian
| ownership of firearms, because police and military personnel
| are also human and need to follow a safety system.
| viklove wrote:
| > If you want an ultimate root cause you need to turn to
| theology
|
| No, you do not "need" to turn to theology. There are plenty
| of explanations that do not involve invoking myths.
|
| > all death is ultimately caused by sin
|
| So the pedestrian was secretly a pedophile? That's a pretty
| strange explanation...
|
| I'd say in this case the blame likely rests on city planners,
| for creating situations in which travelers can collide.
| beaconstudios wrote:
| The parent is clearly invoking the "uncaused cause" model
| of God. Physicalism doesn't have an originating cause
| except maybe the big bang depending on what you think could
| have happened before it.
|
| So your options in terms of the originating cause are
| either to turn to religion for a philosophical answer, or
| to say "I don't know and we may never know".
| scrollbar wrote:
| We can disagree and still be nice to each other.
| NovemberWhiskey wrote:
| The problem isn't the lack of multiple realities for
| counterfactual testing - it's that the idea that there's a
| basic root cause for any particular outcome is ill-founded.
|
| The basic idea of 'look beyond immediate causes' is reasonable,
| but the cult of the root cause analysis is a bit out of
| control.
| jerf wrote:
| Perhaps you'd be happier with an idea like "when looking at a
| bad outcome, invest some effort looking at the proximal
| causes of the bad outcome to see if there's a place you can
| invest less net effort to fix the problem and for greater
| gain."
|
| Obviously all root cause analysis terminates at "Because the
| Big Bang and subsequent quantum fluctuations had this result"
| or something similarly utterly unactionable, but if you use
| the metric above, it is a common observation that such
| analysis can reveal higher bang-for-the-buck engineering
| outcomes than simply fixing the immediately obvious, and that
| there are some typical patterns that emerge, such as the root
| cause analysis eventually getting back into things that are
| infeasibly expensive or impossible to fix (e.g. "because
| human culture" will show up in a lot of them at some point
| but you aren't going to fix that just because two holes were
| misaligned on the factory line), meaning such analysis also
| meaningfully terminates.
|
| I tend to operate on this myself because you don't actually
| get "a" root cause analysis. I can always find a tree of
| causes as I go back, not "a" series of causes. But it's a
| fairly frequent occurrence that if you look over such a tree,
| there's at least one node with a highly favorable
| cost/benefit tradeoff that you can find with surprisingly
| minimal effort.
| joe_the_user wrote:
| Causality overall is hard. Humans fail at dealing with
| causality perfectly fairly often. But humans tend to do far
| better than computers (partly failing rather than totally
| failing, etc).
|
| But your examples involve humans' linguistic expression of
| causality, which is an entirely different question.
|
| _The amusing thing about this question is that if you actually
| have an answer, it reveals more about your biases than it does
| about the situation and potential solution(s). Various people
| will chime in about the latest thing that annoys them...whether
| it is inattentive pedestrians, jaywalkers, pedestrians wearing
| black at night, drunk drivers, speeders, or people driving cars
| that are too big and dangerous._
|
| A lot of human language is "socially significant noise", which
| some might object usually isn't true. But it often serves an
| entirely different purpose than an accurate modeling of
| reality.
|
| People walk through complex society, putting their socks on
| before their shoes and otherwise doing the basic things but
| articulating positions that ... _other people_ would consider
| "nuts" but this situation has nothing to do with a failure of
| causal modeling, which happens on a different logical level
| entirely.
| jjtheblunt wrote:
| what an excellently worded observation; hadn't thought clearly
| of this before, though noticed the idea vaguely
| skybrian wrote:
| The article is about much simpler forms of causality though,
| like what happens when you hit a ball with a bat. Learning
| enough about causality to handle everyday physics would be an
| important advance.
| bserge wrote:
| Funnily enough, this is what our "collective consciousness" is
| useful for. Humans act a lot like an ant colony, except
| individuals have much more autonomy.
|
| So, you ask, say, 1000 people what the root cause is. The
| result will be decided either by majority or the most plausible
| argumentation. Say 700 people agree the jaywalker was at fault.
| That will become "reality" for the group, which will then
| likely spread throughout the hive.
|
| The main lesson will be "don't jaywalk at night", the secondary
| one likely "don't wear black clothing at night" and probably a
| third one "beware of drunk drivers".
|
| Sorry if that sounds weird, I'm also trying to wrap my head
| around what human intelligence is and how it can be applied to
| AI (Approximate Intelligence :)).
| ZephyrBlu wrote:
| Examples like this are how I realized that almost everything
| is subjective.
|
| Things we think are objective are usually subjective things
| that we decided to agree on.
| bserge wrote:
| Yeah, I'd say that's true of most things that are too
| complicated to be understood by a single individual.
|
| In the end, it's all about the collective. It needs a
| consensus to move forward, and that doesn't need to be
| perfect, just good enough.
|
| The whole human civilization works mostly on subjective
| conclusions, sometimes a minority brings up enough
| facts/proof to change the established consensus, but often
| we just pile layers upon layers on top of things that are
| not objectively true or not completely true. But they're
| good enough, and we're very adaptable.
| kempbellt wrote:
| Ironically, I would argue that just about everything that
| happens is an _objective_ event, but any and every
| interpretation /understanding of causality is itself,
| subjective.
|
| In the hypothetical scenario the objective reality is: A
| pedestrian and a car collided.
|
| Other factors come into play, which all have potential
| influence on the causality of this objective reality, and
| said factors are frequently _subjectively_ asserted as more
| relevant to causality than others.
|
| Strong assertions to causality may include: Driver was
| drunk, pedestrian was jaywalking, pedestrian was wearing
| dark clothing, driver was speeding, driver was texting,
| driver and pedestrian had a prior altercation, driver was
| tired, etc
|
| Weaker assertions to causality may include: It was a
| Tuesday, someone sneezed 3 miles away, a butterfly flapped
| its wings, pedestrian knocked over salt during dinner
| earlier, etc
|
| Causality is hard... We're getting better at it, but
| superstitions _are_ a thing that exist, even if they are a
| bit odd.
| tremon wrote:
| _What was the root cause?_
|
| That's easy: the root cause is the roadworks two streets over,
| which caused the driver to divert from his usual route.
| ineedasername wrote:
| This is also covered very nicely by Daniel Dennett in his
| robot/bomb paper. It's a great read (warning, pdf).
|
| https://folk.idi.ntnu.no/gamback/teaching/TDT4138/dennett84....
| ineedasername wrote:
| ^^ If you really want to understand the issue from both the
| historical AI research & philosophical perspective, this is the
| article to read, and has the benefit of being an entertaining
| read. Much of Dennett's work is similarly entertaining, and he's
| absolutely brilliant.
|
| (sorry for commenting on my own comment, I should have added
| this detail in the first but was too late to edit.)
| brindlejim wrote:
| You could argue that reinforcement learning policies are already
| causal models insofar as they relate state-action pairs to the
| rewards and penalties that they lead to. The trial and error that
| RL performs in a simulation is an exploration of counterfactuals
| to establish cause.
|
| But the policies lack introspection. One of the most powerful
| things we could do is somehow extract causal models from those
| policies, to see what they learned that led them to behave more
| intelligently. That would increase both our knowledge as well as
| our trust in applying RL.
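|
| For concreteness, a toy tabular Q-learning sketch (made-up
| two-state MDP) of the state-action-to-reward bookkeeping such a
| policy builds up through trial and error:
|
|     import numpy as np
|
|     rng = np.random.default_rng(3)
|     q = np.zeros((2, 2))             # 2 states x 2 actions
|     alpha, gamma, eps = 0.1, 0.9, 0.2
|
|     def step(s, a):
|         # made-up dynamics: only (s=0, a=1) pays off
|         return (1, 1.0) if (s == 0 and a == 1) else (0, 0.0)
|
|     s = 0
|     for _ in range(10_000):
|         greedy = int(q[s].argmax())
|         a = rng.integers(2) if rng.random() < eps else greedy
|         s2, r = step(s, a)
|         q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
|         s = s2
|
|     print(q)  # q[0, 1] ends up largest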
| PartiallyTyped wrote:
| There's a paper that used a random convolutional filter when
| training the agent, and they found that it manages to generalize
| very well; they evaluated the CNN and found that the model put
| emphasis on where the enemies were at each frame, which
| indicates some form of understanding.
|
| However, I don't think that there is any form of causal
| relationship to be extracted for model-free agents. I don't
| believe that what we are seeing is anything more than changing
| action likelihoods in some very high dimensional function.
| vsskanth wrote:
| How well do differential neural nets perform in causal
| inference? There seems to be a pretty good library from sciml
| that claims to learn models from limited data, but I'm wondering
| if they generalize well.
| adolph wrote:
| A Brief Overview of Causal Inference (covers Pearl and others)
|
| https://tjohnson250.github.io/overview_causal_inference/over...
| nerdponx wrote:
| This was a surprisingly well-informed article. It covers the
| i.i.d. assumption, in-sample vs out-of-sample data, and causal
| modeling. And it avoids "AI" hyperbole.
| Meniteos4 wrote:
| >For instance, convolutional neural networks trained on millions
| of images can fail when they see objects under new lighting
| conditions or from slightly different angles or against new
| backgrounds.
|
| The big picture is humans use a multi-task network for depth,
| segmentation (and background removal), lighting source estimation
| (and shadow removal), material extraction, SLAM (and geometry
| reconstruction), optical flow, etc. Papers and their networks
| only look at a small part of what humans do; we are not just
| using a single "neural network."
___________________________________________________________________
(page generated 2021-04-02 23:01 UTC)