[HN Gopher] Why machine learning struggles with causality
       ___________________________________________________________________
        
       Why machine learning struggles with causality
        
       Author : bschne
       Score  : 131 points
       Date   : 2021-04-02 10:23 UTC (12 hours ago)
        
 (HTM) web link (bdtechtalks.com)
 (TXT) w3m dump (bdtechtalks.com)
        
       | digikata wrote:
        | I wonder if there are intermediary steps to getting better at
        | causality in ML. Causality is an abstraction over a whole set of
        | problems at a lot of different levels.
        | 
        | In terms of concrete problems that come earlier than causality:
        | e.g. toddlers, I think, get object permanence before causality,
        | and I think ML might struggle with that too.
       | 
       | Edit: then the next interesting thing after permanence is maybe
        | object path prediction; then you have an interesting basis for some
       | level of causality inference because you have a prediction and
       | some set of conditions that might disrupt the prediction.
        
         | max_ wrote:
          | I think that would be to venture into complex systems and
          | knowledge of phenomena like "causal opacity".
        
       | BayezLyfe wrote:
       | Better overview article on this topic: "AI Needs More Why"
       | 
       | https://www.forbes.com/sites/alexanderlavin/2019/05/06/ai-ne...
        
         | BayezLyfe wrote:
         | and the follow-up, "Healthcare Needs AI, AI Needs Causality"
         | 
         | https://www.forbes.com/sites/alexanderlavin/2019/08/13/healt...
        
       | 6gvONxR4sf7o wrote:
       | Here's my usual rant about ML not doing it because we don't give
       | it the right data. You can't learn causation from a table of
       | floats. You _can_ learn causation from a sufficiently annotated
       | table of floats.
       | 
       | If I tell you that here's a column of people's weights and a
       | column of their diets, no model can learn the causal connection
        | between the two. If I tell you that here's a column of people's
        | weights and here's a column of their diets _which were randomly
        | assigned under intervention_, then suddenly a model can do it.
       | 
       | All causal inference with observational data requires assumptions
       | about conditional independence structure. It's so crucial that we
       | always explain it all in prose in any writeup of any given causal
       | investigation. We put none of that in tables themselves, despite
       | it being entirely crucial. If we started making postgres tables
       | that stopped looking like "height: float, weight: float" and
       | started looking like "height: float, weight: do(float)" (as in
       | pearl's do(...)) then we could start to automate causal inference
       | much much more easily.
       | 
       | Not to say the types would be nearly so simple. You'd need a full
       | DAG for your database, and even then, it's not that simple: our
       | AB testing platform (v3.1) intervened here according to this 1k
       | line python script (git commit 191284794) that took in these
       | columns and employed a model trained on this other entire table
       | as of date X, before we migrated the db. Also this one column's
        | meaning changed in November when we removed a button from the
       | home page.
       | 
       | But without some structured encoding of the structure that an
       | analyst is going to need (structure they're absolutely going to
       | be encoding in natural language in their writeup), we're trying
       | to do it with one arm tied behind our backs.
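        | 
        | To make that concrete, here's a rough sketch of the idea (toy
        | data, made-up column names, and a deliberately naive estimator):
        | rows carry an annotation saying whether the diet was assigned
        | under intervention, and the analysis only makes a causal claim
        | when that annotation is present.
        | 
        |     import random
        | 
        |     random.seed(0)
        | 
        |     def make_row(assigned_by_experiment):
        |         # Health confounds diet and weight, unless the diet was
        |         # randomly assigned under intervention (do(diet)).
        |         health = random.gauss(0, 1)
        |         if assigned_by_experiment:
        |             diet = random.choice([0, 1])
        |         else:
        |             diet = 1 if health + random.gauss(0, 1) > 0 else 0
        |         weight = 80 - 5 * health - 2 * diet + random.gauss(0, 1)
        |         return {"diet": diet, "weight": weight,
        |                 "diet_assignment": "do" if assigned_by_experiment
        |                                    else "observed"}
        | 
        |     def mean_difference(rows):
        |         w1 = [r["weight"] for r in rows if r["diet"] == 1]
        |         w0 = [r["weight"] for r in rows if r["diet"] == 0]
        |         return sum(w1) / len(w1) - sum(w0) / len(w0)
        | 
        |     def causal_effect(rows):
        |         # Refuse to read the difference causally unless the
        |         # annotation says the treatment was do()-assigned.
        |         if all(r["diet_assignment"] == "do" for r in rows):
        |             return mean_difference(rows)
        |         raise ValueError("observational rows: assumptions needed")
        | 
        |     observed = [make_row(False) for _ in range(20000)]
        |     assigned = [make_row(True) for _ in range(20000)]
        |     print(mean_difference(observed))  # confounded, far from -2
        |     print(causal_effect(assigned))    # close to the true -2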
        
       | PeterisP wrote:
       | The article makes a big unfounded assertion in one of the first
       | few sentences "We learn them at a very early age, without being
       | explicitly instructed by anyone and just by observing the world."
       | - we do _not_ learn these inferences  "just by observing the
       | world", we learn these inferences by _acting_ on the world and
       | observing the results, and arguably they are simply not learnable
        | by pure observation alone. There's some theoretical grounding
       | for that (IIRC Judea Pearl's book Causality might be a good
       | source for that) and some experimental grounding for that (e.g.
       | the two kitten experiment for which I can't quickly find the
       | actual paper but it's described at https://io9.gizmodo.com/the-
       | seriously-creepy-two-kitten-expe... ), also other fields like
       | medicine have a good understanding of limitations of what
       | knowledge can be gained from purely observational studies versus
       | experimental interventions. So it seems a bit misleading to write
       | a whole article about "why machine learning struggles with
       | causality" without even addressing the key difference of learning
       | from observations vs learning from interaction, which IMHO is a
       | much more fundamental obstacle than everything the article
        | mentions.
        
         | Krabbyos wrote:
         | citation for the two kitten experiment:
         | 
         | Held, R., & Hein, A. (1963). Movement-produced stimulation in
         | the development of visually guided behavior. Journal of
         | Comparative and Physiological Psychology, 56(5), 872-876.
         | https://doi.org/10.1037/h0040546
        
         | skybrian wrote:
         | Both the article and the paper mention interventions and refer
         | to Pearl's work, so I think we can assume this is just a matter
         | of poor phrasing and a child's observations of its own
         | interventions aren't meant to be excluded.
         | 
         | But for the purposes of machine learning, it seems like it
         | should be possible to learn from observations of someone else's
         | interventions?
        
         | musingsole wrote:
         | > we do not learn these inferences "just by observing the
         | world"
         | 
         | We most certainly can, just not as well or as strongly as when
         | we're able to influence the system under observation. You're
         | speaking way too strongly and simplifying a complex mechanism
         | down past anyone's expertise.
        
           | Jeff_Brown wrote:
           | _After_ interacting with the world, we can learn from passive
           | observation. No human has ever learned anything about
           | causation without first spending a while experimenting in the
           | world.
        
           | dumbfoundded wrote:
           | An interesting related idea is inverse reinforcement
           | learning. We watch how other people interact with a system to
           | estimate a reward function and then we later test it out
           | ourselves. Either directly with an environment or inside our
           | own mental model of the environment. Simply "observing the
           | world" can give us data about how to learn in an environment
           | we've never interacted with before; only by watching others
           | interact with it.
        
             | still_grokking wrote:
             | That's true. But the key seems to remain the "interaction
             | with the world".
        
               | dumbfoundded wrote:
               | Maybe it's semantics but interacting with a simulation of
               | the world is actually more important. In a pure sense,
               | this doesn't require any actual real world interaction.
                | This concept is usually referred to as offline and off-
               | policy reinforcement learning.
               | 
               | If you're saying that interaction in any sense is
               | important, I'd very much agree that unsupervised learning
               | and supervised learning aren't equipped to handle
               | reinforcement learning problems. Correct framing of a
               | problem is necessary to achieve a desired property like
               | causality.
        
           | jcims wrote:
           | I think we have a hard time disconnecting our personal
            | experience from the observations we make. When we look at a
           | photograph of a person riding a bicycle down a path, even if
           | we've never ridden a bicycle we've likely been outdoors,
           | stood on a path, felt wind when we moved, etc. We may not be
           | able to accurately simulate the experience in our mind but we
           | can get close.
           | 
           | On the other hand, the starting point for an ML system
           | interpreting that same image is essentially a stream of
           | scalar values that tend to demonstrate multiple layers of
           | periodicity (3-4 byte intervals for RGBA and then another
           | layer per line of rasterized image data and yet another per
           | frame if it's a video).
           | 
           | Here's a quick experiment. Let the video linked below play
           | for a five count (sound is essential but a bit intense so
           | maybe moderate the volume first) so you have some confidence
           | it's not just me playing a rude trick, then close your eyes
           | for a five count. There's going to be a major change in the
           | sound when you get near 'five', now try to imagine how the
           | scene changed before opening your eyes again:
           | 
           | https://youtu.be/qnL40CbuodU?t=25
           | 
           | I think without any reference from an embodied perspective,
           | we're asking ML systems to understand the sounds (which are
           | also streams of scalar values that demonstrate periodicity)
           | the same way we interpret the representation visually.
           | 
           | (Also if you enjoyed the example above check out these two
           | channels, some of them are mindblowing)
           | 
           | https://www.youtube.com/user/jerobeamfenderson1
           | 
           | https://www.youtube.com/c/ChrisAllenMusic
           | 
           | And a fun video explaining it all -
           | https://www.youtube.com/watch?v=4gibcRfp4zA
        
           | [deleted]
        
           | fractionalhare wrote:
           | I don't think they're speaking too strongly. I think a lot of
           | the time when we correctly infer causality without
           | empirically interacting with the system, it's because we have
           | built up significant categorical experience about more atomic
           | systems we were able to interact with.
           | 
           | In my view, a lot of things that are noninteractively
           | inferred are compositions of more fundamental things that
           | required empirical experience. When you've had the causality
           | of gravity thoroughly beaten into you at a young age, a lot
           | of other things seem intuitive that would otherwise
           | completely fall outside a framework for being unempirically
           | learned.
           | 
           | Do you have a specific counterexample of causality you can
           | infer without interaction or empirical experience of
           | something related?
           | 
           | Caveats: I'm not a neurologist or psychologist, so this is
           | mostly philosophical speculation on my part.
        
             | visarga wrote:
             | We think we're so smart because we causally understand the
             | world, but it took us a very long time to collectively
             | discover these principles. A human alone would not be so
             | smart.
        
               | still_grokking wrote:
               | I think this confuses (fact) knowledge and the ability to
               | recognize causal relations.
               | 
               | A human can never be smarter than said human: Our brains
               | are not connected and we can't share capacity with
               | others.
               | 
               | So, discovering causality is always an individual
               | experience. And that happens likely by "playing" with
               | "the world".
               | 
               | I think it's noticeable that smarter animals are more
               | playful. Which is also a hint that points to the
               | fundamental importance of interaction with the world as a
               | prerequisite for "smartness". Additionally the
               | capabilities of the "sensors" and "actors" that make
               | interaction with the world possible in the first place
               | seem to be crucial to develop "smart behavior".
               | 
               | The part about the "sensors" seems quite obvious. I think
               | one can gain a better general understanding of some thing
               | if one can "experience" it in more than one "dimension".
               | 
               | And the "actors" allow one to perform "experiments" with
               | the things around one, and find out this way how that
               | thing "works" or is supposed to be "used".
               | 
               | That's actually the behavior that can be observed in
               | children of all kinds of "smarter" species. So it seems
               | to be at least linked somehow to "smartness".
        
             | musingsole wrote:
              | Empirical evidence is a special case of observation. If you
             | observe the whole universe in its entirety, you could
             | separate moments that effectively followed whatever
             | conditions you might set in a lab. You can't act on the
             | system, but you can forever tune your models to match the
             | infinite observations you could record. At that point the
             | separation between observation derived knowledge versus
             | experiential knowledge is meaningless (it's just hard to
             | imagine a universal model manifesting without having used
              | experiential knowledge along the way).
             | 
             | Whether you swing the bat or just watch it hit the ball
             | into the sky, you have the prerequisites needed to reason
             | about the interaction.
             | 
             | A more entertaining question is how a system comes to
             | _believe_ causality (i.e. comes to believe that things _can
             | and must_ have causes)
        
               | still_grokking wrote:
               | > (it's just hard to imagine a universal model
                | manifesting without having used experiential knowledge
               | along the way)
               | 
               | That's actually the point!
        
           | shkkmo wrote:
           | > We most certainly can
           | 
            | That is a very, very strong claim that requires some proof
            | to go with such a strong statement of certainty.
            | 
            | While we can learn about the ball's change of movement just
            | fine without swinging the bat, we can only do that because we
            | are generalizing from a large body of knowledge that we
            | developed by experimenting on the world using our body.
            | 
            | I am not aware of a single piece of evidence that an agent
            | can use purely observational learning to ever acquire the
            | causal knowledge of the real world to a sufficient level to
            | make those sorts of inferences with any sort of reasonable
            | accuracy.
        
             | musingsole wrote:
             | I'd link studies about children aping things they have only
             | seen through a window, but I suspect you'll quickly argue
             | out of the bounds of those studies and so I won't.
        
           | galaxyLogic wrote:
           | Right. "World" includes us. When we perform actions we
           | observe ourselves doing them and then we observe the
           | consequences.
           | 
           | That is the only way we can learn anything, by observing. And
           | what else is there to observe than the "world"? We can
           | observe our own thinking process but that is part of the
           | world too. I would say. It is definitely not "out of this
           | world" :-)
        
             | visarga wrote:
             | yes, we learn causal reasoning by observing consequences,
             | we can't learn by simply watching without acting
        
             | [deleted]
        
         | jfengel wrote:
         | I suspect that we have some of it built in, even before we
         | observe or act. Something in us starts us on the road of the
         | hypothesis that time is a thing we can reason about.
         | 
         | The illusion of causality is incredibly strong -- so much so
         | that it's really hard to get people to give it up, even when
         | faced with paradoxes (First Mover, quantum mechanics, multiple
         | causation, etc.)
         | 
         | We don't come into the world with a fully formed sense of
         | causality, and it appears over the course of months. But at
         | least some of it may be wired into the hardware, like the
         | language instinct. It's just that the wiring isn't done the day
         | we leave the womb, and is influenced by what comes after.
        
           | t0mas88 wrote:
            | Indeed, but it comes really early in development. A 2-month-
            | old baby understands that if you smile at other people they
            | will interact with you. And they'll try to use that to get
            | more interaction / playing (in a very simple way), which also
            | makes sense evolutionarily because more attention from adults
           | means higher chance of survival.
        
         | JulianMorrison wrote:
         | TBH the idea that humans learn _anything_ on a blank slate pure
         | learning architecture is messed up. Our brains are literally
          | evolved to interpret causality into the world. It isn't
         | randomly just "learned" any more than walking is.
        
           | andyxor wrote:
           | exactly, besides ignoring the innate structures, heuristics
           | and biases hardcoded via evolution, the whole notion of
           | "learning" became highly intertwined with reinforcement kind
           | of learning, i.e trial & error, stimulus and response
           | behaviorist terms popularized by Pavlov and Skinner a century
           | ago, which is just one type in a large repertoire of
           | adaptation mechanisms.
           | 
            | Memory in these models is used as an afterthought, or some
            | side utility for complex iterative routines based on the
            | calculus of function optimization, while in living organisms
            | memory and its "hardcoded" shortcuts allow cutting through
            | the search space quickly, as with a large database index.
           | 
           | Speaking in database terms we have something like
           | "materialized views" on acquired and genetically inherited
           | knowledge, built from compressed and hierarchically organized
           | sensory data and prior related actions and associations,
           | including causal links. Causality is just a way to associate
           | items in the memory graph.
           | 
            | Error correction doesn't play as much of a role in storing
            | and retrieving information and in pattern recognition as
            | current machine learning models may lead you to believe.
           | 
           | Instead, something akin to self-organized clustering is going
           | on, with new info embedded in the existing "concept" graph
           | via associations and generalizations, through simple LINK and
           | JOIN mechanisms on massive scale.[1] The formation of this
           | graph in long term memory is tightly coupled with sleep
           | cycles and memory consolidation, while short term memory
           | serves as a kind of cache.
           | 
           | Knowledge is organized hierarchically starting from principal
           | components [2] of sensory data from e.g. visual receptive
           | fields, and increasing in level of abstraction via
           | "chunking", connecting objects A and B to form a new object C
           | via JOIN mechanism, or associating objects A and B via LINK
           | mechanism. Both LINK and JOIN outputs are "persisted" to
           | memory via Hebbian plasticity.
           | 
           | All knowledge including causal links are expressed via this
           | simple mechanism. Generating a prediction given a new sensory
           | signal is just LINKing the signal with existing cluster by
           | similarity.
           | 
           | Navigation in this abstract space is facilitated via
           | coordinate system similar or perhaps identical to the role
           | hippocampal place & grid cells play in spatial navigation.
           | Similarity between objects is determined as similarity
           | between their "embeddings" in this abstract concept space.
           | 
           | It's possible that innate structures are genetically pre-
           | wired in this graph which represent high level "schemas",
           | such as innate language grammar which distinguishes e.g. verb
           | from noun, visual object grammar which distinguishes "up"
           | from "down", etc. It is also possible these are embodied,
           | i.e. connected to some representation of motor and sensory
           | embeddings. And serve to bootstrap the graph structure for
           | subsequent knowledge acquisition. I.e. no blank slate.
           | 
           | The information is passed, stored and retrieved via several
           | (analogue) means both in point-to-point and broadcast
           | communication, with electromagnetic oscillations playing
           | primary role in synchronization in neural assemblies,
           | facilitating e.g. speech segmentation (or boundary detection
           | in general), and coupling an input signal "embedding" to
           | existing knowledge embeddings in short term memory; while
           | neural plasticity/LTP/STDP as storage mechanisms on single
           | neuron level.
           | 
           | [1] See Leslie Valiant "neuroidal" model and his book
           | https://www.amazon.com/Circuits-Mind-Leslie-G-
           | Valiant/dp/019...
           | 
           | [2] See Oja Rule
           | http://www.scholarpedia.org/article/Oja_learning_rule
           | 
           | and Olshausen & Field classic work on sparse coding
           | http://www.scholarpedia.org/article/Sparse_coding
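            | 
            | For what it's worth, the Oja rule in [2] is tiny to write
            | down. A minimal, self-contained sketch (toy 2-D "sensory"
            | data; nothing biological about the details) that extracts
            | the first principal component with one Hebbian-style update:
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            | 
            |     # Toy inputs whose main axis of variation is (1,1)/sqrt(2).
            |     axis = np.array([1.0, 1.0]) / np.sqrt(2)
            |     x = (0.2 * rng.normal(size=(5000, 2))
            |          + np.outer(rng.normal(size=5000), axis))
            | 
            |     w = rng.normal(size=2)   # "synaptic" weights
            |     eta = 0.01               # learning rate
            |     for xi in x:
            |         y = w @ xi                     # neuron output
            |         w += eta * y * (xi - y * w)    # Oja: Hebb term + decay
            | 
            |     print(w / np.linalg.norm(w))   # ~ +/- (0.707, 0.707)
            |     print(axis)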
        
         | danans wrote:
         | > we learn these inferences by acting on the world and
         | observing the results, and arguably they are simply not
         | learnable by pure observation alone.
         | 
         | Tangentially, wouldn't sampling based ML methods like particle
          | filters / Kalman filters or other randomized state space
         | exploration algorithms be analogous to the person learning by
         | acting on the world? In this case, the "action" would be
         | bouncing the radar off the object being tracked.
         | 
         | Of course these models are far more limited than a child in the
         | way they can act on the world, and in the number of aspects of
         | reality they can model.
         | 
         | And furthermore, they have no concept of causality, and
         | represent only the current state of knowledge they are
         | modeling.
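          | 
          | (For reference, the Kalman update for that radar analogy is
          | only a few lines; this is a standard textbook constant-velocity
          | model with made-up noise levels, nothing specific to the
          | article.)
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(1)
          |     F = np.array([[1.0, 1.0], [0.0, 1.0]])  # position, velocity
          |     H = np.array([[1.0, 0.0]])              # radar sees position
          |     Q = 0.01 * np.eye(2)                    # process noise
          |     R = np.array([[4.0]])                   # measurement noise
          | 
          |     x_true = np.array([0.0, 1.0])
          |     x_est, P = np.zeros(2), np.eye(2)
          | 
          |     for _ in range(50):
          |         x_true = F @ x_true + rng.multivariate_normal([0, 0], Q)
          |         z = H @ x_true + rng.normal(0.0, 2.0, size=1)
          |         x_est = F @ x_est                    # predict
          |         P = F @ P @ F.T + Q
          |         S = H @ P @ H.T + R                  # update
          |         K = P @ H.T @ np.linalg.inv(S)
          |         x_est = x_est + K @ (z - H @ x_est)
          |         P = (np.eye(2) - K @ H) @ P
          | 
          |     print("true:", x_true, "estimated:", x_est)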
        
         | mcguire wrote:
         | " _...we do not learn these inferences "just by observing the
         | world", we learn these inferences by acting on the world and
         | observing the results..._"
         | 
         | The article: " _"Machine learning often disregards information
         | that animals use heavily: interventions in the world, domain
         | shifts, temporal structure -- by and large, we consider these
         | factors a nuisance and try to engineer them away," write the
         | authors of the causal representation learning paper. "In
         | accordance with this, the majority of current successes of
         | machine learning boil down to large scale pattern recognition
         | on suitably collected independent and identically distributed
         | (i.i.d.) data."_ "
         | 
         | The key words are "interventions in the world." The article
         | goes on to say, " _"Generalizing well outside the i.i.d.
         | setting requires learning not mere statistical associations
         | between variables, but an underlying causal model," the AI
         | researchers write._ " The point being that, whether or not
         | acting in the world is an essential condition for learning
         | causality, current machine learning approaches are not even
         | trying for causality.
        
         | darksaints wrote:
         | This is something I discussed with a friend recently. I talked
          | about how the key to unlocking more potential for AI was to
          | stop sandboxing it away from the world and let the AI start
          | interacting with it. And the immediate pushback to that is that
          | it would cause chaos. Can you imagine AI-driven
         | cars that out of nowhere decide to brake-check just to see what
         | happens?
         | 
         | Kids are often an unpleasant annoyance in restaurants, and many
         | people that don't like that annoyance try to convince
         | restaurants and lawmakers to ban them from restaurants. The
         | problem with those ideas is that by banning kids from
         | restaurants, you are just going to create annoying adults in
         | restaurants over time. Kids are annoying in restaurants, but
         | they are also learning how to interact with the world. If you
         | don't find a way to let them explore boundaries, they never
         | learn, and they'll become obnoxious restaurant patrons even as
         | fully grown adults.
         | 
         | Which kind of goes back to ethical AI. You can't unleash
         | unbounded AI on the world, or else you'll cause chaos. And you
         | can't sandbox AI, or it will never truly learn. What are you
         | supposed to do then? I don't know, but the answer isn't firing
         | the ethical AI department because you don't want them
         | criticizing your ad empire ;)
        
           | Radim wrote:
           | Makes you wonder how life operated near the beginning, before
           | "how to behave sustainably" evolved. Before individual death,
           | before sex, before speciation, before children, maybe even
           | before genes.
           | 
           | The world must have seen some wild, explosive action in its
           | day.
           | 
           | Immortal organisms still exist (two-headed planaria!), but on
           | the whole the ecosystem seems pretty well calibrated by now.
        
             | bserge wrote:
             | Death upon death upon death, it seems.
        
           | klyrs wrote:
           | Only, AI doesn't even have object permanence, something that
           | babies pick up in a few months, years before they can make
           | causal inferences longer than a few seconds. We don't let
           | babies drive cars, we give them toys that they can't hurt
           | themselves with. We only allow that when they're smart enough
           | to learn from verbal/written instruction, have the fine motor
           | skills to operate a vehicle, and the situational awareness to
            | operate a car safely. People making self-driving cars are
            | basically putting infants in the driver's seat.
        
           | bserge wrote:
           | There are a lot of simulators out there. It would not be out
            | of the realm of possibility to set up a learning AI to play them
           | over and over again and record the results.
           | 
           | I think besides the sheer amount of work put in to program
           | something like that, the main limitation would be processing
           | power, that sort of thing would take an immense amount of it.
           | 
           | Now that I think about it, isn't this exactly what Tesla and
            | other self-driving companies are trying to do?
           | 
           | So, it's not like it's sandboxed, it's just very hard to make
           | this "kid" play with things.
        
           | cpleppert wrote:
           | Except AI isn't being 'sandboxed away' in any discernible
           | way. A child being an unpleasant annoyance isn't comparable
            | to an AI that can't drive in either risks of failure or
           | capability because the child is far more capable of
           | moderating his behavior to conform to social norms than an
           | AI.
        
           | beaconstudios wrote:
           | The broader theoretical basis behind this idea is called
           | embodied cognition.
        
         | 6gvONxR4sf7o wrote:
         | > we learn these inferences by acting on the world and
         | observing the results, and arguably they are simply not
         | learnable by pure observation alone.
         | 
         | Also crucially, we learn these inferences by acting on the
         | world and knowing something about why we acted. "I was playing
         | with it" is a conditional independence statement that we use
         | all the time while learning how things work, we just usually
         | don't use the mathy language to describe it. We're running
         | randomized controlled trials constantly, but implicitly.
         | 
         | Coincidentally, it's a common anecdote that people who seem to
         | learn things quickly and deeply have this curiosity and will
         | play with a thing/twiddle the knobs while they're learning how
         | it works. When you're playing a video game and someone says
         | "hang on, let me figure out the controls for a sec" they're
         | changing the conditional independence structure of their
         | observations and running an RCT.
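          | 
          | A toy version of the "figure out the controls" moment, with
          | everything made up: when whoever you're watching sets the knob
          | based on the hidden game state, the knob looks predictive even
          | though it does nothing; when you jiggle it at random yourself,
          | the spurious dependence disappears.
          | 
          |     import random
          |     random.seed(0)
          | 
          |     def corr(a, b):
          |         n = len(a)
          |         ma, mb = sum(a) / n, sum(b) / n
          |         cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
          |         va = sum((x - ma) ** 2 for x in a)
          |         vb = sum((y - mb) ** 2 for y in b)
          |         return cov / (va * vb) ** 0.5
          | 
          |     # Ground truth: outcome depends only on the hidden state.
          |     def outcome(h):
          |         return h + random.gauss(0, 0.1)
          | 
          |     hidden = [random.gauss(0, 1) for _ in range(10000)]
          | 
          |     # Watching someone whose knob setting tracks the hidden state.
          |     knob_watched = [1.0 if h > 0 else 0.0 for h in hidden]
          |     y_watched = [outcome(h) for h in hidden]
          | 
          |     # Twiddling the knob at random ourselves (an implicit RCT).
          |     knob_played = [random.choice([0.0, 1.0]) for _ in hidden]
          |     y_played = [outcome(h) for h in hidden]
          | 
          |     print(corr(knob_watched, y_watched))  # strong correlation
          |     print(corr(knob_played, y_played))    # ~ 0: knob does nothing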
        
         | [deleted]
        
         | _0ffh wrote:
          | Another thing which is often forgotten: we have evolutionarily
         | adapted biases built into our learning machinery. That helps us
         | to learn some things that tended to be essential for survival
         | (much) more quickly, but it can also hinder us in learning some
         | other things that we are not adapted for.
        
         | jonplackett wrote:
         | I saw a study a while back that kids only a few months old will
         | preferentially go to a person who helped their mum open a jar
         | rather than one who refused.
         | 
          | There's a hell of a lot of observation happening in kids'
         | _very_ early on.
         | 
         | No doubt it's easier once you can pick up a baseball bat
         | yourself, but I have no doubt a young kid would understand
         | basic objects connected together without ever having used them.
        
         | MeteorMarc wrote:
         | Of course it makes a difference whether you observe someone
          | acting or you are the actor yourself, but why could you not
         | learn from someone else acting and observing the results?
        
           | karpierz wrote:
           | You can think of "acting on the world" as resolving questions
           | about the world, like "what happens when I do X"?
           | 
           | You could observe the results of others acting, but it means
           | that the questions you're getting answers to are outside of
           | your control. So if you need to know the answer to a
           | particular question, you either need to test it yourself, or
           | hope that whoever you're watching will test it for you.
        
           | t0mas88 wrote:
           | It's possible to learn from someone else acting and observing
           | the results only if you understand what the observed person
           | is doing and could map it to yourself. Which babies for
           | example cannot do. They like you making crazy faces but when
           | very young they don't learn how to do the same thing from
           | that. I think for two reasons, first they don't know what you
           | did to make that face (which in the "experimenting yourself"
           | case they would have known even if it was a random action)
           | and second they can't map what they see to actions with their
           | own body. Both need to be learned first before you can learn
           | from observation. While learning from your own (random)
           | interactions with the world is possible from day 1.
        
         | jasonwatkinspdx wrote:
         | Yeah, Judea Pearl's book is the definitive work. His formalism
         | also has an aspect that you could say is similar to imagining
         | an intervention in a counterfactual way, and computing its
         | likelihood given the data. His paper "Why I'm only half
         | Bayesian" lays out how he sees the epistemology of this.
        
         | mlthoughts2018 wrote:
         | This has been discussed widely in computer vision for many
         | decades
         | 
         | - https://www.routledge.com/The-Ecological-Approach-to-
         | Visual-...
         | 
         | - http://fmdb.cs.ucla.edu/Treports/soatto_extended_v18.pdf
        
         | Der_Einzige wrote:
          | I disagree here. We DO "learn them at a very early age, without
          | being explicitly instructed by anyone and just by observing the
          | world". The majority of information that a human ingests doesn't
          | include direct actions upon the world (in any more than a
          | philosophical sense anyway). We may feel as though actions are
          | more memorable, but I claim that we observe and then take
          | actions, not the other way around. We learn things all the time
          | that are not "consciously learned". Addition (e.g. 2 apples are
          | more than one apple) is learned by most humans in an
          | unsupervised manner far before labels are added. I claim that
          | early childhood development, and thus much of our "foundation",
          | is primarily rooted in having no explicit information AND not
          | enough power to directly label data ourselves (through
          | experiments, play, etc.)
         | 
         | This is important because "learning without explicit
         | instructions" in ML speak is _Unsupervised learning_
         | (clustering and dimensionality reduction). There are no labels
         | except the ones that you decide upon yourself (cluster
          | membership). Unsupervised learning is still in its infancy in
          | effectiveness compared to supervised systems, and it's no
         | surprise that its algorithms are generally extremely easy to
         | implement from scratch (e.g. K-means or DBSCAN) compared to
         | relatively difficult work like automatic-differentiation in
         | neural networks.
         | 
         | Learning by reading information in a book or by direct didactic
         | teaching would be supervised learning. Learning through a
          | dialectical format would be reinforcement learning. Self-
         | supervised learning would be equivalent to autodidactic
         | learning and the creative act upon the world. (maybe the
         | distinction between self-supervised and reinforcement is
         | arbitrary)
         | 
         | The point is that we want to learn as much as we can given the
         | information available to us. We should not rule out the role
         | that the biological analogue to unsupervised learning plays in
         | human development.
         | 
         | It is for all of these reasons that I become far more excited
         | when a new clustering or dimensionality reduction algorithm
         | comes out than I am when a new neural network architecture
         | becomes state of the art.
        
           | shkkmo wrote:
            | Children do not learn completely "unsupervised"; they receive
            | frequent feedback from the agents around them. I would argue
            | that a significant amount of childhood development (especially
            | around "labels") is due to our hardwired attachment to human
            | faces.
            | 
            | I have always felt that the significance of the "social
            | software of human culture" in our general intelligence and
           | learning capacity was underestimated by the AGI community.
           | 
           | So personally, I see more potential in communities of
           | learning agents than any developments in the underpinnings.
        
         | meroes wrote:
         | Not saying you are wrong but I learned to ride a bike by
         | watching my classmates be taught in preschool.
         | 
         | Maybe direct acting is a quicker learning method. Or maybe
         | seeing others' learning processes and instructions allows one
         | to leapfrog ahead by not duplicating mistakes.
         | 
         | Maybe always learn off a good dataset if it exists first?
        
           | didibus wrote:
           | > I learned to ride a bike by watching my classmates be
           | taught in preschool
           | 
           | I'm confused, you're saying you watched others learn and then
           | managed to get on a bike for the first time and properly ride
           | it like an experienced rider with no practice whatsoever?
        
           | beaconstudios wrote:
           | First, you make a mental model. Then, you test it.
           | 
           | If you don't test it then you don't know if it works, and you
           | can't improve on it.
           | 
           | Knowledge is derived from experience.
        
             | musingsole wrote:
              | Scoring a model's predictions is a means of testing the
             | model without having it coupled to the system under
             | observation.
             | 
             | Observation is a type of experience.
        
               | Jtsummers wrote:
               | And you can gain experience/understanding without needing
               | to either observe or directly experience something,
               | merely by thinking about it or being told about it. Not
                | every kid has to be burned, or see a burn caused by
                | touching a hot stove, to learn it's a bad idea to try it.
               | 
               | If you require actual experience or direct observation to
               | learn, then you're not using your brain to its full
               | potential.
        
               | shkkmo wrote:
                | You can certainly generalize previously gained knowledge
               | without direct experience in that particular instance.
               | 
               | Would a child who has never experienced the human body's
               | pain response be able to infer the causal connection
               | between the heat of a stove and the response after
               | touching it?
               | 
               | Arguably, language is a tool that allows us to generalize
               | the direct experience of other agents. It is unclear if
               | it is possible to remove direct interaction from a
               | learning system and still reach the same level of
               | understanding.
        
               | beaconstudios wrote:
               | I'm not sure what you mean by "without having it coupled
               | to the system under observation" - could you clarify?
               | 
               | I do agree that observation is a type of experience, but
               | a model that is meant to guide action (basically any
               | useful model) needs to be tested in action. I can't learn
               | to juggle only by watching other people juggle. I can
                | only develop a hypothesis about how one juggles; the only
                | way to test (and refine) it is to try the hypothesis out.
        
               | musingsole wrote:
               | A model being coupled to a system == a model that can
               | influence the system's state through _a means_
               | 
               | > a model that is meant to guide action (basically any
               | useful model) needs to be tested in action
               | 
               | No, it doesn't. For example, the vast majority of work on
               | modeling the stock market is done on machines completely
               | sandboxed from any ability to make trades and are owned
               | by companies who will never make a trade themselves but
               | instead return an API response with a yes/no. Whether
               | that is fed directly into some sort of automated action
               | is largely irrelevant as the ability for an individual
               | trade to cause a measurable impact on the market is
               | negligible until it isn't. So, these systems are built
               | separate from the system they model and learn entirely
               | through observation.
               | 
               | tl;dr: weather forecasting models don't have an action to
               | take and also can't influence their system. And yet they
               | learn and grow more accurate.
        
               | beaconstudios wrote:
               | OK that's a fair criticism. Then perhaps we can divide
               | models into those that influence the system they observe
               | (regulatory systems) versus those that only measure, or
               | whose influence is negligible. Models that aim to
               | influence a system do indeed need to be used to test
               | their efficacy.
        
               | shkkmo wrote:
                | It's not just about testing their efficacy; it's about the
               | theoretical limits of pure observation when doing causal
               | reasoning. We know that we are better served by avoiding
               | causal certainty when using purely observational studies.
               | It seems like the base assumption should be that similar
               | epistemic constraints apply to machine learning.
        
               | beaconstudios wrote:
               | Yes the best way to understand a system is to interact
               | with it. But there are scenarios where that simply isn't
               | possible and yet we can still model causality, like the
               | weather example musingsole gave.
        
             | cpleppert wrote:
             | > First, you make a mental model. Then, you test it.
             | 
             | Except that isn't how that really works right? A mental
             | model explains part of the reasoning that led to an outcome
             | but never all. After all, the map is not the territory. A
             | mental model is grounded in trivial assumptions.
             | Ultimately, your brain produces inferences in a way that is
              | incredibly hard to couple to specific logical processes.
             | There has been a lot of research on how experts think and
             | reason and none of it is compatible with having a mental
             | model whatsoever.
        
               | beaconstudios wrote:
               | A mental model is fundamentally a predictive tool. It
               | doesn't describe reality, it just allows us to make
               | educated guesses about what will happen.
               | 
               | I'm a constructivist so I'm under no assumption that we
               | have access to the territory at all except modulated by
               | our subjective perception.
        
         | BenoitEssiambre wrote:
         | I've read Pearl's book too and it wasn't clear if it was
         | possible to systematically "observe interventions" instead of
         | doing them yourself and what the rules would be for that.
         | 
         | I mean there was some stuff about instrumental variables but it
         | seems the theory is a bit incomplete in that area.
         | 
         | What distinguishes an intervention vs just normal observable
         | randomness? Does it have to do with the complexity of the
         | entity performing the "intervention", with the fact that this
         | entity can observe and act on knowledge? I guess it's kind of
         | the debate about where determinism ends and free will begins.
         | Are there mathematical bounds to help us sort it out though?
         | Maybe there is something information theoretic? It's very
         | unclear in my mind.
         | 
          | This is all about fitting generative models that are robust to
          | counterfactual changes, that remain predictive even if you run
          | your models with data you've never observed, going beyond
          | simple interpolation/extrapolation. Are there priors in model
          | structure that tend to naturally make models much more robust
          | to counterfactual changes and that make models work well beyond
          | the data? Do these priors get more effective when they include
          | some latent variables that distinguish between observations and
          | interventions? How do you train these latent variables?
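          | 
          | On the instrumental-variables part, the Wald/IV estimator is at
          | least easy to sketch. Toy structural equations, entirely made
          | up: an external nudge z supplies the "intervention-like"
          | variation in x, and using only that variation recovers the
          | effect that naive regression gets wrong.
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     n = 100000
          |     u = rng.normal(size=n)            # unobserved confounder
          |     z = rng.integers(0, 2, size=n)    # instrument (e.g. a coupon)
          |     x = 0.8 * z + 0.6 * u + rng.normal(size=n)
          |     y = 2.0 * x + 1.5 * u + rng.normal(size=n)  # true effect: 2.0
          | 
          |     naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
          |     iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
          | 
          |     print("naive regression slope:", naive)  # ~ 2.6, confounded
          |     print("IV (Wald) estimate:    ", iv)     # ~ 2.0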
        
         | GistNoesis wrote:
          | Formally known as "do-calculus", which helps solve the
          | "correlation is not causation" issue that usually affects
          | statistical methods (like ML).
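          | 
          | A minimal numerical illustration of the simplest do-calculus
          | move, the backdoor adjustment (all the probabilities here are
          | made up): condition on the confounder Z, then average over its
          | marginal, and P(Y | do(X)) comes out right even though the
          | naive conditional P(Y | X) is badly off.
          | 
          |     import random
          |     random.seed(0)
          | 
          |     def sample():
          |         z = random.random() < 0.5
          |         x = random.random() < (0.8 if z else 0.2)
          |         y = random.random() < 0.2 + 0.2 * x + 0.5 * z
          |         return z, x, y        # true effect of x on P(y): +0.2
          | 
          |     data = [sample() for _ in range(200000)]
          | 
          |     def p_y_given(cond):
          |         rows = [d for d in data if cond(d)]
          |         return sum(d[2] for d in rows) / len(rows)
          | 
          |     naive = (p_y_given(lambda d: d[1])
          |              - p_y_given(lambda d: not d[1]))
          | 
          |     def p_y_do(x):
          |         # sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)
          |         total = 0.0
          |         for z in (False, True):
          |             pz = sum(d[0] == z for d in data) / len(data)
          |             total += p_y_given(
          |                 lambda d: d[0] == z and d[1] == x) * pz
          |         return total
          | 
          |     print("naive P(y|x=1) - P(y|x=0):", naive)   # ~ 0.5
          |     print("adjusted, do(x=1) - do(x=0):",
          |           p_y_do(True) - p_y_do(False))           # ~ 0.2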
        
       | galaxyLogic wrote:
        | Isn't it simple? There is never a single root cause. What causes
        | something is elementary particles moving in certain ways together,
        | always affecting each other. There is never a single "root
        | cause".
        | 
        | A separate question is "Who is guilty?" or "Who deserves credit?"
        
         | pas wrote:
         | Highly recommended related essay/paper:
         | https://www.gwern.net/Everything (Very terse and information
         | dense though.)
        
       | nafizh wrote:
        | Arguably one reason causality research in the machine learning
        | community hasn't boomed is that there is no framework/ease of
        | access to quickly code up the current heuristics/patterns/graph
        | models like you can do with a deep learning idea using pytorch.
        | Deep learning has reached today's stage because of early
        | frameworks like Theano and Caffe. Accessibility to beginners in a
        | field is crucial for SOTA development, which, although it feels a
        | little counter-intuitive, is nevertheless true. If you search for
        | causality you get a bunch of books and papers from Pearl and
        | Scholkopf which are fun to read, but how do I do something
        | actionable with that quickly?
        
         | Der_Einzige wrote:
         | This is the correct answer. The masses will not tolerate buggy
         | research code.
        
         | joe_the_user wrote:
         | _Arguably one reason causality research in the machine learning
         | community hasn 't boomed is there is no framework/ease of
         | access to quickly code up the current heuristics/patterns/graph
         | models like you can do with a deep learning idea using
         | pytorch._
         | 
         | Ironically enough, it seems like you're confusing cause and
         | effect here. The reason that there's little causation based
         | reasoning isn't because there's no automation for it. Rather,
         | the reason there's no automation for it is because it hasn't
         | boomed.
         | 
          | The reason deep learning based ML for image recognition boomed
          | is because you could take a fairly large database of images and
          | categorizations and produce an impressive and testable system
          | using straightforward if challenging optimization procedures.
          | Because this approach has boomed, huge amounts of money have
          | flowed to it, lots of people have been hired, and it's been
          | semi-automated, so you have a combination of data and
          | frameworks that let you do things quickly. Some high percentage
          | of all the achievement of current ML is leveraging the original
          | static ability to sort images (or sort buckets of bits) into
          | different areas (AlphaGo sorts moves into "good" and "bad" and
          | adds tree pruning, etc). Which isn't to discount it; it's the
          | first sort of system that can seem "as good as human" in
          | certain areas.
         | 
         | But when there's no similarly easy and impressive procedure for
         | taking, say, a time series, and getting the next result better
         | than human or traditional statistics can predict, there's no
         | boom, no gathering of public data sets, no easy automation of
         | the standard procedures and so-forth.
        
           | nafizh wrote:
           | Theano started around 2007, long before DL got popular or the
           | Imagenet competition where DL outperformed traditional
            | methods by a wide margin.
        
             | p1esk wrote:
             | The 2012 Imagenet results which jumpstarted DL did not
             | benefit from Theano or Torch frameworks. Alex Krizhevsky
             | had developed his own GPU accelerated framework (cuda-
             | convnet), and it remained quite popular for a couple of
             | years after the competition, until Theano and Torch caught
             | up with it.
        
         | ZephyrBlu wrote:
         | Funnily enough I tried to look into Causal Analysis because I
         | thought it might be applicable to something I'm working on.
         | 
         | What I found was exactly like you said, a bunch of theory which
         | was kind of interesting but didn't seem very practical at all.
         | 
         | Lots of DAG manipulation without actually explaining how to
         | gather data and model a DAG yourself.
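          | 
          | FWIW, the "model a DAG yourself" step can be as plain as
          | writing your assumptions down as edges and checking which paths
          | need blocking. A rough sketch with networkx (variable names
          | invented; a naive backdoor-path listing that ignores colliders,
          | not a full identification algorithm):
          | 
          |     import networkx as nx
          | 
          |     # Causal assumptions, written down as a DAG.
          |     g = nx.DiGraph([
          |         ("season", "ad_spend"),
          |         ("season", "sales"),
          |         ("ad_spend", "traffic"),
          |         ("traffic", "sales"),
          |     ])
          |     treatment, outcome = "ad_spend", "sales"
          | 
          |     # Backdoor paths: undirected paths from treatment to outcome
          |     # whose first edge points *into* the treatment.
          |     skeleton = g.to_undirected()
          |     for path in nx.all_simple_paths(skeleton, treatment, outcome):
          |         if g.has_edge(path[1], path[0]):
          |             print("backdoor path:", " - ".join(path))
          |     # prints the path through "season", so adjust for season.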
        
       | ironmantissa wrote:
        | Judea Pearl also wrote a great layman's book called "The Book of
       | Why" that I highly recommend.
        
       | zipotm wrote:
       | Because programmers or project engineers are idiots. That's it.
        
       | darksaints wrote:
       | Humans struggle with causality. Even those whose professions are
       | dedicated to understanding causality struggle with it. That's the
       | reason we have heuristics like the "5 whys" that only
       | occasionally work. Consider the following philosophical problem:
       | 
       | An inattentive headphone-laden jaywalker wearing black crossed a
       | road at night and was killed by a drunk driver speeding in a
       | large SUV. What was the root cause?
       | 
       | The amusing thing about this question is that if you actually
       | have an answer, it reveals more about your biases than it does
       | about the situation and potential solution(s). Various people
       | will chime in about the latest thing that annoys them...whether
       | it is inattentive pedestrians, jaywalkers, pedestrians wearing
       | black at night, drunk drivers, speeders, or people driving cars
       | that are too big and dangerous. But all of them are wrong because
       | there is no discernible root cause.
       | 
       | The problem is that we don't have multiple realities in which we
        | can control each factor involved. And even if we did, it is
        | entirely possible that if you isolated and controlled each factor
        | individually, the accident still wouldn't have happened.
       | Sometimes it takes a confluence of factors for an event to
       | actually occur. And people's obsession with finding a single
       | cause for complex phenomena hinders their ability to actually
       | find fixable solutions.
        
         | User23 wrote:
         | > The amusing thing about this question is that if you actually
         | have an answer, it reveals more about your biases than it does
         | about the situation and potential solution(s). Various people
         | will chime in about the latest thing that annoys them...whether
         | it is inattentive pedestrians, jaywalkers, pedestrians wearing
         | black at night, drunk drivers, speeders, or people driving cars
         | that are too big and dangerous. But all of them are wrong
         | because there is no discernible root cause.
         | 
         | Causality is usually complex and often complicated. Let's
         | consider another case:
         | 
         | An attentive person without any sensory impairment crossed at a
         | crosswalk in broad daylight and was killed by a sober bicyclist
         | going ten miles an hour[1]. What was the root cause?
         | 
         | All of the same objections apply. If you want an ultimate root
         | cause you need to turn to theology, in which case as a
         | Christian I can say that all death is ultimately caused by sin.
         | While I personally find that to be a philosophically sound
         | position, it isn't especially useful for answering the
         | pragmatic question of how to reduce preventable evils like
         | (some?) pedestrian deaths. My personal bias in this case is to
         | take a pragmatic approach. That means identifying factors that
         | can be changed with a high rate of compliance at a cost that is
         | less than that of the evil being prevented.
         | 
         | Mature safety systems take this into account. As an example,
         | take basic firearms safety[2]: 1) Treat all guns as if they are
         | loaded. 2) Never point a gun at anything you don't want to
         | destroy. 3) Keep your finger off the trigger until ready to
         | shoot. 4) Be sure of your target and what's behind it. In order
         | to negligently discharge a firearm and cause harm, all four of
         | these rules must be disregarded. In this case, even though any
         | number of events from mechanical failure to muscle spasm could
         | have caused the discharge, if someone causes unintentional harm
         | I can say the local root cause was failing to observe the
         | safety rules and hold said person morally responsible[3]. Many
         | other examples of safety systems should come to mind, including
         | relevant to the pedestrian safety scenarios.
         | 
         | [1] One way that this could plausibly happen is that the
         | pedestrian is knocked over and hits his head. Falling and
         | hitting one's head is a surprisingly common way to die.
         | 
         | [2] There are variants, but they all aim to achieve safety in
         | much the same way.
         | 
         | [3] This holds regardless of one's position on civilian
         | ownership of firearms, because police and military personnel
         | are also human and need to follow a safety system.
        
           | viklove wrote:
           | > If you want an ultimate root cause you need to turn to
           | theology
           | 
           | No, you do not "need" to turn to theology. There are plenty
           | of explanations that do not involve invoking myths.
           | 
           | > all death is ultimately caused by sin
           | 
           | So the pedestrian was secretly a pedophile? That's a pretty
           | strange explanation...
           | 
           | I'd say in this case the blame likely rests on city planners,
           | for creating situations in which travelers can collide.
        
             | beaconstudios wrote:
             | The parent is clearly invoking the "uncaused cause" model
             | of God. Physicalism doesn't have an originating cause
             | except maybe the big bang depending on what you think could
             | have happened before it.
             | 
             | So your options in terms of the originating cause are
             | either to turn to religion for a philosophical answer, or
             | to say "I don't know and we may never know".
        
             | scrollbar wrote:
             | We can disagree and still be nice to each other.
        
         | NovemberWhiskey wrote:
         | The problem isn't the lack of multiple realities for
         | counterfactual testing - it's that the idea that there's a
         | basic root cause for any particular outcome is ill-founded.
         | 
         | The basic idea of 'look beyond immediate causes' is reasonable,
         | but the cult of root cause analysis is a bit out of
         | control.
        
           | jerf wrote:
           | Perhaps you'd be happier with an idea like "when looking at a
           | bad outcome, invest some effort looking at the proximal
           | causes of the bad outcome to see if there's a place where you
           | can invest less net effort to fix the problem, for greater
           | gain."
           | 
           | Obviously all root cause analysis terminates at "Because the
           | Big Bang and subsequent quantum fluctuations had this result"
           | or something similarly utterly unactionable. But if you use
           | the metric above, it is a common observation that such
           | analysis can reveal higher bang-for-the-buck engineering
           | outcomes than simply fixing the immediately obvious. There
           | are also some typical patterns that emerge, such as the root
           | cause analysis eventually getting back into things that are
           | infeasibly expensive or impossible to fix (e.g. "because
           | human culture" will show up in a lot of them at some point,
           | but you aren't going to fix that just because two holes were
           | misaligned on the factory line), meaning such analysis also
           | meaningfully terminates.
           | 
           | I tend to operate on this myself because you don't actually
           | get "a" root cause analysis. I can always find a tree of
           | causes as I go back, not "a" series of causes. But it's a
           | fairly frequent occurrence that if you look over such a tree,
           | there's at least one node with a highly favorable
           | cost/benefit tradeoff that you can find with surprisingly
           | minimal effort.
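           | 
           | To make the "tree of causes, pick the node with the best
           | cost/benefit" idea concrete, here is a minimal sketch in
           | plain Python. The tree and the numbers are hypothetical,
           | made up purely for illustration:
           | 
           |   # Hypothetical cause tree; each node maps a name to
           |   # (fix_cost, benefit, [child names]). Numbers are fake.
           |   tree = {
           |       "holes misaligned on line": (5, 10,
           |           ["jig drift", "no alignment check"]),
           |       "jig drift": (2, 9, ["because human culture"]),
           |       "no alignment check": (8, 30, []),
           |       "because human culture": (float("inf"), 100, []),
           |   }
           | 
           |   def walk(name, path=()):
           |       # Yield the path to this node and each descendant.
           |       yield path + (name,)
           |       for child in tree[name][2]:
           |           yield from walk(child, path + (name,))
           | 
           |   def score(name):
           |       cost, benefit, _ = tree[name]
           |       return benefit / cost   # infinite cost scores 0.0
           | 
           |   nodes = sorted(walk("holes misaligned on line"),
           |                  key=lambda p: score(p[-1]), reverse=True)
           |   for path in nodes:
           |       print(" -> ".join(path), round(score(path[-1]), 2))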
        
         | joe_the_user wrote:
         | Causality overall is hard. Humans fairly often fail to deal
         | with causality perfectly. But humans tend to do far better
         | than computers (failing partially rather than failing
         | totally, etc.).
         | 
         | But your examples involve humans' linguistic expression of
         | causality, which is an entirely different question.
         | 
         |  _The amusing thing about this question is that if you actually
         | have an answer, it reveals more about your biases than it does
         | about the situation and potential solution(s). Various people
         | will chime in about the latest thing that annoys them...whether
         | it is inattentive pedestrians, jaywalkers, pedestrians wearing
         | black at night, drunk drivers, speeders, or people driving cars
         | that are too big and dangerous._
         | 
         | A lot of human language is "socially significant noise"
         | which, some might object, usually isn't true. But it often
         | serves an entirely different purpose than accurately
         | modeling reality.
         | 
         | People walk through complex society, putting their socks on
         | before their shoes and otherwise doing the basic things,
         | while articulating positions that ... _other people_ would
         | consider "nuts". But this situation has nothing to do with a
         | failure of causal modeling, which happens on a different
         | logical level entirely.
        
         | jjtheblunt wrote:
         | What an excellently worded observation; I hadn't thought
         | clearly about this before, though I'd vaguely noticed the
         | idea.
        
         | skybrian wrote:
         | The article is about much simpler forms of causality though,
         | like what happens when you hit a ball with a bat. Learning
         | enough about causality to handle everyday physics would be an
         | important advance.
        
         | bserge wrote:
         | Funnily enough, this is what our "collective consciousness" is
         | useful for. Humans act a lot like an ant colony, except
         | individuals have much more autonomy.
         | 
         | So, you ask, say, 1000 people what the root cause is. The
         | result will be decided either by majority or the most plausible
         | argumentation. Say 700 people agree the jaywalker was at fault.
         | That will become "reality" for the group, which will then
         | likely spread throughout the hive.
         | 
         | The main lesson will be "don't jaywalk at night", the secondary
         | one likely "don't wear black clothing at night" and probably a
         | third one "beware of drunk drivers".
         | 
         | Sorry if that sounds weird; I'm also trying to wrap my head
         | around what human intelligence is and how it can be applied
         | to AI (Approximate Intelligence :)).
        
           | ZephyrBlu wrote:
           | Examples like this are how I realized that almost everything
           | is subjective.
           | 
           | Things we think are objective are usually subjective things
           | that we decided to agree on.
        
             | bserge wrote:
             | Yeah, I'd say that's true of most things that are too
             | complicated to be understood by a single individual.
             | 
             | In the end, it's all about the collective. It needs a
             | consensus to move forward, and that doesn't need to be
             | perfect, just good enough.
             | 
             | The whole of human civilization works mostly on
             | subjective conclusions. Sometimes a minority brings up
             | enough facts/proof to change the established consensus,
             | but often we just pile layers upon layers on top of
             | things that are not objectively true, or not completely
             | true. But they're good enough, and we're very adaptable.
        
             | kempbellt wrote:
             | Ironically, I would argue that just about everything
             | that happens is an _objective_ event, but any and every
             | interpretation/understanding of causality is itself
             | subjective.
             | 
             | In the hypothetical scenario the objective reality is: A
             | pedestrian and a car collided.
             | 
             | Other factors come into play, which all have potential
             | influence on the causality of this objective reality, and
             | said factors are frequently _subjectively_ asserted as more
             | relevant to causality than others.
             | 
             | Strong assertions to causality may include: Driver was
             | drunk, pedestrian was jaywalking, pedestrian was wearing
             | dark clothing, driver was speeding, driver was texting,
             | driver and pedestrian had a prior altercation, driver was
             | tired, etc
             | 
             | Weaker assertions to causality may include: It was a
             | Tuesday, someone sneezed 3 miles away, a butterfly flapped
             | its wings, pedestrian knocked over salt during dinner
             | earlier, etc
             | 
             | Causality is hard... We're getting better at it, but
             | superstitions _are_ a thing that exists, even if they
             | are a bit odd.
        
         | tremon wrote:
         | _What was the root cause?_
         | 
         | That's easy: the root cause is the roadworks two streets over,
         | which caused the driver to divert from his usual route.
        
       | ineedasername wrote:
       | This is also covered very nicely by Daniel Dennett in his
       | robot/bomb paper. It's a great read (warning: PDF).
       | 
       | https://folk.idi.ntnu.no/gamback/teaching/TDT4138/dennett84....
        
         | ineedasername wrote:
         | ^^ If you really want to understand the issue from both the
         | historical AI research and philosophical perspectives, this
         | is the article to read, and it has the benefit of being an
         | entertaining read. Much of Dennett's work is similarly
         | entertaining, and he's absolutely brilliant.
         | 
         | (Sorry for commenting on my own comment; I should have added
         | this detail in the first one, but it was too late to edit.)
        
       | brindlejim wrote:
       | You could argue that reinforcement learning policies are already
       | causal models insofar as they relate state-action pairs to the
       | rewards and penalties that they lead to. The trial and error that
       | RL performs in a simulation is an exploration of counterfactuals
       | to establish cause.
       | 
       | But the policies lack introspection. One of the most powerful
       | things we could do is somehow extract causal models from those
       | policies, to see what they learned that led them to behave more
       | intelligently. That would increase both our knowledge and our
       | trust in applying RL.
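       | 
       | To make the state-action-to-reward framing concrete, here is a
       | toy sketch (tabular Q-learning on a tiny chain world; purely
       | illustrative, not anyone's actual system). Each update is in
       | effect asking the interventional question "what follows if I
       | do a in s?", yet the learned table is just numbers, with no
       | causal structure you can read off it:
       | 
       |   import random
       | 
       |   # Toy environment: 5 states in a row, reward 1 for
       |   # reaching the rightmost state.
       |   N_STATES = 5
       |   ACTIONS = (0, 1)            # 0 = step left, 1 = step right
       |   Q = {(s, a): 0.0
       |        for s in range(N_STATES) for a in ACTIONS}
       | 
       |   def step(s, a):
       |       s2 = min(N_STATES - 1, max(0, s + (1 if a else -1)))
       |       return s2, (1.0 if s2 == N_STATES - 1 else 0.0)
       | 
       |   def best(s):
       |       return max(ACTIONS, key=lambda a: Q[(s, a)])
       | 
       |   alpha, gamma, eps = 0.1, 0.9, 0.2
       |   for _ in range(500):        # episodes of trial and error
       |       s = 0
       |       for _ in range(20):
       |           explore = random.random() < eps
       |           a = random.choice(ACTIONS) if explore else best(s)
       |           s2, r = step(s, a)  # outcome of doing a in s
       |           target = r + gamma * Q[(s2, best(s2))]
       |           Q[(s, a)] += alpha * (target - Q[(s, a)])
       |           s = s2
       | 
       |   print({k: round(v, 2) for k, v in Q.items()})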
        
         | PartiallyTyped wrote:
         | There's a paper that used a random convolutional filter when
         | training the agent, and they found that it manages to
         | generalize very well. They evaluated the CNN and found that
         | the model put emphasis on where the enemies were at each
         | frame, which indicates some form of understanding.
         | 
         | However, I don't think there is any form of causal
         | relationship to be extracted from model-free agents. I don't
         | believe that what we are seeing is anything more than
         | changing action likelihoods in some very high-dimensional
         | function.
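         | 
         | For what it's worth, the "where does the model put emphasis"
         | style of evaluation can be approximated with a plain gradient
         | saliency map. A minimal PyTorch sketch with random weights
         | and a fake frame (nothing to do with the paper's actual
         | setup), just to show the general idea:
         | 
         |   import torch
         |   import torch.nn as nn
         | 
         |   # Toy convnet standing in for a policy/value network.
         |   model = nn.Sequential(
         |       nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
         |       nn.AdaptiveAvgPool2d(1), nn.Flatten(),
         |       nn.Linear(8, 4))              # 4 fake actions
         | 
         |   frame = torch.rand(1, 3, 84, 84, requires_grad=True)
         |   logits = model(frame)
         |   logits[0, logits.argmax()].backward()   # chosen action
         | 
         |   # Per-pixel importance: gradient magnitude, max over
         |   # channels; bright regions are where the score reacts.
         |   saliency = frame.grad.abs().max(dim=1).values[0]
         |   print(saliency.shape)             # torch.Size([84, 84])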
        
       | vsskanth wrote:
       | How well do differential neural nets perform in causal
       | inference? There seems to be a pretty good library from SciML
       | that claims to learn models from limited data, but I'm
       | wondering if they generalize well.
        
       | adolph wrote:
       | A Brief Overview of Causal Inference (covers Pearl and others)
       | 
       | https://tjohnson250.github.io/overview_causal_inference/over...
        
       | nerdponx wrote:
       | This was a surprisingly well-informed article. It covers the
       | i.i.d. assumption, in-sample vs out-of-sample data, and causal
       | modeling. And it avoids "AI" hyperbole.
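       | 
       | The out-of-sample point is easy to demonstrate: fit on one
       | range of inputs, evaluate on another, and the quiet work the
       | i.i.d. assumption was doing becomes obvious. A toy numpy
       | sketch with made-up data:
       | 
       |   import numpy as np
       | 
       |   # True relationship is quadratic, but training data only
       |   # covers x in [0, 1], where a straight line fits well.
       |   rng = np.random.default_rng(0)
       |   x_train = rng.uniform(0, 1, 200)
       |   y_train = x_train ** 2 + rng.normal(0, 0.01, 200)
       |   slope, intercept = np.polyfit(x_train, y_train, 1)
       | 
       |   def mse(x):
       |       y_hat = slope * x + intercept
       |       return float(np.mean((y_hat - x ** 2) ** 2))
       | 
       |   x_in = rng.uniform(0, 1, 200)   # same distribution
       |   x_out = rng.uniform(2, 3, 200)  # distribution shift
       |   print(mse(x_in))                # small
       |   print(mse(x_out))               # large: i.i.d. broken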
        
       | Meniteos4 wrote:
       | >For instance, convolutional neural networks trained on millions
       | of images can fail when they see objects under new lighting
       | conditions or from slightly different angles or against new
       | backgrounds.
       | 
       | The big picture is that humans use a multi-task network for
       | depth, segmentation (and background removal), lighting source
       | estimation
       | (and shadow removal), material extraction, SLAM (and geometry
       | reconstruction), optical flow, etc. Papers and their networks
       | only look at a small part of what humans do; we are not just
       | using a single "neural network."
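       | 
       | For anyone unfamiliar with the term, "multi-task network"
       | structurally just means one shared backbone feeding several
       | task heads. A hedged PyTorch sketch with hypothetical layer
       | sizes (depth and segmentation heads), purely to illustrate the
       | shape of the idea:
       | 
       |   import torch
       |   import torch.nn as nn
       | 
       |   class MultiTaskNet(nn.Module):
       |       def __init__(self, n_classes=10):
       |           super().__init__()
       |           self.backbone = nn.Sequential(   # shared features
       |               nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
       |               nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
       |           self.depth_head = nn.Conv2d(32, 1, 1)
       |           self.seg_head = nn.Conv2d(32, n_classes, 1)
       | 
       |       def forward(self, x):
       |           feats = self.backbone(x)
       |           return self.depth_head(feats), self.seg_head(feats)
       | 
       |   net = MultiTaskNet()
       |   depth, seg = net(torch.rand(1, 3, 64, 64))
       |   print(depth.shape, seg.shape)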
        
       ___________________________________________________________________
       (page generated 2021-04-02 23:01 UTC)