https://kevinlynagh.com/z3-simpsons-paradox/ Generating Simpson's Paradox with Z3. - Back to Kevin's homepagePublished: 2024 August 11 I've been reading Pearl's Causal Inference in Statistics, and one of the exercises poses this problem: Baseball batter A has a better batting average than his teammate B. However, someone notices that B has a better batting average than A against both right-handed and left-handed pitchers. How can this happen? The Z3 Theorem Prover is a great lil' tool for solving these sorts of problems. The following generates an example of this situation, which is more commonly known as Simpson's Paradox. ;; Player A's hits and misses against left-handed pitchers (declare-const A_L_hits Int) (declare-const A_L_misses Int) ;; and right-handed ones. (declare-const A_R_hits Int) (declare-const A_R_misses Int) ;; ditto for player B (declare-const B_L_hits Int) (declare-const B_L_misses Int) (declare-const B_R_hits Int) (declare-const B_R_misses Int) ;; All hits and miss counts must be positive (assert (< 0 A_L_hits)) (assert (< 0 A_L_misses)) (assert (< 0 A_R_hits)) (assert (< 0 A_R_misses)) (assert (< 0 B_L_hits)) (assert (< 0 B_L_misses)) (assert (< 0 B_R_hits)) (assert (< 0 B_R_misses)) ;; overall batting averages (define-const A Real (/ (+ A_L_hits A_R_hits) (+ A_L_hits A_R_hits A_L_misses A_R_misses))) (define-const B Real (/ (+ B_L_hits B_R_hits) (+ B_L_hits B_R_hits B_L_misses B_R_misses))) ;; batting averages against left and right-handed pitches (define-const A_L Real (/ A_L_hits (+ A_L_hits A_L_misses))) (define-const A_R Real (/ A_R_hits (+ A_R_hits A_R_misses))) (define-const B_L Real (/ B_L_hits (+ B_L_hits B_L_misses))) (define-const B_R Real (/ B_R_hits (+ B_R_hits B_R_misses))) ;; A has a higher overall batting average (assert (< B A)) ;; B has a better average against left-handed pitchers (assert (< A_L B_L)) ;; B has a better average against right-handed pitchers (assert (< A_R B_R)) (set-option :pp.decimal true) (check-sat) (get-model) (get-value (A A_L A_R A_L_hits A_L_misses A_R_hits A_R_misses)) (get-value (B B_L B_R B_L_hits B_L_misses B_R_hits B_R_misses)) Running Z3 against this file generates this output, showing an example satisyfing our constraints: ((A 0.2352941176?) (A_L 0.4) (A_R 0.1666666666?) (A_L_hits 2) (A_L_misses 3) (A_R_hits 2) (A_R_misses 10)) ((B 0.2307692307?) (B_L 0.5) (B_R 0.1818181818?) (B_L_hits 1) (B_L_misses 1) (B_R_hits 2) (B_R_misses 9)) The batting avergaes in tabular form: Player Left Right Overall A 0.4 0.167 0.235 B 0.5 0.182 0.230 The key to understanding the paradox is that the players did not bat against the same set of pitchers. A batted against 5 lefties and 12 righties; B against 2 and 11. If you liked this, you might like Kevin's email newsletter. As an Amazon Associate, we may earn commissions from qualifying purchases from Amazon.com. - Back to Kevin's homepage