jonas peters, mateo rojas-carulla, amar shah machine ...cbl.eng.cam.ac.uk › ... › readinggroup...

125
Causal Inference Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine Learning Group, University of Cambridge 23rd April 2015

Upload: others

Post on 25-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Causal Inference

Jonas Peters, Mateo Rojas-Carulla, Amar Shah

Machine Learning Group, University of Cambridge

23rd April 2015

Page 2: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

is based on work by ...

UCLA: Judea Pearl

CMU: Peter Spirtes, Clark Glymour, Richard Scheines

Harvard University: Donald Rubin, Jamie Robins

ETH Zurich: Peter Buhlmann, Nicolai Meinshausen

Max-Planck-Institute Tubingen: Dominik Janzing, Bernhard Scholkopf

University of Amsterdam: Joris Mooij

Patrik Hoyer

... many others

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 3: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 4: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Charig et al.: “Comparison of treatment of renal calculi by open surgery, (...) ”, British Medical Journal, 1986

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 5: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Simpson’s Paradox

Events C and E , F and ¬F describing sub populations.

P(E |C ) > P(E |¬C )

P(E |C ,F ) < P(E |¬C ,F )

P(E |C ,¬F ) < P(E |¬C ,¬F )

Key distinction between seeing and doing.

E : recovery

C : treatment

F : size of stones

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 6: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

underlying ground truth:

recovery size of stone

treatment

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 7: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Rain Sprinkler

Season

Wet

Slippery

S1: ”Turning the sprinkler on would not affect the rain”S2: ”The state of the sprinkler is independent of the state of the rain”Pearl.: “Causality: Models, Reasoning, and Inference ”

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 8: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

We often work with i.i.d. samples from a joint distribution P. Claim:

Many times, we are interested in a different distribution P 6= P.

speak . . . CAUSALITY!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 9: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

We often work with i.i.d. samples from a joint distribution P. Claim:

Many times, we are interested in a different distribution P 6= P.

speak . . . CAUSALITY!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 10: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

We often work with i.i.d. samples from a joint distribution P. Claim:

Many times, we are interested in a different distribution P 6= P.

speak . . . CAUSALITY!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 11: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Pearl do calculus

Consider a Markovian model (acyclic directed graph with no cycles) andall noise terms are jointly independent.

Theorem

The Causal Markov Condition: Any distribution generated by a Markovianmodel M can be factorized as:

P(v1, . . . , vn) =∏

i

P(vi |pai )

where V1, . . . ,Vn are the endogenous variables in M, and pai are values ofthe parents of Vi in the causal diagram associated with M.

Pearl.: “Causality: Models, Reasoning, and Inference ”

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 12: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Corollary

For any Markovian model, the distribution generated by an interventiondo(X = x0) on a set X of endogenous variables is given by the truncatedfactorization:

P(v1, . . . , vk |do(xo)) =∏

i |Vi /∈X

P(vi |pai )|x=x0

where P(vi |pai ) are the pre-intervention conditional probabilities.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 13: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

X Y Z

Factorization: P(X ,Y ,Z ) = P(Z |Y )P(Y |X )P(X )

Truncate: P(Y ,Z |do(x0)) = P(Z |Y )P(Y |x0)

Marginalize: P(Y |do(x0)) =∑

z P(Y ,Z |do(x0)) = P(Y |x0)

Coincides with conditional probability!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 14: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

X Y

Z

Factorization: P(X ,Y ,Z ) = P(Y |X ,Z )P(X |Z )P(Z )

Truncate: P(Y ,Z |do(x0)) = P(Z )P(Y |x0,Z )

Marginalize:P(Y |do(x0)) =

∑z P(Y ,Z |do(x0)) =

∑z P(Z )P(Y |x0,Z )

This does not (in general) equal P(Y |x0) =∑

z P(Y |x0,Z )P(Z |x0).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 15: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

P(X1, . . . ,X3) has been generated by a structural equation model if

X1 = f1(X3,N1) X1 = f1(N1)

X2 = f2(X1,X3,N2)

X3 = f3(N3)

• Ni jointly independent

• G0 has no cycles

X2 X3

X1G0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 16: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P.

X1 = f1(X3,N1) X1 = f1(N1)

X2 = f2(X1,X3,N2)

X3 = f3(N3)

• Ni jointly independent

• G0 has no cycles

X2 X3

X1

recovery size of stone

treatmentG0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 17: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P. We (Pearl et al.) can then compute P.

X1 = f1(X3,N1) X1 = f1(N1)

X2 = f2(X1,X3,N2)

X3 = f3(N3)

• Ni jointly independent

• G0 has no cycles

X2 X3

X1

recovery size of stone

treatmentG0

IMPORTANT: modularity, autonomy: Aldrich 1989, Pearl 2009, Scholkopf et al. 2012, . . .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 18: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Back to Simpson’s paradox: Remember the inequalities:

P(E |C ) > P(E |¬C )

P(E |C ,F ) < P(E |¬C ,F )

P(E |C ,¬F ) < P(E |¬C ,¬F )

What does do calculus tell us?

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 19: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

P(E |do(C ),F ) < P(E |do(¬C ),F )

P(E |do(C ),¬F ) < P(E |do(¬C ),¬F )

Also: P(F |do(C )) = P(F |do(¬C )) = P(F ). Thus:

P(E |do(C )) = P(E |do(C ),F )P(F ) + P(E |do(C ),¬F )P(¬F )

P(E |do(¬C )) = P(E |do(¬C ),F )P(F ) + P(E |do(¬C ),¬F )P(¬F )

This impliesP(E |do(C )) < P(E |do(¬C ))

.And the paradox is gone! Pearl.: “Causality: Models, Reasoning, and Inference ”

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 20: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

P(E |do(C ),F ) < P(E |do(¬C ),F )

P(E |do(C ),¬F ) < P(E |do(¬C ),¬F )

Also: P(F |do(C )) = P(F |do(¬C )) = P(F ). Thus:

P(E |do(C )) = P(E |do(C ),F )P(F ) + P(E |do(C ),¬F )P(¬F )

P(E |do(¬C )) = P(E |do(¬C ),F )P(F ) + P(E |do(¬C ),¬F )P(¬F )

This impliesP(E |do(C )) < P(E |do(¬C ))

.And the paradox is gone! Pearl.: “Causality: Models, Reasoning, and Inference ”

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 21: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

��������

������

Bottou et al.: “Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising”, JMLR 2013

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 22: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P.

X1 = f1(X3,N1)

X2 = f2(X1,N2) X2 = f2(X1, N2)

X3 = f3(N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1

click

main line user intention

user dataG0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 23: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P. We (RL, IPW [Horv.-Th.], ...) can then compute P.

X1 = f1(X3,N1)

X2 = f2(X1,N2) X2 = f2(X1, N2)

X3 = f3(N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1

click

main line user intention

user dataG0

Ok, but what if there are hiddens?

Ok, but how do we find the graph?

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 24: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P. We (RL, IPW [Horv.-Th.], ...) can then compute P.

X1 = f1(X3,N1)

X2 = f2(X1,N2) X2 = f2(X1, N2)

X3 = f3(N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1

click

main line user intention

user dataG0

Ok, but what if there are hiddens?

Ok, but how do we find the graph?

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 25: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Given: graph and P. We (RL, IPW [Horv.-Th.], ...) can then compute P.

X1 = f1(X3,N1)

X2 = f2(X1,N2) X2 = f2(X1, N2)

X3 = f3(N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1

click

main line user intention

user dataG0

Ok, but what if there are hiddens?

Ok, but how do we find the graph?

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 26: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

Reichenbach: (common cause principle) Assume that X ⊥⊥� Y . Then

X causes Y ,

Y causes X ,

there is a hidden common cause or

combination of the above.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 27: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

CM ⊥⊥�NL

Quinn et al: “the strength of the association . . . does suggest that theabsence of a daily period of darkness during childhood is a potentialprecipitating factor in the development of myopia”Quinn, Shin, Maguire, Stone: “Myopia and ambient lighting at night”, Nature 1999

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 28: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

CM ⊥⊥�NL

Quinn et al: “the strength of the association . . . does suggest that theabsence of a daily period of darkness during childhood is a potentialprecipitating factor in the development of myopia”Quinn, Shin, Maguire, Stone: “Myopia and ambient lighting at night”, Nature 1999

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 29: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

It turned out that NL ⊥⊥ CM |PM and therefore ...

NL CM

PM

Zadnik, Jones, Irvin, Kleinstein, Manny, Shin, Mutti: “Vision: Myopia and ambient night-time lighting”, Nature 2000

Gwiazda, Ong, Held, Thorn: “Vision: Myopia and ambient night-time lighting”, Nature 2000

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 30: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

It turned out that NL ⊥⊥ CM |PM and therefore ...

NL CM

PM

Zadnik, Jones, Irvin, Kleinstein, Manny, Shin, Mutti: “Vision: Myopia and ambient night-time lighting”, Nature 2000

Gwiazda, Ong, Held, Thorn: “Vision: Myopia and ambient night-time lighting”, Nature 2000

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 31: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Some definitions:

path - sequence of edges, regardless of direction e.g. W, X, Y, Z

collider - node with > 1 arrows pointing to it e.g. X

unblocked path - a path not traversing collider edges e.g. X, Y, Z

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 32: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Some definitions:

path - sequence of edges, regardless of direction e.g. W, X, Y, Z

collider - node with > 1 arrows pointing to it e.g. X

unblocked path - a path not traversing collider edges e.g. X, Y, Z

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 33: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Some definitions:

path - sequence of edges, regardless of direction e.g. W, X, Y, Z

collider - node with > 1 arrows pointing to it e.g. X

unblocked path - a path not traversing collider edges e.g. X, Y, Z

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 34: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Some definitions:

path - sequence of edges, regardless of direction e.g. W, X, Y, Z

collider - node with > 1 arrows pointing to it e.g. X

unblocked path - a path not traversing collider edges e.g. X, Y, Z

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 35: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Nodes a and b are d-connected if there is an unblocked path betweenthem e.g. X and Z are, W and Y are not. W and Y are d-separated.

Nodes a and b are d-connected given a set of nodes C, if there existsan unblocked path between a and b containing no node in C.

Conditioning on a collider, or one of its descendants, unblocks thepaths that the collider previously blocked.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 36: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Nodes a and b are d-connected if there is an unblocked path betweenthem e.g. X and Z are, W and Y are not. W and Y are d-separated.

Nodes a and b are d-connected given a set of nodes C, if there existsan unblocked path between a and b containing no node in C.

Conditioning on a collider, or one of its descendants, unblocks thepaths that the collider previously blocked.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 37: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

W X Y Z

Nodes a and b are d-connected if there is an unblocked path betweenthem e.g. X and Z are, W and Y are not. W and Y are d-separated.

Nodes a and b are d-connected given a set of nodes C, if there existsan unblocked path between a and b containing no node in C.

Conditioning on a collider, or one of its descendants, unblocks thepaths that the collider previously blocked.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 38: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

A and B are d-separated by nodes C =⇒ A ⊥⊥ B|C .

But... we want to learn the graph from conditional independence tests.So what can we say about the converse?

Faithfulness: Distribution is faithful to the graph if whenever A and B arenot d-separated by C, then A ⊥⊥� B|C .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 39: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

A and B are d-separated by nodes C =⇒ A ⊥⊥ B|C .

But... we want to learn the graph from conditional independence tests.So what can we say about the converse?

Faithfulness: Distribution is faithful to the graph if whenever A and B arenot d-separated by C, then A ⊥⊥� B|C .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 40: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

A and B are d-separated by nodes C =⇒ A ⊥⊥ B|C .

But... we want to learn the graph from conditional independence tests.So what can we say about the converse?

Faithfulness: Distribution is faithful to the graph if whenever A and B arenot d-separated by C, then A ⊥⊥� B|C .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 41: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

P(X1,...,X4)

X1 ⊥⊥ X2

X2 ⊥⊥ X3

X1 ⊥⊥ X4 | {X3}X1 ⊥⊥ X2 | {X3}X2 ⊥⊥ X3 | {X1}

X4

X2 X3

X1G

X4 = f4(X3,N4)

X3 = f3(X1,N3)

X2 = f2(N2)

X1 = f1(N1)

Ni jointly independent

independence

tests

G′

G′′

Faithfuln.Markov

unique? trivial

Method: IC (Pearl 2009); PC, FCI (Spirtes et al., 2000)

1 Find all (cond.) independences from the data.

2 Select the DAG(s) that corresponds to these independences.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 42: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

P(X1,...,X4)

X1 ⊥⊥ X2

X2 ⊥⊥ X3

X1 ⊥⊥ X4 | {X3}X1 ⊥⊥ X2 | {X3}X2 ⊥⊥ X3 | {X1}

X4

X2 X3

X1G

X4 = f4(X3,N4)

X3 = f3(X1,N3)

X2 = f2(N2)

X1 = f1(N1)

Ni jointly independent

independence

tests

G′

G′′

Faithfuln.Markov

unique? trivial

Method: IC (Pearl 2009); PC, FCI (Spirtes et al., 2000)

1 Find all (cond.) independences from the data.

2 Select the DAG(s) that corresponds to these independences.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 43: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

P(X1,...,X4)

X1 ⊥⊥ X2

X2 ⊥⊥ X3

X1 ⊥⊥ X4 | {X3}X1 ⊥⊥ X2 | {X3}X2 ⊥⊥ X3 | {X1}

X4

X2 X3

X1G

X4 = f4(X3,N4)

X3 = f3(X1,N3)

X2 = f2(N2)

X1 = f1(N1)

Ni jointly independent

independence

tests

G′

G′′

Faithfuln.Markov

unique? trivial

Method: IC (Pearl 2009); PC, FCI (Spirtes et al., 2000)

1 Find all (cond.) independences from the data.

2 Select the DAG(s) that corresponds to these independences.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 44: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

P(X1,...,X4)

X1 ⊥⊥ X2

X2 ⊥⊥ X3

X1 ⊥⊥ X4 | {X3}X1 ⊥⊥ X2 | {X3}X2 ⊥⊥ X3 | {X1}

X4

X2 X3

X1G

X4 = f4(X3,N4)

X3 = f3(X1,N3)

X2 = f2(N2)

X1 = f1(N1)

Ni jointly independent

independence

tests

G′

G′′

Faithfuln.Markov

unique? trivial

Method: IC (Pearl 2009); PC, FCI (Spirtes et al., 2000)

1 Find all (cond.) independences from the data. Be smart.

2 Select the DAG(s) that corresponds to these independences.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 45: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

X Y Z

X Y Z

X Y Z

Conditional independence is a symmetric quantity, so there are a bunch ofDAGs may be valid. All have X ⊥⊥ Z |Y .

The set of admissible DAGs based on the conditional independence testsare called Markov equivalent.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 46: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

X Y Z

X Y Z

X Y Z

Conditional independence is a symmetric quantity, so there are a bunch ofDAGs may be valid. All have X ⊥⊥ Z |Y .

The set of admissible DAGs based on the conditional independence testsare called Markov equivalent.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 47: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

X Y Z

X Y Z

X Y Z

All the above have X ⊥⊥ Z |Y .

The only X, Y, Z configuration where X ⊥⊥� Z |Y , is when Y is a collider:

X Y Z

Here, X,Y,Z form a v-structure (collider with independent causes).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 48: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

X Y Z

X Y Z

X Y Z

All the above have X ⊥⊥ Z |Y .

The only X, Y, Z configuration where X ⊥⊥� Z |Y , is when Y is a collider:

X Y Z

Here, X,Y,Z form a v-structure (collider with independent causes).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 49: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

X Y Z

X Y Z

X Y Z

All the above have X ⊥⊥ Z |Y .

The only X, Y, Z configuration where X ⊥⊥� Z |Y , is when Y is a collider:

X Y Z

Here, X,Y,Z form a v-structure (collider with independent causes).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 50: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 1

Form the skeleton of the graph

Start with a fully connected graph

Remove edges X-Y where X ⊥⊥ Y

Remove edges X-Y for which there is a neighbour Z 6= Y of X suchthat X ⊥⊥ Y |ZRemove edges X-Y for which there are 2 neighbours Z1,Z2 6= Y of Xsuch that X ⊥⊥ Y |Z1,Z2

...

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 51: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 1

Form the skeleton of the graph

Start with a fully connected graph

Remove edges X-Y where X ⊥⊥ Y

Remove edges X-Y for which there is a neighbour Z 6= Y of X suchthat X ⊥⊥ Y |ZRemove edges X-Y for which there are 2 neighbours Z1,Z2 6= Y of Xsuch that X ⊥⊥ Y |Z1,Z2

...

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 52: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 1

Form the skeleton of the graph

Start with a fully connected graph

Remove edges X-Y where X ⊥⊥ Y

Remove edges X-Y for which there is a neighbour Z 6= Y of X suchthat X ⊥⊥ Y |Z

Remove edges X-Y for which there are 2 neighbours Z1,Z2 6= Y of Xsuch that X ⊥⊥ Y |Z1,Z2

...

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 53: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 1

Form the skeleton of the graph

Start with a fully connected graph

Remove edges X-Y where X ⊥⊥ Y

Remove edges X-Y for which there is a neighbour Z 6= Y of X suchthat X ⊥⊥ Y |ZRemove edges X-Y for which there are 2 neighbours Z1,Z2 6= Y of Xsuch that X ⊥⊥ Y |Z1,Z2

...

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 54: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 2

Find v-structures

Search over triples X-Y-Z, where X and Z are not connected

Since X and Z not connected, there is a separating set S s.t.X ⊥⊥ Z |S .

Is Y in the set S? If not, then X-Y-Z must form a v-structure.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 55: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Independence-based Methods

The PC algorithm: Step 3

Direct further edges

We cannot have cycles and have found all v-structures.Use this knowledge to fill in other arrows.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 56: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Mooij, JP, Janzing, Zscheischler, Scholkopf: “Distinguishing cause from effect using observational data: methods and

benchmarks”, submitted 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 57: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Assume P(X1, . . . ,X4) has been generated by

X1 = f1(X3,N1)

X2 = N2

X3 = f3(X2,N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1G0

Structural equation model.Can the DAG be recovered from P(X1, . . . ,X4)?

No.JP, J. Mooij, D. Janzing and B. Scholkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014

P. Buhlmann, JP, J. Ernest: CAM: Causal add. models, high-dim. order search and penalized regr., Annals of Statistics 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 58: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Assume P(X1, . . . ,X4) has been generated by

X1 = f1(X3,N1)

X2 = N2

X3 = f3(X2,N3)

X4 = f4(X2,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

X2 X3

X1G0

Structural equation model.Can the DAG be recovered from P(X1, . . . ,X4)? No.JP, J. Mooij, D. Janzing and B. Scholkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014

P. Buhlmann, JP, J. Ernest: CAM: Causal add. models, high-dim. order search and penalized regr., Annals of Statistics 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 59: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Intuition: Y = f (X ,N) with discrete N taking d values N1, . . . ,Nd .

X Y

N

Noise values could switch between d different mechanisms f1, . . . , fd !

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 60: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Assume P(X1, . . . ,X4) has been generated by

X1 = f1(X3) + N1

X2 = N2

X3 = f3(X2) + N3

X4 = f4(X2,X3) + N4

• Ni ∼ N (0, σ2i ) jointly independent

• G0 has no cycles

X4

X2 X3

X1G0

Additive noise model with Gaussian noise.Can the DAG be recovered from P(X1, . . . ,X4)? Yes iff fi nonlinear.JP, J. Mooij, D. Janzing and B. Scholkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014

P. Buhlmann, JP, J. Ernest: CAM: Causal add. models, high-dim. order search and penalized regr., Annals of Statistics 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 61: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Consider a distribution generated by

Y = f (X ) + NY

with NY ,Xind∼ N

X Y

Then, if f is nonlinear, there is no

X = g(Y ) + MX

with MX ,Yind∼ N

X Y

JP, J. Mooij, D. Janzing and B. Scholkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 62: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Consider a distribution generated by

Y = f (X ) + NY

with NY ,Xind∼ N

X Y

Then, if f is nonlinear, there is no

X = g(Y ) + MX

with MX ,Yind∼ N

X Y

JP, J. Mooij, D. Janzing and B. Scholkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 63: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Consider a distribution corresponding to

Y = X 3 + NY

with NY ,Xind∼ N

X Y

with

X ∼ N (1, 0.52)

NY ∼ N (0, 0.42)

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 64: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

05

1015

X

Y

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 65: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

05

1015

X

Y

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 66: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

05

1015

X

Y

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 67: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

−0.8 −0.6 −0.4 −0.2 0.0 0.2 0.4

05

1015

gam(X ~ s(Y))$residuals

Y

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 68: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Let P(X1, . . . ,Xp) be generated by an ...

conditions identif.structural equation model: Xi = fi (XPAi

,Ni ) - 7

additive noise model: Xi = fi (XPAi) + Ni nonlin. fct. 3

causal additive model: Xi =∑

k∈PAifik (Xk ) + Ni nonlin. fct. 3

linear Gaussian model: Xi =∑

k∈PAiβik Xk + Ni linear fct. 7

.

(results hold for Gaussian noise)

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 69: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Find causal order + variable selectionRESIT Algorithm

But finding best causal order is intractable for many nodes!

# DAGs with 13 nodes:18676600744432035186664816926721

# orderings with 13 nodes:6227020800

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 70: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Restricted Structural Equation Models

Find causal order + variable selectionRESIT AlgorithmBut finding best causal order is intractable for many nodes!

# DAGs with 13 nodes:18676600744432035186664816926721

# orderings with 13 nodes:6227020800

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 71: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Real Data: cause-effect pairs

0 20 40 60 80 1000

10

20

30

40

50

60

70

80

90

100

Significant

Not significant

Decision rate (%)

Accura

cy (

%)

IGCI

LiNGaM

Additive Noise

PNL

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 72: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Real Data: Time Series

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 73: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Real Data: Time Series

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 74: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Back to basics... linear regression.

y = βx + u

X Y

U

Assumption: no dependence between x and u.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 75: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Consider regressing GDP (y) on years of schooling (x).The error term includes terms which embody all factors other thanschooling e.g. ability.

Suppose person has high u from high (unobserved) ability.Directly increases y , and possibly indirectly via x too!

X Y

U

OLS regression will give β which includes the dependence of x on y directlyAND via u. This does not tell us what happens when we intervene on x .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 76: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Consider regressing GDP (y) on years of schooling (x).The error term includes terms which embody all factors other thanschooling e.g. ability.

Suppose person has high u from high (unobserved) ability.Directly increases y , and possibly indirectly via x too!

X Y

U

OLS regression will give β which includes the dependence of x on y directlyAND via u. This does not tell us what happens when we intervene on x .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 77: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Consider regressing GDP (y) on years of schooling (x).The error term includes terms which embody all factors other thanschooling e.g. ability.

Suppose person has high u from high (unobserved) ability.Directly increases y , and possibly indirectly via x too!

X Y

U

OLS regression will give β which includes the dependence of x on y directlyAND via u. This does not tell us what happens when we intervene on x .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 78: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Consider regressing GDP (y) on years of schooling (x).The error term includes terms which embody all factors other thanschooling e.g. ability.

Suppose person has high u from high (unobserved) ability.Directly increases y , and possibly indirectly via x too!

X Y

U

OLS regression will give β which includes the dependence of x on y directlyAND via u. This does not tell us what happens when we intervene on x .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 79: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

The problem is the endogeneity of x .We need to generate only exogenous variation in x .

Ideally, we would carry out experiments to control the variations.Often infeasible e.g. hard to control for ability and many other factors.

This is where instrumental variables come in.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 80: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

The problem is the endogeneity of x .We need to generate only exogenous variation in x .

Ideally, we would carry out experiments to control the variations.Often infeasible e.g. hard to control for ability and many other factors.

This is where instrumental variables come in.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 81: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

The problem is the endogeneity of x .We need to generate only exogenous variation in x .

Ideally, we would carry out experiments to control the variations.Often infeasible e.g. hard to control for ability and many other factors.

This is where instrumental variables come in.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 82: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

z is an instrumental variable if (i) it is uncorrelated with u and (ii) iscorrelated with x .

Z X Y

U

Condition (i) excludes z from being a regressor.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 83: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

z is an instrumental variable if (i) it is uncorrelated with u and (ii) iscorrelated with x .

Z X Y

U

Condition (i) excludes z from being a regressor.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 84: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Two stage least squares:

(1) Isolate the part of x uncorrelated with u by regressing x on z .

xi = π0 + π1zi + εi

Compute xi = π0 + π1zi .

(2) Now regress y on xyi = βxi + τi

βTSLS is a consistent estimator of β.

Can be extended to multiple dimensions and to non-linear models.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 85: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Two stage least squares:

(1) Isolate the part of x uncorrelated with u by regressing x on z .

xi = π0 + π1zi + εi

Compute xi = π0 + π1zi .

(2) Now regress y on xyi = βxi + τi

βTSLS is a consistent estimator of β.

Can be extended to multiple dimensions and to non-linear models.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 86: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Two stage least squares:

(1) Isolate the part of x uncorrelated with u by regressing x on z .

xi = π0 + π1zi + εi

Compute xi = π0 + π1zi .

(2) Now regress y on xyi = βxi + τi

βTSLS is a consistent estimator of β.

Can be extended to multiple dimensions and to non-linear models.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 87: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Two stage least squares:

(1) Isolate the part of x uncorrelated with u by regressing x on z .

xi = π0 + π1zi + εi

Compute xi = π0 + π1zi .

(2) Now regress y on xyi = βxi + τi

βTSLS is a consistent estimator of β.

Can be extended to multiple dimensions and to non-linear models.

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 88: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 1:

Estimate change in market demand (y) due to exogenous changes tomarket price (x).

Demand clearly depends on price. But price is not exogenously available.Suitable instrument?

z needs to directly effect price, but not demand... how about marketsupply

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 89: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 1:

Estimate change in market demand (y) due to exogenous changes tomarket price (x).

Demand clearly depends on price. But price is not exogenously available.Suitable instrument?

z needs to directly effect price, but not demand... how about marketsupply

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 90: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 1:

Estimate change in market demand (y) due to exogenous changes tomarket price (x).

Demand clearly depends on price. But price is not exogenously available.Suitable instrument?

z needs to directly effect price, but not demand... how about marketsupply

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 91: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 2:

Estimate change in GDP (y) due to exogenous change in years ofschooling (x).

Popular choice for z is proximity to university.People living far from university are likely to stay in school for a shortertime (condition (2) satisfied).

Condition (1) likely satisfied, but can argue people far from university aremore likely to be in lower wage areas, which would affect y .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 92: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 2:

Estimate change in GDP (y) due to exogenous change in years ofschooling (x).

Popular choice for z is proximity to university.People living far from university are likely to stay in school for a shortertime (condition (2) satisfied).

Condition (1) likely satisfied, but can argue people far from university aremore likely to be in lower wage areas, which would affect y .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 93: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Example 2:

Estimate change in GDP (y) due to exogenous change in years ofschooling (x).

Popular choice for z is proximity to university.People living far from university are likely to stay in school for a shortertime (condition (2) satisfied).

Condition (1) likely satisfied, but can argue people far from university aremore likely to be in lower wage areas, which would affect y .

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 94: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Instrumental Variables

Summary

Instrumental variables are useful for measuring how y changes whenyou intervene on x

Valid IVs are difficult to find

βTSLS can be very inefficient

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 95: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Key idea:P(Y |PAY ) remains invariant if the struct. equ. for Y does not change.

X1 = f1(X3,N1)

Y = f2(X1,N2)

X3 = f3(N3)

X4 = f4(Y ,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

Y X3

X1G0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 96: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Key idea:P(Y |PAY ) remains invariant if the struct. equ. for Y does not change.

X1 = f1(N1)

Y = f2(X1,N2)

X3 = f3(N3)

X4 = f4(Y ,X3,N4)

• Ni jointly independent

• G0 has no cycles

X4

Y X3

X1G0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 97: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Key idea:P(Y |PAY ) remains invariant if the struct. equ. for Y does not change.

X1 = f1(X3,N1)

Y = f2(X1,N2)

X3 = f3(N3)

X4 = f4(Y ,X3, N4)

• Ni jointly independent

• G0 has no cycles

X4

Y X3

X1G0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 98: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Key idea:P(Y |PAY ) remains invariant if the struct. equ. for Y does not change.

X1 = f1(N1)

Y = f2(X1,N2)

X3 = f3(X1,X4, N3)

X4 = f4(Y , N4)

• Ni jointly independent

• G0 has no cycles

X4

Y X3

X1G0

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 99: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Observe target Y and p covariates X in different “environments” e ∈ E :

(X e ,Y e) ∼ Pe .

Idea:

1 Look for sets of predictors that lead to invariant prediction.

2 The variables that are in all of those sets are called S(E).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 100: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Observe target Y and p covariates X in different “environments” e ∈ E :

(X e ,Y e) ∼ Pe .

Idea:

1 Look for sets of predictors that lead to invariant prediction.

2 The variables that are in all of those sets are called S(E).

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 101: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Example 1:S∗ = {X1,X2}, E = {1}

X3

X1 X2

Y

sleep

chocolate mountains

mood

e = 1: observational

S(E) = ∅

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 102: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Example 1:S∗ = {X1,X2}, E = {1}

X3

X1 X2

Y

sleep

chocolate mountains

mood

e = 1: observational

S(E) = ∅

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 103: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Example 2:S∗ = {X1,X2}, E = {1, 2}

X3

X1 X2

Y

sleep

chocolate mountains

mood

X3

X1 X2

Y

sleep

chocolate mountains

mood

e = 1: observational e = 2: intervention on X1

S(E) = {X1}

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 104: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Example 2:S∗ = {X1,X2}, E = {1, 2}

X3

X1 X2

Y

sleep

chocolate mountains

mood

X3

X1 X2

Y

sleep

chocolate mountains

mood

e = 1: observational e = 2: intervention on X1

S(E) = {X1}

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 105: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Example 3:S∗ = {X1,X2}, E = {1, 2}

X3

X1 X2

Y

sleep

chocolate mountains

mood

X3

X1 X2

Y

sleep

chocolate mountains

mood

e = 1: observational e = 2: intervention on X1

S(E) = {X1,X2}

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 106: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Theorem1 No mistakes:

S(E) ⊆ S∗.

2 No chances with one environment:

#E = 1 =⇒ S(E) = ∅.

3 Seeing more environments helps:

S(E1) ⊆ S(E2) if E1 ⊆ E2

4 Sufficient conditions for S(E) = S∗:

a) many “generic” interventions: on each node except Y ORb) single “generic” intervention: on a “youngest” parent of Y that is

directly connected to all other parents of Y .

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 107: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Theorem1 No mistakes:

S(E) ⊆ S∗.

2 No chances with one environment:

#E = 1 =⇒ S(E) = ∅.

3 Seeing more environments helps:

S(E1) ⊆ S(E2) if E1 ⊆ E2

4 Sufficient conditions for S(E) = S∗:

a) many “generic” interventions: on each node except Y ORb) single “generic” intervention: on a “youngest” parent of Y that is

directly connected to all other parents of Y .

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 108: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Theorem1 No mistakes:

S(E) ⊆ S∗.

2 No chances with one environment:

#E = 1 =⇒ S(E) = ∅.

3 Seeing more environments helps:

S(E1) ⊆ S(E2) if E1 ⊆ E2

4 Sufficient conditions for S(E) = S∗:

a) many “generic” interventions: on each node except Y ORb) single “generic” intervention: on a “youngest” parent of Y that is

directly connected to all other parents of Y .

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 109: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Theorem1 No mistakes:

S(E) ⊆ S∗.

2 No chances with one environment:

#E = 1 =⇒ S(E) = ∅.

3 Seeing more environments helps:

S(E1) ⊆ S(E2) if E1 ⊆ E2

4 Sufficient conditions for S(E) = S∗:

a) many “generic” interventions: on each node except Y ORb) single “generic” intervention: on a “youngest” parent of Y that is

directly connected to all other parents of Y .

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 110: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 111: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 112: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 113: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)

possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 114: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .

(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 115: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Method for finite samples:1 Test whether S leads to invariant prediction.2 S(E) contains the variables that are in all those S .

Theorem

Assume that the probability of falsely rejecting S∗ is less than α. Then

P(

S(E) ⊆ S∗)≥ 1− α

possible test 1: test whether the regression models are the same forgroup e and −e (Chow, 1960) (inversion of cov. matrix of Rese)possible test 2: linear regression on pooled data; test whether Rese

of group e have same mean and variance as Res−e .(easy) tricks for computations and high-dimensions.

JP, P. Buhlmann, N. Meinshausen: Causal inference using invariant prediction: identification and conf. interv., arXiv 1501.01332

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 116: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Real data: genetic perturbation experiments for yeast (Kemmeren et al.,2014)

p = 6170 genes

nobs = 160 wild-types

nint = 1479 gene deletions (targets known)

true hits: ≈ 9% of pairs

our method: E = {obs, int}

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 117: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Real data: genetic perturbation experiments for yeast (Kemmeren et al.,2014)

p = 6170 genes

nobs = 160 wild-types

nint = 1479 gene deletions (targets known)

true hits: ≈ 9% of pairs

our method: E = {obs, int}

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 118: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

Real data: genetic perturbation experiments for yeast (Kemmeren et al.,2014)

p = 6170 genes

nobs = 160 wild-types

nint = 1479 gene deletions (targets known)

true hits: ≈ 9% of pairs

our method: E = {obs, int}Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 119: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

ACTIVITY GENE 5954

AC

TIV

ITY

GE

NE

471

0

−1.0 −0.5 0.0 0.5

−1.

0−

0.5

0.0

0.5

observational training data

ACTIVITY GENE 5954−1.0 −0.5 0.0 0.5

interventional training data(interv. on genes other than 5954 and 4710)

ACTIVITY GENE 5954

AC

TIV

ITY

GE

NE

471

0

−5 −4 −3 −2 −1 0 1

−5

−4

−3

−2

−1

01

interventional test data point(intervention on gene 5954)

most significant pair

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 120: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

ACTIVITY GENE 3729

AC

TIV

ITY

GE

NE

373

0

−0.5 0.0 0.5 1.0

−0.

50.

00.

51.

0

observational training data

ACTIVITY GENE 3729−0.5 0.0 0.5 1.0

interventional training data(interv. on genes other than 3729 and 3730)

ACTIVITY GENE 3729

AC

TIV

ITY

GE

NE

373

0

−4 −3 −2 −1 0 1 2

−4

−3

−2

−1

01

2 interventional test data point(intervention on gene 3729)

2nd most significant pair

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 121: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

ACTIVITY GENE 3672

AC

TIV

ITY

GE

NE

147

5

−0.5 0.0 0.5 1.0 1.5−0.

50.

00.

51.

01.

5

observational training data

ACTIVITY GENE 3672−0.5 0.0 0.5 1.0 1.5

interventional training data(interv. on genes other than 3672 and 1475)

ACTIVITY GENE 3672

AC

TIV

ITY

GE

NE

147

5

−3 −2 −1 0 1 2

−3

−2

−1

01

2 interventional test data point(intervention on gene 3672)

3rd most significant pair

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 122: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Observing data from different environments

proposedGIES IDA

marginal corr. randommethod observ. pooled guessing

# of true6 2 2 1 2

2 (95% quantile)positives 3 (99% quantile)

(out of 8) 4 (99.9% quantile)

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 123: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Summary:

: additive noise (single environment)

: independence-based methods (single environment)

: invariant prediction; control family wise error rate

Future directions (theory):

extend to estimation of graphs

combine with 1 or 2

nonlinear models

finite sample guarantees

Future work (methodology):

domain adaption

instrumental variables I X Y

W

Thank you!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 124: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Summary:

: additive noise (single environment)

: independence-based methods (single environment)

: invariant prediction; control family wise error rate

Future directions (theory):

extend to estimation of graphs

combine with 1 or 2

nonlinear models

finite sample guarantees

Future work (methodology):

domain adaption

instrumental variables I X Y

W

Thank you!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015

Page 125: Jonas Peters, Mateo Rojas-Carulla, Amar Shah Machine ...cbl.eng.cam.ac.uk › ... › ReadingGroup › CausalInference.pdf · Pearl.: \Causality: Models, Reasoning, and Inference

Summary:

: additive noise (single environment)

: independence-based methods (single environment)

: invariant prediction; control family wise error rate

Future directions (theory):

extend to estimation of graphs

combine with 1 or 2

nonlinear models

finite sample guarantees

Future work (methodology):

domain adaption

instrumental variables I X Y

W

Thank you!

Jonas Peters, Mateo Rojas-Carulla, Amar Shah () Causal Inference 23 April 2015