Causality & MDL: Causal Models as Minimal Descriptions of Multivariate Systems
Jan Lemeire, June 15th 2006


  What can be learnt about the world from observations?

  We have to look for regularities and model them

MDL-approach to Learning

Occam’s Razor: “Among equivalent models, choose the simplest one.”

Minimum Description Length (MDL): “Select the model that describes the data with the minimal number of bits.”
- model = shortest program that outputs the data
- length of that program = Kolmogorov complexity

Learning = finding regularities = compression

Randomness vs. Regularity

0110001101011010101: random string = incompressible = maximal information

010101010101010101: regularity of repetition allows compression

Separation by the two-part code:

Description length = L(model) + L(data | model)

- L(model): regularities, meaningful information
- L(data | model): deviations, individual-specific information
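The contrast between a random and a regular string can be illustrated with an off-the-shelf compressor. Kolmogorov complexity itself is not computable, so the sketch below uses zlib as a rough, practical stand-in; the string length, the seed and the helper name `description_length_bits` are assumptions made for illustration only.

```python
import random
import zlib

n = 10_000  # number of '0'/'1' symbols per string (arbitrary choice)

# Regular string: the repeating "01" pattern from the slide.
regular = ("01" * (n // 2)).encode("ascii")

# Pseudo-random string of '0'/'1' symbols (fixed seed for repeatability).
rng = random.Random(0)
noisy = "".join(rng.choice("01") for _ in range(n)).encode("ascii")


def description_length_bits(data: bytes) -> int:
    """Size in bits of the zlib-compressed data (a crude proxy, not Kolmogorov complexity)."""
    return 8 * len(zlib.compress(data, 9))


regular_bits = description_length_bits(regular)
noisy_bits = description_length_bits(noisy)

print("regular string:", regular_bits, "bits")
print("random string :", noisy_bits, "bits")
```

The regular string shrinks to a small fraction of the random one, because the compressor only has to describe the repetition rule once.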

Model of Multivariate Systems

  Variables

Probabilistic model of joint distribution with minimal description length?

  Experimental data

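1 variable: average code length = Shannon entropy of P(x)

Multiple variables: describe each variable with the help of the others, e.g. P(E | A…D), a conditional probability distribution (CPD)

Factorization: P(A, B, C, D, E) = P(A) P(B | A) P(C | A, B) P(D | A, B, C) P(E | A, B, C, D)

Mutual information decreases the entropy of a variable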
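As a small worked example of how knowing another variable shortens the code, the sketch below computes H(E), H(E | A) and their difference, the mutual information I(A; E). The joint distribution over two binary variables is a made-up illustration, not data from the talk.

```python
from math import log2

# Hypothetical joint distribution P(A, E) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of the outcome tuple."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


h_a = entropy(marginal(joint, 0))   # H(A)
h_e = entropy(marginal(joint, 1))   # H(E): average code length for E alone
h_e_given_a = entropy(joint) - h_a  # H(E | A) = H(A, E) - H(A)
mutual_info = h_e - h_e_given_a     # I(A; E): bits saved per observation of E

print(f"H(E) = {h_e:.4f}, H(E|A) = {h_e_given_a:.4f}, I(A;E) = {mutual_info:.4f}")
```

For this particular distribution, coding E with the help of A costs about 0.72 bits on average instead of 1 bit.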

I. Conditional Independencies

Reduction of factorization complexity: Bayesian network

Ordering 1: (A, B, C, D, E)

Ordering 2: (A, B, C, E, D)

II. Faithfulness

Joint distribution ↔ Directed acyclic graph
Conditional independencies ↔ d-separation

Theorem: if a faithful graph exists, it is the minimal factorization.

III. Causal Interpretation

Definition through interventions

[Figure: graphs over A and B, shown with and without the intervention do(A=a)]

Reductionism

Causality = reductionism

Canonical representation: unique, minimal, independent

Building block = P(Xi | parents_i)

Whole theory is based on modularity, like asymmetry of causality

Intervention = change of block

[Figure: graph over X1, X2, X3, X4, X5, shown before and after the intervention do(X3=a), which sets X3 = a]

Ultimate motivation for causality

Model = canonical representation, able to explain all regularities, close to reality

Example taken from Spirtes, Glymour and Scheines 1993, Fig. 3-23

[Figure: “Reality” versus “Learnt” model over X, Y, Z and R, with a BLACK BOX]

[Figure: graph over X1, X2, X3, X4, X5]

P(X1) P(X2|X1) P(X3|X1) P(X4|X1, X2) P(X5|X3, X4)

Causal model is MDL of joint distribution if both parts are incompressible:
- Graph: meaningful information, incompressible
- CPDs: accidental information, incompressible (random distribution)
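To get a feel for how much the graph shortens the description, one can count the free parameters of the factorization above and compare them with an unrestricted joint distribution. The sketch assumes every variable is binary, which the slide does not state; the function names are illustrative.

```python
# Parent sets read off the factorization P(X1)P(X2|X1)P(X3|X1)P(X4|X1,X2)P(X5|X3,X4).
parents = {
    "X1": [],
    "X2": ["X1"],
    "X3": ["X1"],
    "X4": ["X1", "X2"],
    "X5": ["X3", "X4"],
}


def cpd_parameters(parents, arity=2):
    """Free parameters of all CPDs: (arity - 1) per configuration of the parents."""
    return sum((arity - 1) * arity ** len(pa) for pa in parents.values())


def full_joint_parameters(n_vars, arity=2):
    """Free parameters of an unrestricted joint distribution over n_vars variables."""
    return arity ** n_vars - 1


network_params = cpd_parameters(parents)
joint_params = full_joint_parameters(len(parents))

print("Bayesian network:", network_params, "parameters")
print("full joint      :", joint_params, "parameters")
```

Under the binary assumption the network needs 13 parameters against 31 for the full joint table; the gap widens quickly as variables are added.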

  d-separation tells what we can expect from a causal model

  A Bayesian network with unrelated, random CPDs is faithful

E.g. D depends on C, unless the parameters of P(D | C, E) are related so that the dependency cancels:

C  E  P(D | C, E)
T  T  0.25
T  F  0.75
F  T  0.75
F  F  0.25

C  P(D | C)
T  0.5
F  0.5

[Figure: graph with nodes C and D]

P(d1|c0,e0) P(e0) + P(d1|c0,e1) P(e1) = P(d1|c1,e0) P(e0) + P(d1|c1,e1) P(e1)
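The cancellation in the table can be checked by marginalizing E out of P(D | C, E). The sketch below assumes that the table entries are the probabilities of D being true, that E is independent of C, and that P(E = T) = 0.5; none of these is spelled out on the slide. A second run with P(E = T) = 0.3 shows that the independence disappears as soon as the parameters stop matching.

```python
# P(D = true | C, E) from the slide, keyed by (C, E).
cpt = {
    (True, True): 0.25,
    (True, False): 0.75,
    (False, True): 0.75,
    (False, False): 0.25,
}


def p_d_given_c(c, p_e_true):
    """P(D = true | C = c), summing out E (assumed independent of C)."""
    return cpt[(c, True)] * p_e_true + cpt[(c, False)] * (1.0 - p_e_true)


# Assumed P(E = T) = 0.5: both rows of P(D | C) coincide, so D looks independent of C.
balanced = [p_d_given_c(c, 0.5) for c in (True, False)]

# Any other value of P(E = T), here 0.3, restores the dependence of D on C.
unbalanced = [p_d_given_c(c, 0.3) for c in (True, False)]

print("P(E=T)=0.5:", balanced)
print("P(E=T)=0.3:", unbalanced)
```

The independence of D and C therefore hinges on an exact match between the CPD entries and P(E), which is itself a regularity among the parameters.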

When do causal models become incorrect?

  Other regularities!

A. Lower-level regularities

  Compression of the distributions

[Figure: graph over X1, X2, X3, X4, X5]

P(X1) P(X2|X1) P(X3|X1) P(X4|X1, X2) P(X5|X3, X4)

Meaningful information: graph. Accidental information: CPDs.

X1  X2  P(X4 | X1, X2)
T   T   0.75
T   F   0.75
F   T   0.75
F   F   0.25

B. Better description form

  Pattern in figure

  random patterns -> distribution

  Causal model??

  Other models are better

  Why? Complete symmetry among the variables

X1,1  X1,2  X1,3  X1,4
X2,1  X2,2  X2,3  X2,4
X3,1  X3,2  X3,3  X3,4
X4,1  X4,2  X4,3  X4,4

C. Interference with independencies

[Figure: graph with paths X → U → Y and X → V → Y]

X and Y independent by cancellation of X → U → Y and X → V → Y

dependency of both paths = regularity
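Cancellation of two paths is easy to reproduce in a linear model. The sketch below is an illustration under assumed conditions, not the example from the talk: all variables are Gaussian, the relations are linear with unit-variance noise, and the path coefficients a, b, c, d are chosen by hand so that a·b + c·d = 0.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(a, b, c, d, n=20_000, seed=0):
    """Sample X -> U -> Y and X -> V -> Y (linear, Gaussian noise); return corr(X, Y)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        u = a * x + rng.gauss(0, 1)          # path X -> U
        v = c * x + rng.gauss(0, 1)          # path X -> V
        y = b * u + d * v + rng.gauss(0, 1)  # both paths meet in Y
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


# a*b + c*d = 0: the two paths cancel and X looks independent of Y.
corr_cancel = simulate(1.0, 1.0, 1.0, -1.0)

# Slightly different coefficients: the dependence shows up again.
corr_generic = simulate(1.0, 1.0, 1.0, -0.5)

print(f"cancelling paths: corr(X, Y) = {corr_cancel:.3f}")
print(f"generic paths   : corr(X, Y) = {corr_generic:.3f}")
```

The independence only appears for the exact coefficient combination, which is why the slide treats the dependency between the two paths as a regularity in its own right.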

Violation of weak transitivity condition

One of the necessary conditions for faithfulness:

R ⊥ Z and R ⊥ Z | Y  ⟹  R ⊥ Y or Y ⊥ Z

Deterministic relations

[Figure: graph over X1, X2, Y and Z]

  Y = f(X1, X2)

Y becomes (unexpectedly) independent from Z conditioned on X1 and X2

~ violation of the intersection condition

Solution: augmented model
- add regularity to model
- adapt inference algorithms

[Figure: graph over X, Y and Z]

Learning algorithm: variables possibly contain equivalent information about another variable

Choose simplest relation
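The effect of a deterministic relation can be verified by exact enumeration. The sketch below assumes a structure X1, X2 → Y → Z with Y = X1 XOR X2, fair coins for X1 and X2, and Z a noisy copy of Y; the function f, the noise level `FLIP` and the helper names are illustrative choices, not taken from the talk.

```python
from itertools import product
from math import log2

FLIP = 0.1  # assumed probability that Z differs from Y

# Exact joint distribution over (X1, X2, Y, Z); Y is a deterministic function of X1, X2.
joint = {}
for x1, x2, z in product((0, 1), repeat=3):
    y = x1 ^ x2
    p_z = 1 - FLIP if z == y else FLIP
    joint[(x1, x2, y, z)] = 0.25 * p_z


def marginal(joint, idx):
    """Marginal distribution over the variables at the positions listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def entropy_of(joint, idx):
    """Joint Shannon entropy in bits of the variables at positions idx."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)


def cond_mutual_info(joint, a, b, cond):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return (entropy_of(joint, a + cond) + entropy_of(joint, b + cond)
            - entropy_of(joint, a + b + cond) - entropy_of(joint, cond))


X1, X2, Y, Z = 0, 1, 2, 3
mi_y_z = cond_mutual_info(joint, [Y], [Z], [])               # I(Y; Z)
cmi_y_z_given_x = cond_mutual_info(joint, [Y], [Z], [X1, X2])  # I(Y; Z | X1, X2)

print(f"I(Y; Z)          = {mi_y_z:.4f} bits")
print(f"I(Y; Z | X1, X2) = {cmi_y_z_given_x:.4f} bits")
```

Y and Z share information marginally, yet conditioning on X1 and X2 fixes Y completely, so nothing is left for Z to tell about Y.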

Conclusions

Interpretation of causality by the regularities
- Canonical, faithful representation
- ‘Describe all regularities’
- Causality is just one type of regularity?

Occam’s Razor works
- Choice of simplest model gives models close to ‘reality’
- but what is reality? Atomic description of regularities that we observe?

Papers, references and demos: http://parallel.vub.ac.be

[Figure: graph over X1, X2, X3, X4, X5, X6, X7]