

Why and how to use random forest variable importance measures
(and how you shouldn't)

Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)
[email protected]
useR! 2008, Dortmund


Introduction

Random forests

- have become increasingly popular in, e.g., genetics and the neurosciences [imagine a long list of references here]
- can deal with "small n, large p" problems, high-order interactions, and correlated predictor variables
- are used not only for prediction, but also to assess variable importance


(Small) random forest

[Figure: the individual classification trees of a small random forest. Each tree splits on the predictors Start, Number, and Age (all splits with p < 0.001) and reports the sample size n and the class proportions y in each terminal node; the trees differ from one another across the forest.]


Construction of a random forest

- draw ntree bootstrap samples from the original sample
- fit a classification tree to each bootstrap sample
  ⇒ ntree trees
- this creates a diverse set of trees, because trees are unstable w.r.t. changes in the learning data
  ⇒ ntree different-looking trees (bagging)
- randomly preselect mtry splitting variables in each split
  ⇒ ntree even more different-looking trees (random forest)


Random forests in R

- randomForest (pkg: randomForest)
  reference implementation based on CART trees (Breiman, 2001; Liaw and Wiener, 2008)
  – for variables of different types: biased in favor of continuous variables and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
- cforest (pkg: party)
  based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
  + for variables of different types: unbiased when subsampling, instead of bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
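To make the two implementations concrete, here is a minimal fitting sketch; the readingSkills data set from party and all parameter values are chosen purely for illustration:

library(randomForest)
library(party)

data("readingSkills", package = "party")

## CART-based reference implementation
rf <- randomForest(nativeSpeaker ~ ., data = readingSkills,
                   ntree = 500, importance = TRUE)

## conditional inference forest; cforest_unbiased() combines unbiased
## trees with subsampling (rather than bootstrap sampling)
cf <- cforest(nativeSpeaker ~ ., data = readingSkills,
              controls = cforest_unbiased(ntree = 500, mtry = 2))

The rf and cf objects are reused in the sketches on the following slides.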


Measuring variable importance

- Gini importance: mean Gini gain produced by Xj over all trees
  obj <- randomForest(..., importance=TRUE)
  obj$importance column: MeanDecreaseGini
  importance(obj, type=2)

for variables of different types: biased in favor of continuous variables and variables with many categories
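For instance, with the illustrative rf object fitted above, the Gini importance can be extracted as:

rf$importance[, "MeanDecreaseGini"]
importance(rf, type = 2)   # equivalent accessor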


Measuring variable importance

- permutation importance: mean decrease in classification accuracy after permuting Xj, over all trees
  obj <- randomForest(..., importance=TRUE)
  obj$importance column: MeanDecreaseAccuracy
  importance(obj, type=1)
- obj <- cforest(...)
  varimp(obj)

for variables of different types: unbiased only when subsampling is used, as in cforest(..., controls = cforest_unbiased())
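Again with the illustrative objects from above, a sketch of both accessors:

importance(rf, type = 1)   # MeanDecreaseAccuracy (scaled by default)
varimp(cf)                 # permutation importance for the cforest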


The permutation importance

within each tree t:

$$VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{\mathcal{B}}^{(t)}} I\big(y_i = \hat{y}_i^{(t)}\big)}{\big|\bar{\mathcal{B}}^{(t)}\big|} - \frac{\sum_{i \in \bar{\mathcal{B}}^{(t)}} I\big(y_i = \hat{y}_{i,\pi_j}^{(t)}\big)}{\big|\bar{\mathcal{B}}^{(t)}\big|}$$

where $\bar{\mathcal{B}}^{(t)}$ is the out-of-bag sample of tree $t$,

$\hat{y}_i^{(t)} = f^{(t)}(x_i)$ = predicted class before permuting,
$\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j})$ = predicted class after permuting $X_j$, and

$$x_{i,\pi_j} = \big(x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p}\big)$$

Note: $VI^{(t)}(x_j) = 0$ by definition if $X_j$ is not in tree $t$.
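This computation can be mimicked outside of either package. The following toy function is a hypothetical sketch (not the packages' internals): given a prediction function for one tree, its out-of-bag rows, and a variable index j, it returns the drop in accuracy after permuting column j.

vi_tree <- function(predict_tree, X_oob, y_oob, j) {
  acc_before <- mean(predict_tree(X_oob) == y_oob)
  X_perm <- X_oob
  X_perm[[j]] <- sample(X_perm[[j]])   # permute X_j within the OOB sample
  acc_after <- mean(predict_tree(X_perm) == y_oob)
  acc_before - acc_after               # VI^(t)(x_j)
}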


The permutation importance

over all trees:

1. raw importance

$$VI(x_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(x_j)}{ntree}$$

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=FALSE)


The permutation importance

over all trees:

2. scaled importance (z-score)

$$z_j = \frac{VI(x_j)}{\sigma / \sqrt{ntree}}$$

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=TRUE) (default)


Tests for variable importance

for variable selection purposes:

- Breiman and Cutler (2008): simple significance test based on normality of the z-score
  randomForest, scale=TRUE + α-quantile of N(0,1)
- Diaz-Uriarte and Alvarez de Andres (2006): backward elimination (throw out the least important variables until the out-of-bag prediction accuracy drops)
  varSelRF (pkg: varSelRF), depends on randomForest
- Diaz-Uriarte (2007) and Rodenburg et al. (2008): plots and significance test (randomly permute the response values to mimic the overall null hypothesis that none of the predictor variables is relevant = baseline)


Tests for variable importance

problems of these approaches:

- (at least) Breiman and Cutler (2008): strange statistical properties (Strobl and Zeileis, 2008)
- all: preference for correlated predictor variables (see also Nicodemus and Shugart, 2007; Archer and Kimes, 2008)


Breiman and Cutler's test

under the null hypothesis of zero importance:

$$z_j \overset{as.}{\sim} N(0, 1)$$

if $z_j$ exceeds the α-quantile of N(0,1) ⇒ reject the null hypothesis of zero importance for variable $X_j$
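A sketch of this test on the illustrative rf object from above (alpha is the significance level; the one-sided rejection rule reads z_j > qnorm(1 - alpha)):

z <- importance(rf, type = 1, scale = TRUE)[, 1]  # z-scores
alpha <- 0.05
pnorm(z, lower.tail = FALSE)   # p-values under H0: zero importance
z > qnorm(1 - alpha)           # variables declared important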


Raw importance

[Figure: mean raw importance plotted against relevance (0.0–0.4), in three panels for ntree = 100, 200, 500, with sample sizes 100, 200, 500.]


z-score and power

[Figure: power and z-score plotted against relevance (0.0–0.4), in panels for ntree = 100, 200, 500, with sample sizes 100, 200, 500.]


Findings

z-score and power:

- increase with ntree
- decrease with sample size

⇒ rather use the raw, unscaled permutation importance!

importance(obj, type=1, scale=FALSE)
varimp(obj)


What null hypothesis were we testing in the first place?

obs   Y     Xj                    Z
1     y1    $x_{\pi_j(1),j}$      z1
⋮     ⋮     ⋮                     ⋮
i     yi    $x_{\pi_j(i),j}$      zi
⋮     ⋮     ⋮                     ⋮
n     yn    $x_{\pi_j(n),j}$      zn

$$H_0: X_j \perp Y, Z \quad \text{or} \quad X_j \perp Y \;\wedge\; X_j \perp Z$$

$$P(Y, X_j, Z) \overset{H_0}{=} P(Y, Z) \cdot P(X_j)$$


What null hypothesis were we testing in the first place?

The current null hypothesis reflects independence of Xj from both Y and the remaining predictor variables Z.

⇒ a high variable importance can result from violation of either one!


Suggestion: Conditional permutation scheme

obs   Y      Xj                            Z
1     y1     $x_{\pi_{j|Z=a}(1),j}$        z1 = a
3     y3     $x_{\pi_{j|Z=a}(3),j}$        z3 = a
27    y27    $x_{\pi_{j|Z=a}(27),j}$       z27 = a
6     y6     $x_{\pi_{j|Z=b}(6),j}$        z6 = b
14    y14    $x_{\pi_{j|Z=b}(14),j}$       z14 = b
33    y33    $x_{\pi_{j|Z=b}(33),j}$       z33 = b
⋮     ⋮      ⋮                             ⋮

$$H_0: X_j \perp Y \mid Z$$

$$P(Y, X_j \mid Z) \overset{H_0}{=} P(Y \mid Z) \cdot P(X_j \mid Z) \quad \text{or} \quad P(Y \mid X_j, Z) \overset{H_0}{=} P(Y \mid Z)$$


Technically

- use any partition of the feature space for conditioning
- here: use the binary partition already learned by the tree (use cutpoints as bisectors of the feature space)
- condition on all correlated variables or select some (Strobl et al., 2008)

available in cforest from version 0.9-994: varimp(obj, conditional = TRUE), as sketched below
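With the illustrative cf object fitted earlier:

varimp(cf, conditional = TRUE)   # conditional permutation importance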


Simulation study

- dgp: $y_i = \beta_1 \cdot x_{i,1} + \cdots + \beta_{12} \cdot x_{i,12} + \varepsilon_i$, $\varepsilon_i \overset{i.i.d.}{\sim} N(0, 0.5)$
- $X_1, \ldots, X_{12} \sim N(0, \Sigma)$

$$\Sigma = \begin{pmatrix} 1 & 0.9 & 0.9 & 0.9 & 0 & \cdots & 0 \\ 0.9 & 1 & 0.9 & 0.9 & 0 & \cdots & 0 \\ 0.9 & 0.9 & 1 & 0.9 & 0 & \cdots & 0 \\ 0.9 & 0.9 & 0.9 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

Xj:   X1   X2   X3   X4   X5   X6   X7   X8   ...   X12
βj:    5    5    2    0   −5   −5   −2    0   ...     0
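A sketch reproducing this data-generating process (the mvtnorm package and n = 100 are assumptions chosen for illustration; N(0, 0.5) is read as variance 0.5 here):

library(mvtnorm)

set.seed(2008)
n <- 100
Sigma <- diag(12)
Sigma[1:4, 1:4] <- 0.9            # block of four correlated predictors
diag(Sigma) <- 1
X <- rmvnorm(n, sigma = Sigma)
beta <- c(5, 5, 2, 0, -5, -5, -2, rep(0, 5))
y <- as.vector(X %*% beta) + rnorm(n, sd = sqrt(0.5))
dat <- data.frame(y = y, X)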


Results

[Figure: variable importance for variables 1–12 in three panels, for mtry = 1 (scale 0–25), mtry = 3 (scale 0–50), and mtry = 8 (scale 0–80).]


Peptide-binding data

[Figure: unconditional vs. conditional variable importance (both on a 0–0.005 scale), with the predictors h2y8, flex8, and pol3* labeled.]


Summary

- if your predictor variables are of different types: use cforest (pkg: party) with the default option controls = cforest_unbiased() and the permutation importance varimp(obj)
- otherwise: feel free to use cforest (pkg: party) with the permutation importance varimp(obj), or randomForest (pkg: randomForest) with the permutation importance importance(obj, type=1) or the Gini importance importance(obj, type=2); but don't fall for the z-score! (i.e., set scale=FALSE)
- if your predictor variables are highly correlated: use the conditional importance in cforest (pkg: party)


References

Archer, K. J. and R. V. Kimes (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis 52(4), 2249–2260.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Breiman, L. and A. Cutler (2008). Random forests – Classification manual. Website accessed in 1/2008; http://www.math.usu.edu/~adele/forests.

Breiman, L., A. Cutler, A. Liaw, and M. Wiener (2006). Breiman and Cutler's Random Forests for Classification and Regression. R package version 4.5-16.

Diaz-Uriarte, R. (2007). GeneSrF and varSelRF: A web-based tool and R package for gene selection and classification using random forest. BMC Bioinformatics 8:328.


Hothorn, T., K. Hornik, and A. Zeileis (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3), 651–674.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.

Strobl, C. and A. Zeileis (2008). Danger: High power! – Exploring the statistical properties of a test for random forest variable importance. In Proceedings of the 18th International Conference on Computational Statistics, Porto, Portugal.

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307.