BL EWS NB RMS - Stanford University
statweb.stanford.edu/~tibs/PAM/Rdist/doc.pdf
SL&DM © Hastie & Tibshirani, March 26, 2002


Page 1 (Supervised Learning: 28)

Classification of microarray samples

Example: small round blue cell tumors; Khan et al., Nature Medicine, 2001.

• Tumors are classified as BL (Burkitt lymphoma), EWS (Ewing), NB (neuroblastoma) and RMS (rhabdomyosarcoma).

• There are 63 training samples and 25 test samples, although five of the latter were not SRBCTs. Expression was measured for 2308 genes.

• Khan et al. report zero training and test errors, using a complex neural network model. They decided that 96 genes were "important".

• Upon close examination, the network is linear: it is essentially extracting linear principal components and classifying in their subspace.

• But even principal components analysis is unnecessarily complicated for this problem!

Page 2 (Supervised Learning: 29)

Khan data

[Figure: the Khan expression data, with samples grouped by class: BL, EWS, NB, RMS.]

Page 3 (Supervised Learning: 31)

Class centroids

[Figure: class centroids. Four panels (BL, EWS, NB, RMS) plot Average Expression against Gene (1 to 2308); a fifth panel shows the expression profile of one test sample.]

Page 4 (Supervised Learning: 32)

Nearest Shrunken Centroids

Idea: shrink each class centroid towards the overall centroid. First normalize by the within-class standard deviation for each gene.

Details

• Let $x_{ij}$ be the expression for genes $i = 1, 2, \ldots, p$ and samples $j = 1, 2, \ldots, n$.

• We have classes $1, 2, \ldots, K$; let $C_k$ be the indices of the $n_k$ samples in class $k$.

• The $i$th component of the centroid for class $k$ is $\bar{x}_{ik} = \sum_{j \in C_k} x_{ij}/n_k$, the mean expression value in class $k$ for gene $i$; the $i$th component of the overall centroid is $\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n$ (see the numpy sketch below).
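To make the notation concrete, here is a minimal numpy sketch of the centroid computations; the names `expr` (a $p \times n$ matrix, genes by samples) and `labels` (the class index of each sample) are hypothetical, not from the slides.

```python
import numpy as np

def centroids(expr, labels, K):
    """Class centroids x̄_ik and overall centroid x̄_i.

    expr: p x n expression matrix (genes x samples); labels: length-n
    array of class indices 0..K-1. Hypothetical names, a sketch only.
    """
    overall = expr.mean(axis=1)  # x̄_i: mean over all n samples
    class_cent = np.stack(
        [expr[:, labels == k].mean(axis=1) for k in range(K)], axis=1)  # p x K
    return class_cent, overall
```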

Page 5 (Supervised Learning: 33)

• Let
$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{s_i}$$
where $s_i$ is the pooled within-class standard deviation for gene $i$:
$$s_i^2 = \frac{1}{n - K} \sum_k \sum_{j \in C_k} (x_{ij} - \bar{x}_{ik})^2.$$

• Shrink each $d_{ik}$ towards zero, giving $d'_{ik}$ and new shrunken centroids or prototypes
$$\bar{x}'_{ik} = \bar{x}_i + s_i d'_{ik}.$$

• The shrinkage is by soft-thresholding:
$$d'_{ik} = \mathrm{sign}(d_{ik}) \, (|d_{ik}| - \Delta)_+$$

• Choose $\Delta$ by cross-validation (a sketch of the shrinkage step follows below).
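Continuing the hypothetical sketch above, the shrinkage step in numpy: pooled standard deviations, standardized differences $d_{ik}$, soft-thresholding at $\Delta$, and the shrunken prototypes.

```python
import numpy as np

def shrunken_centroids(expr, labels, K, delta):
    """Soft-threshold the standardized centroid differences d_ik.

    expr (p x n) and labels are the same hypothetical inputs as above;
    delta plays the role of Δ. A sketch, not the slides' actual code.
    """
    p, n = expr.shape
    overall = expr.mean(axis=1)
    class_cent = np.stack(
        [expr[:, labels == k].mean(axis=1) for k in range(K)], axis=1)

    # Pooled within-class variance: s_i^2 = (1/(n-K)) Σ_k Σ_{j in C_k} (x_ij - x̄_ik)^2
    ss = sum(((expr[:, labels == k] - class_cent[:, [k]]) ** 2).sum(axis=1)
             for k in range(K))
    s = np.sqrt(ss / (n - K))

    d = (class_cent - overall[:, None]) / s[:, None]          # d_ik
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0)  # soft-threshold
    return overall[:, None] + s[:, None] * d_shrunk, s        # x̄'_ik and s_i
```

A gene whose $d'_{ik}$ is zero for every class contributes nothing to the classifier, which is how thresholding performs gene selection.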

Page 6 (Supervised Learning: 34)

K-Fold Cross-Validation

Primary method for estimating a tuning parameter $\lambda$.

Divide the data into K roughly equal parts.

  1      2      3      4      5
Train  Train   Test  Train  Train

• For each $k = 1, 2, \ldots, K$, fit the model with parameter $\lambda$ to the other $K - 1$ parts, and compute its error in predicting the $k$th part. Average this error over the $K$ parts to give the estimate $CV(\lambda)$.

• Do this for many values of $\lambda$. Draw the curve $CV(\lambda)$ and choose the value of $\lambda$ that makes $CV(\lambda)$ smallest (a sketch follows below).

Typically we use K = 5 or 10.
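A minimal sketch of this recipe for the shrinkage parameter $\Delta$; `classify` is a caller-supplied fitting routine (for instance, the nearest-shrunken-centroid rule), and all names here are assumptions rather than the slides' code.

```python
import numpy as np

def cv_curve(expr, labels, deltas, classify, n_folds=10, seed=0):
    """Cross-validation curve CV(Δ) over a grid of shrinkage values.

    classify(train_x, train_y, test_x, delta) -> predicted labels is
    supplied by the caller; expr is p x n, labels length n. A sketch.
    """
    n = expr.shape[1]
    fold = np.random.default_rng(seed).permutation(n) % n_folds
    cv = np.zeros(len(deltas))
    for f in range(n_folds):                      # hold out part f
        tr, te = fold != f, fold == f
        for j, delta in enumerate(deltas):
            pred = classify(expr[:, tr], labels[tr], expr[:, te], delta)
            cv[j] += np.mean(pred != labels[te]) / n_folds
    return cv  # choose deltas[np.argmin(cv)]
```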

Page 7 (Supervised Learning: 35)

Results

[Figure: Error vs. Amount of Shrinkage Δ (0 to 6), with curves labeled cv (cross-validation), tr (training) and te (test). The top axis, "Number of genes", shows how many genes survive each threshold: 2308, 2188, 1668, 1020, 598, 339, 206, 133, 81, 52, 34, 22, 15, 10, 8, 5, 1.]

Page 8 (Supervised Learning: 36)

Advantages

• Simple; includes the nearest-centroid classifier as a special case.

• Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.

• With more than two classes, the method can select different genes, and different numbers of genes, for each class.

Page 9 (Supervised Learning: 37)

The genes that matter

[Figure: the selected genes for each class (BL, EWS, NB, RMS). Gene IDs, top to bottom: 295985, 866702, 814260, 770394, 377461, 810057, 365826, 41591, 629896, 308231, 325182, 812105, 44563, 244618, 796258, 298062, 784224, 296448, 207274, 563673, 504791, 204545, 21652, 308163, 212542, 183337, 241412.]

Page 10 (Supervised Learning: 38)

Estimated Class Probabilities

[Figure: estimated class probabilities. Top panel ("Training Data"): probability (0 to 1) for each of the 63 training samples, grouped by class BL, EWS, NB, RMS. Bottom panel ("Test Data"): the same for the 25 test samples.]

Page 11 (Supervised Learning: 39)

Class probabilities

• For a test sample $x^* = (x^*_1, x^*_2, \ldots, x^*_p)$, we define the discriminant score for class $k$:
$$\delta_k(x^*) = \sum_{i=1}^{p} \frac{(x^*_i - \bar{x}'_{ik})^2}{s_i^2} - 2 \log \pi_k$$

• The classification rule is then
$$C(x^*) = \ell \;\text{ if }\; \delta_\ell(x^*) = \min_k \delta_k(x^*)$$

• Estimates of the class probabilities, by analogy to Gaussian linear discriminant analysis, are
$$\hat{p}_k(x^*) = \frac{e^{-\frac{1}{2}\delta_k(x^*)}}{\sum_{\ell=1}^{K} e^{-\frac{1}{2}\delta_\ell(x^*)}}$$
(a sketch follows below).

• Still very simple. In statistical parlance, this is a restricted version of a naive Bayes classifier (also called "idiot's Bayes"!).
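A minimal numpy sketch of these formulas; the argument names (`x_star`, `shrunk_cent`, `s`, `priors`) are hypothetical stand-ins for $x^*$, $\bar{x}'_{ik}$, $s_i$ and $\pi_k$.

```python
import numpy as np

def classify_probs(x_star, shrunk_cent, s, priors):
    """Discriminant scores δ_k(x*) and class probability estimates.

    x_star: length-p test sample; shrunk_cent: p x K matrix of x̄'_ik;
    s: pooled within-class SDs s_i; priors: class priors π_k. A sketch.
    """
    # δ_k(x*) = Σ_i (x*_i - x̄'_ik)² / s_i²  -  2 log π_k
    delta = (((x_star[:, None] - shrunk_cent) ** 2) / s[:, None] ** 2).sum(axis=0)
    delta -= 2 * np.log(priors)
    k_hat = int(np.argmin(delta))             # rule: C(x*) = argmin_k δ_k(x*)
    w = np.exp(-0.5 * (delta - delta.min()))  # subtract min before exponentiating
    return k_hat, w / w.sum()                 # p̂_k(x*)
```

Subtracting $\min_k \delta_k(x^*)$ before exponentiating cancels in the ratio, so $\hat{p}_k$ is unchanged, but it avoids numerical underflow when the scores are large.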

Page 12 (Supervised Learning: 40)

Adaptive threshold scaling

• Idea: define class-dependent scaling factors $\theta_k$ for each class:
$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \theta_k \cdot s_i}. \qquad (1)$$

• Use smaller factors for hard-to-classify classes $\Rightarrow$ the same test error with a smaller total number of genes.

• Adaptive procedure: start with all $\theta_k = 1$, then reduce $\theta_k$ by 10% for the class $k$ with the largest area under the training-error curve.

• Repeat 20 times and choose the solution with the smallest area under the curves, summed over all classes (a sketch follows this list).

• This can dramatically reduce the total number of genes used, without increasing the error rate.
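A sketch of this adaptive loop, under the assumption of a caller-supplied `error_curves(theta)` routine that returns each class's training-error curve over the grid of $\Delta$ values; the areas under the curves are approximated with `np.trapz`. All names are hypothetical.

```python
import numpy as np

def adapt_scales(error_curves, deltas, K, n_iter=20):
    """Adaptive threshold scaling of the θ_k, as outlined above.

    error_curves(theta) -> K x len(deltas) array: class k's training
    error at each Δ, computed with scale factors theta. A sketch only.
    """
    theta = np.ones(K)
    best_theta, best_area = theta.copy(), np.inf
    for _ in range(n_iter):                        # repeat 20 times
        curves = error_curves(theta)
        areas = np.trapz(curves, deltas, axis=1)   # area under each class's curve
        if areas.sum() < best_area:                # keep best total area seen
            best_area, best_theta = areas.sum(), theta.copy()
        theta = theta.copy()
        theta[np.argmax(areas)] *= 0.9             # cut θ_k by 10% for worst class
    return best_theta
```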

Page 13 (Supervised Learning: 41)

Lymphoma data

Scaling factors changed from $(1, 1, 1)$ to $(1.9, 1, 1.5)$.

[Figure: two panels of Error vs. Amount of Shrinkage Δ for the lymphoma data, each with training (tr) and test (te) curves; the top axes ("Size") show the number of genes remaining, from 4026 down to 1 in one panel and from 4026 down to 3 in the other.]