Sequential Multiple Decision Procedures (SMDP) for Genome Scans. Q.Y. Zhang and M.A. Province, Division of Statistical Genomics, Washington University School of Medicine. Statistical Genetics Forum, April, 2006


Page 1: Sequential Multiple Decision Procedures (SMDP) for Genome Scans Q.Y. Zhang and M.A. Province Division of Statistical Genomics Washington University School

Sequential Multiple Decision Procedures (SMDP)

for Genome Scans

Q.Y. Zhang and M.A. Province

Division of Statistical Genomics

Washington University School of Medicine

Statistical Genetics Forum, April, 2006

Page 2

References

R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential identification and ranking procedures. The University of Chicago Press, Chicago.

M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.

Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of American Statistical Association. Biometrics Section: 463-468.

Page 3

SMDP

Sequential Multiple Decision Procedures

Sequential test

Multiple hypothesis test

Page 4

Idea 1: Sequential

Start from a small sample size n0

Increase sample size, sequential test at each stage (SPRT)

Stop when stopping rule is satisfied

Sample size by stage: n0, n0+1, n0+2, ..., n0+i

Experiment in next stage

Extra data for validation
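
The sequential idea on this slide can be sketched with Wald's SPRT for a normal mean with known variance. This is only an illustration of "test at each stage, stop when the rule is satisfied", not the procedure developed later in the talk. The function name `sprt_normal_mean`, the hypotheses `mu0` and `mu1`, and the error rates `alpha` and `beta` are assumptions made for the example.

```python
import math

def sprt_normal_mean(xs, mu0=0.0, mu1=1.0, sigma=1.0,
                     alpha=0.05, beta=0.05, n0=1):
    """Wald SPRT sketch for H0: mean=mu0 vs H1: mean=mu1 (known sigma).

    Returns (decision, stage); decision is 'H0', 'H1' or 'continue'.
    No decision is attempted before stage n0 (the initial sample size).
    """
    upper = math.log((1 - beta) / alpha)   # crossing above: decide H1
    lower = math.log(beta / (1 - alpha))   # crossing below: decide H0
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # log-likelihood-ratio increment of N(mu1, sigma^2) vs N(mu0, sigma^2)
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if n < n0:
            continue
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", len(xs)
```

Observations left over after the stopping stage are not consumed by the test, which is what frees them for validation.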

Page 5

Idea 2: Multiple Decision

Independent test (binary hypothesis test)

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn, one test per SNP: test 1, test 2, test 3, test 4, test 5, test 6, ……, test n

test-wise error and experiment-wise error

p value correction

Simultaneous test (multiple hypothesis test)

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn, split into a signal group and a noise group

Page 6

Binary Hypothesis Test

SNP1    test 1    H0: Eff.(SNP1)=0 vs. H1: Eff.(SNP1)≠0

SNP2    test 2    H0: Eff.(SNP2)=0 vs. H1: Eff.(SNP2)≠0

SNP3    test 3    ……

SNP4    test 4    ……

SNP5    test 5    ……

SNP6    test 6    ……

……

SNPn    test n    H0: Eff.(SNPn)=0 vs. H1: Eff.(SNPn)≠0

Page 7

Multiple Hypothesis Test

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn

H1: SNP1,2,3 are truly different from the others

H2: SNP1,2,4 are truly different from the others

H3 ……

H4 ……

H5: SNP4,5,6 are truly different from the others

H6 ……

……

Hu: SNPn,n-1,n-2 are truly different from the others

H: any t SNPs are truly different from the other (n-t) SNPs

u = number of all possible combinations of t out of n
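
To make the hypothesis space concrete, the sketch below enumerates every candidate hypothesis for a toy case with n=6 SNPs and t=3. The function name `enumerate_hypotheses` and the toy sizes are assumptions for illustration; each tuple stands for one hypothesis "these t SNPs are truly different from the others".

```python
from itertools import combinations

def enumerate_hypotheses(n, t):
    """Return every way to pick t 'different' SNPs out of SNP1..SNPn."""
    return list(combinations(range(1, n + 1), t))

hyps = enumerate_hypotheses(6, 3)
# hyps[0] is (1, 2, 3) and hyps[1] is (1, 2, 4), in the same order as H1, H2 above;
# len(hyps) is u, the number of combinations of t out of n.
```

Exactly one of the u hypotheses is true, so the procedure makes a single decision among u alternatives instead of n separate binary decisions.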

Page 8

SMDP

Sequential test

Multiple hypothesis test

Sequential Multiple Decision Procedure

Page 9

Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)

The frequency/density function of a K-D population can be written in the form:

$$f(x) = \exp\{P(x)Q(\theta) + R(x) + S(\theta)\}$$

A. The normal density function with unknown mean and known variance;

B. The normal density function with unknown variance and known mean;

C. The exponential density function with unknown scale parameter and known location parameter;

D. The Bernoulli distribution with unknown probability of “success” on a single trial;

E. The Poisson distribution with unknown mean;

……

The distance between two K-D populations is defined as:

$$\Delta_{i,j} = Q(\theta_i) - Q(\theta_j)$$

For case B, taking $P(x) = (x-\mu)^2$ and $Q(\theta) = -\frac{1}{2\sigma^2}$:

$$\Delta_{i,j} = \frac{1}{2\sigma_j^2} - \frac{1}{2\sigma_i^2}$$
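
As a numerical check of case B, the sketch below writes the normal density with known mean and unknown variance in K-D form and compares it with the usual formula. The split P(x) = (x − μ)², Q(θ) = −1/(2σ²), R(x) = 0, S(θ) = −ln(σ√(2π)) is one valid choice assumed here for illustration (the sign could equally be moved from Q to P); all function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Textbook normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def kd_form_case_b(x, mu, sigma):
    """Same density written as exp{P(x)Q(theta) + R(x) + S(theta)}."""
    P = (x - mu) ** 2                 # depends on the data only (mu is known)
    Q = -1.0 / (2 * sigma ** 2)       # depends on the unknown parameter only
    R = 0.0
    S = -math.log(sigma * math.sqrt(2 * math.pi))
    return math.exp(P * Q + R + S)

def kd_distance_case_b(sigma_i, sigma_j):
    """Distance Q(theta_i) - Q(theta_j) under the assumed Q."""
    return -1.0 / (2 * sigma_i ** 2) + 1.0 / (2 * sigma_j ** 2)
```

Under this convention a larger variance gives a larger Q, so `kd_distance_case_b(2.0, 1.0)` is positive.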

Page 10

SMDP (Bechhofer et al., 1968)

Selecting the t best of M K-D populations

Sequential sampling: stages 1, 2, …, h, h+1, …

Diagram: populations Pop. 1, Pop. 2, …, Pop. t-1, Pop. t and Pop. t+1, Pop. t+2, …, Pop. M, separated by distance D; at stage h the populations have statistics Y1,h, Y2,h, …, Yi,h, …, YM,h

U possible combinations of t out of M:

$$U = \frac{M!}{t!\,(M-t)!}$$

For each combination u:

$$Y^{(t)}_{u,h} = \sum_{k=1}^{t} Y_{i_k,h}$$

Ordered sums:

$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$

$$W_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=1}^{U} \exp(D^* Y^{(t)}_{[j],h})}$$

Stopping rule:

$$W_{[U],h} \ge P^*$$

Prob. of correct selection (PCS) > P*, whenever D > D*
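
For small M the stopping statistic can be computed by brute force over all U combinations. The sketch below is a minimal illustration assuming the statistics Y_{i,h} are already available as a list; the names `smdp_w` and `should_stop` are hypothetical, and the top sum is subtracted inside the exponentials only for numerical stability (the ratio is unchanged).

```python
import math
from itertools import combinations

def smdp_w(y, t, d_star):
    """Return (W, best) where best is the index tuple with the largest sum.

    y      : list of Y_{i,h} for the M populations at the current stage h
    t      : number of populations to select
    d_star : indifference-zone constant D*
    """
    sums = {c: sum(y[i] for i in c) for c in combinations(range(len(y)), t)}
    best = max(sums, key=sums.get)
    top = sums[best]
    # W = exp(D* Y_top) / sum_j exp(D* Y_j), rescaled by exp(-D* Y_top)
    denom = sum(math.exp(d_star * (s - top)) for s in sums.values())
    return 1.0 / denom, best

def should_stop(y, t, d_star, p_star=0.95):
    """Stopping rule W >= P*; also returns the currently selected subset."""
    w, best = smdp_w(y, t, d_star)
    return w >= p_star, best
```

With `y = [0.0, 0.0, 5.0, 6.0]`, `t = 2` and `d_star = 1.0`, the top combination is indices `(2, 3)` and the rule stops at `p_star = 0.95`; when all Y are equal, W is 1/U and sampling continues.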

Page 11

SMDP: P*, t, D*

P*: arbitrary, 0.95

t: fixed or varied

D*: indifference zone

Diagram: populations Pop. 1, Pop. 2, …, Pop. t-1, Pop. t and Pop. t+1, Pop. t+2, …, Pop. M, separated by distance D; axis marks at Q(θt), Q(θt)+D* and Q(θt)+D

SMDP stopping rule:

$$W_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=1}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$$

Prob. of correct selection (PCS) > P* whenever D > D*

Correct selection: populations with Q(θ) > Q(θt)+D* are selected

Page 12

SMDP: Computational Problem

$$W_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=1}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$$

$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$

Sequential stage: 1, 2, 3, …, h, h+1, …, N

Y1,h, Y2,h, …, Yt,h, Yt+1,h, Yt+2,h, …, YM,h

U sums of U possible combinations of t out of M. Each sum contains t members of Yi,h.

$$U = \frac{M!}{t!\,(M-t)!}$$

Computer time?
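
The size of the problem can be gauged by counting U directly. The sketch below uses M = 5841, the number of SNPs in the data set described later in the talk; the helper names and the choice of t values to try are assumptions for illustration.

```python
import math

def num_combinations(M, t):
    """U = M! / (t! (M-t)!), computed exactly with integer arithmetic."""
    return math.comb(M, t)

def digits_of_u(M, t):
    """Number of decimal digits of U, a rough measure of how fast it grows."""
    return len(str(num_combinations(M, t)))

# U already has 11 digits at t=3; print how quickly the digit count climbs.
for t in (1, 2, 3, 10, 50):
    print(t, digits_of_u(5841, t))
```

Since every one of the U sums enters the denominator of W at every stage, enumerating them all quickly becomes impractical as t grows.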

Page 13

Simplified Stopping Rule (Bechhofer et al., 1968)

$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[S],h} \le Y^{(t)}_{[S+1],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$

$$W^{[U-S]}_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{(S-1)\exp(D^* Y^{(t)}_{[S],h}) + \sum_{j=S}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$$

$$W_{[U],h} \ge W^{[U-1]}_{[U],h} \ge W^{[U-2]}_{[U],h} \ge \dots \ge W^{[2]}_{[U],h} \ge W^{[1]}_{[U],h}$$

U-S+1 = Top Combination Number (TCN)

TCN=2 (i.e. S=U-1, U-S=1) => the simplest stopping rule:

$$Y_{[M-t+1],h} - Y_{[M-t],h} \ge \frac{1}{D^*}\ln\left\{\frac{(U-1)P^*}{1-P^*}\right\}$$

where $Y_{[k],h}$ is the k-th smallest of the individual statistics $Y_{1,h}, \dots, Y_{M,h}$.

TCN=U (i.e. S=1, U-S=U-1) => the original stopping rule

How to choose TCN? Balance between computational accuracy and computational time
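
The simplified rule can be sketched as follows. `simplified_w` takes only the top TCN combination sums and treats the remaining U − TCN sums as if they equalled the smallest retained one, which can only lower W; `tcn2_stop` is the closed-form TCN=2 rule written on the sorted individual statistics. Both function names are hypothetical, and how the top sums are obtained without full enumeration is left out of this sketch.

```python
import math

def simplified_w(top_sums, U, d_star):
    """Lower bound on W from the TCN largest combination sums.

    top_sums : the TCN = U-S+1 largest sums Y_[S..U]
    U        : total number of combinations
    """
    top_sums = sorted(top_sums, reverse=True)
    tcn = len(top_sums)
    y_top, y_s = top_sums[0], top_sums[-1]
    # (S-1) = U - TCN unseen sums are replaced by the smallest retained sum Y_[S]
    denom = (U - tcn) * math.exp(d_star * (y_s - y_top))
    denom += sum(math.exp(d_star * (s - y_top)) for s in top_sums)
    return 1.0 / denom

def tcn2_stop(y, t, d_star, p_star=0.95):
    """TCN=2 rule: gap between the t-th and (t+1)-th largest Y (requires t < M)."""
    ys = sorted(y)
    M = len(ys)
    U = math.comb(M, t)
    gap = ys[M - t] - ys[M - t - 1]        # Y_[M-t+1] - Y_[M-t], 1-based
    threshold = math.log((U - 1) * p_star / (1 - p_star)) / d_star
    return gap >= threshold
```

The TCN=2 rule needs only a sort of the M individual statistics, at the cost of being the most conservative bound; larger TCN values move the bound closer to the original W.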

Page 14

Page 15

SMDP Combined With Regression Model (M.A. Province, 2000, page 320-321)

Data pairs for a marker: (Z1, X1), (Z2, X2), (Z3, X3), …, (Zh, Xh), (Zh+1, Xh+1), …, (ZN, XN)

$$Z = \alpha + \beta X + \varepsilon$$

$$r_{h+1} = Z_{h+1} - (\hat\alpha^{(h)} + \hat\beta^{(h)} X_{h+1})$$

$$V_{h+1} = r_{h+1}\sqrt{\frac{h\sum_{j=1}^{h}(X_j - \bar X^{(h)})^2}{(h+1)\sum_{j=1}^{h}(X_j - \bar X^{(h)})^2 + h\,(X_{h+1} - \bar X^{(h)})^2}} \sim N(0, \sigma^2)$$

$$Y_{h+1} = \sum_{j=1}^{h+1} V_j^2$$

Sequential sum of squares of regression residuals. Yi,h denotes Y for marker i at stage h.

Page 16

Combine SMDP With Regression Model (M.A. Province, 2000, page 319)

$$Z = \alpha + \beta X + \varepsilon$$

$$r_{h+1} = Z_{h+1} - (\hat\alpha^{(h)} + \hat\beta^{(h)} X_{h+1})$$

$$r_{h+1} \rightarrow V_{h+1}, \quad V_{h+1} \sim N(0, \sigma^2)$$

Case B: the normal density function with unknown variance and known mean;

$$Y_{i,h} = \sum_{j=1}^{h} V_{i,j}^2$$
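
The statistic Y for a single marker can be sketched as a running sum of squared, standardized one-step-ahead regression residuals. This is a minimal sketch assuming the usual recursive-residual standardization for simple linear regression, sqrt(1 + 1/h + (X_{h+1} − mean)² / Sxx); the function name, the starting size `n0`, and the choice to skip stages where the slope is not estimable are assumptions, not details taken from the cited paper.

```python
import math

def recursive_residual_sums(z, x, n0=3):
    """Return [(stage, Y)] where Y accumulates V^2 from stage n0+1 onward.

    z, x : phenotype and genotype values for one marker, in sampling order
    n0   : number of data pairs used for the first regression fit
    """
    y, out = 0.0, []
    for h in range(n0, len(z)):
        xh, zh = x[:h], z[:h]
        xbar, zbar = sum(xh) / h, sum(zh) / h
        sxx = sum((a - xbar) ** 2 for a in xh)
        if sxx == 0:
            continue  # all genotypes identical so far: slope not estimable
        beta = sum((a - xbar) * (b - zbar) for a, b in zip(xh, zh)) / sxx
        alpha = zbar - beta * xbar
        r = z[h] - (alpha + beta * x[h])          # prediction residual r_{h+1}
        v = r / math.sqrt(1 + 1 / h + (x[h] - xbar) ** 2 / sxx)
        y += v * v                                # Y_{h+1} = Y_h + V_{h+1}^2
        out.append((h + 1, y))
    return out
```

Running this once per marker gives the Y_{i,h} values that feed the stopping rule; each V has known mean 0 and unknown variance, which is why case B applies.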

Page 17

Simplified Stopping Rule (M.A. Province, 2000, page 321-322)

Page 18

A Real Data Example (M.A. Province, 2000, page 310)

Page 19

A Real Data Example (M.A. Province, 2000, page 308)

Page 20

Simulation Results (1) (M.A. Province, 2000, page 312)

Page 21

Simulation Results (2) (M.A. Province, 2000, page 313)

Page 22

Page 23

Simplified SMDP (Bechhofer et al., 1968)

$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[S],h} \le Y^{(t)}_{[S+1],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$

$$W^{[U-S]}_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{(S-1)\exp(D^* Y^{(t)}_{[S],h}) + \sum_{j=S}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$$

$$W_{[U],h} \ge W^{[U-1]}_{[U],h} \ge W^{[U-2]}_{[U],h} \ge \dots \ge W^{[2]}_{[U],h} \ge W^{[1]}_{[U],h}$$

U-S+1 = Top Combination Number (TCN)

How to choose TCN?

Balance between computational accuracy and computational time

Page 24

Data

Sample size: 85 cell lines

Genotype: 5841 SNPs (category: 0,1,2)

Phenotype: ViabFu7 (continuous)

Page 25

Relation of W and t (h=50, D*=10)

Effective Top Combination Number (ETCN)

Zhang & Province, 2005, page 465

Page 26

ETCN Curve

Zhang & Province, 2005, page 466

Page 27

t = ?

Zhang & Province, 2005, page 466

Page 28

P*=0.95, D*=10, TCN=10000

72 SNPs, P<0.01

Zhang & Province, 2005, page 467

Page 29

SMDP Summary

Advantages:

Test and identify all signals simultaneously, no multiple comparisons

Use “minimal” N to find significant signals, efficient

Tight control of statistical errors (Type I, II), powerful

Save rest of N for validation, reliable

Further studies:

Computer time

Extension to more methods/models

Extension to non-K-D distributions

Page 30

Thanks!