
Presentation of the Paper: "The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value", J. D. Storey, The Annals of Statistics, Vol. 31, No. 6 (Dec. 2003), pp. 2013-2035

Aliaksandr Hubin

University of Oslo

[email protected]

29.08.2014

Aliaksandr Hubin (UIO), Bayesian FDR, 29.08.2014

Overview

1 Introduction

2 Multiple Hypothesis Testing

3 Error Measurements and Control

4 Bayesian interpretation of pFDR

5 The q-value

6 Dependence of test statistics and asymptotic properties

7 A connection to classification theory

8 An application to DNA microarrays in a Bayesian framework

9 Conclusions

10 Discussion


Introduction

A single hypothesis test aims to minimize the Type II error while keeping the Type I error controlled at some positive level α;

In multiple hypothesis testing, controlling each test individually leads to an increase in the number of both false positives and false negatives;

Measures such as the FWER, $\Pr\{V \ge 1\}$, and the FDR, $E\{V/R\}$, have been suggested to measure the number of false positives;

A number of methods to control the FWER and/or the FDR have been suggested: the Bonferroni method, the Benjamini-Hochberg procedure, etc.;

This is my very first time using LaTeX and I have tried to play around with different features, so please do not judge my formatting too strictly.
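As a small numerical illustration of why per-test control is not enough, the sketch below assumes m independent tests of true null hypotheses, each run at level α, and computes the probability of at least one false positive, 1 − (1 − α)^m, next to the same quantity under the Bonferroni level α/m. The independence assumption and the function names are illustrative and are not taken from the paper.

```python
def fwer_uncorrected(alpha: float, m: int) -> float:
    """Pr{V >= 1} when m independent true nulls are each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def fwer_bonferroni(alpha: float, m: int) -> float:
    """Pr{V >= 1} when each of the m tests is run at the Bonferroni level alpha / m."""
    return 1.0 - (1.0 - alpha / m) ** m


if __name__ == "__main__":
    # Family-wise error grows quickly with m unless the per-test level is corrected.
    for m in (1, 10, 100):
        print(m, round(fwer_uncorrected(0.05, m), 4), round(fwer_bonferroni(0.05, m), 4))
```

The Bonferroni column stays below α for every m, which is the price paid in power that the later slides discuss.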


Possible outcomes from m hypothesis tests

| | Accept null | Reject null | Total |
|---|---|---|---|
| Null true | U | V | m0 |
| Alternative true | T | S | m1 |
| Total | W | R | m |

Table 1
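To make the notation of Table 1 concrete, here is a minimal sketch that tallies the counts from two boolean lists. The function name and the input encoding (True for "null is true" and for "null is rejected") are assumptions made for this illustration only.

```python
from collections import Counter


def outcome_counts(null_true, rejected):
    """Return the Table 1 counts U, V, T, S, W, R, m0, m1, m.

    null_true[i] is True when the i-th null hypothesis holds;
    rejected[i] is True when the i-th null hypothesis is rejected.
    """
    if len(null_true) != len(rejected):
        raise ValueError("inputs must have the same length")
    cells = Counter(zip(null_true, rejected))
    U, V = cells[(True, False)], cells[(True, True)]    # null true: accepted / rejected
    T, S = cells[(False, False)], cells[(False, True)]  # alternative true: accepted / rejected
    return {"U": U, "V": V, "T": T, "S": S,
            "W": U + T, "R": V + S, "m0": U + V, "m1": T + S,
            "m": U + V + T + S}
```

In practice only W, R and m are observable; U, V, T, S, m0 and m1 require knowing the truth, which is why they are available here only because the example supplies it.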


List of measures of Error I level and their drawbacks

Controlled measures

1 $\Pr\{V \ge 1\}$
2 $E\{V/R\}$
3 $E\{V/R \mid R > 0\} \cdot \Pr\{R > 0\}$
4 $E\{V/R \mid R > 0\}$
5 $E\{V\}/E\{R\}$

Drawbacks

1 Significant decrease in the power of the m tests
2 Not defined when R = 0
3 Of little interest in cases when all cases are significant
4 Equals 1 when m = m0, whereas α ∈ (0, 1)
5 Equals 1 when m = m0, whereas α ∈ (0, 1)

The author, however, chooses to control $E\{V/R \mid R > 0\}$ and calls it the pFDR (positive false discovery rate). He argues that such a measure should only be available when at least one rejection occurs. He also claims that it makes sense for the measure to equal one when m = m0, but gives neither a practical nor a theoretical reason for that.
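The relation between the measures can be checked on repeated experiments. The sketch below computes empirical versions of the FWER, the FDR, the pFDR and E{V}/E{R} from lists of false-positive and rejection counts; the function name and the input format are hypothetical and serve only to show that FDR = pFDR · Pr{R > 0}.

```python
def error_measures(v_counts, r_counts):
    """Empirical error measures over repeated experiments.

    v_counts[j] and r_counts[j] are the numbers of false positives and of
    rejections observed in experiment j.
    """
    n = len(v_counts)
    if n == 0 or n != len(r_counts):
        raise ValueError("need two non-empty lists of equal length")
    # V/R is only defined for experiments with at least one rejection.
    ratios = [v / r for v, r in zip(v_counts, r_counts) if r > 0]
    pr_r_pos = len(ratios) / n
    pfdr = sum(ratios) / len(ratios) if ratios else float("nan")
    total_r = sum(r_counts)
    return {
        "FWER": sum(v >= 1 for v in v_counts) / n,
        "pFDR": pfdr,
        "FDR": pfdr * pr_r_pos if ratios else 0.0,
        "E[V]/E[R]": sum(v_counts) / total_r if total_r else float("nan"),
        "Pr{R>0}": pr_r_pos,
    }
```

Because Pr{R > 0} ≤ 1, the empirical FDR can never exceed the empirical pFDR.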


One should be careful when controlling pFDR by means of the Benjamini and Hochberg procedure

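Procedure

\[
\hat{k} = \operatorname*{arg\,max}_{1 \le k \le m} \left\{ k : p_{(k)} \le \alpha \frac{k}{m} \right\}, \qquad p_{(i)} \le p_{(i+1)}, \quad i \in [1, m-1] \cap \mathbb{Z}
\]

Reject all $H_{0(i)}$, $i \le \hat{k}$, where $H_{0(i)}$ is the null hypothesis corresponding to $p_{(i)}$.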

Note that the Benjamini-Hochberg procedure controls the FDR (measure 3) at level α, so the pFDR (measure 4) is only controlled at $\alpha^* = \alpha / \Pr\{R > 0\}$.
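The step-up rule on this slide can be sketched in a few lines. The function name and the choice to return the original indices of the rejected hypotheses are assumptions of this illustration; ties and dependence between tests are not treated specially.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices of the hypotheses rejected by the BH step-up rule.

    k_hat is the largest k with p_(k) <= alpha * k / m, where
    p_(1) <= ... <= p_(m) are the ordered p-values; the hypotheses with the
    k_hat smallest p-values are rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            k_hat = k  # keep the largest k that satisfies the bound
    return sorted(order[:k_hat])
```

Note the step-up behaviour: with p-values (0.03, 0.035, 0.5, 0.6) and α = 0.1 the smallest p-value misses its own bound 0.025, yet both small p-values are rejected because the second one meets its bound 0.05.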


Bayesian interpretation of pFDR


p-value and q-value definitions

p-value

\[
\text{p-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{T \in \Gamma_\alpha \mid H_0\} = \Pr\{|T| \ge t \mid H_0\}
\]

The p-value is the Type I error incurred when rejecting every hypothesis whose statistic is equal to or more extreme than t. In other words, it is the minimal Type I error, over all significance regions containing t, that can occur when rejecting a statistic with value t.

q-value

\[
\text{q-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\} = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\} = \mathrm{pFDR}\{|T| \ge t\} = \Pr\{H_0 \mid |T| \ge t\}
\]

The q-value is the pFDR incurred when rejecting every hypothesis whose statistic is equal to or more extreme than t. In other words, it is the minimal pFDR, over all significance regions containing t, that can occur when rejecting a statistic with value t.
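The two definitions can be compared numerically in a toy two-group model. The sketch below assumes T ~ N(0, 1) under H0, T ~ N(mu, 1) under H1 and a prior Pr{H0} = pi0, and it uses the closed forms Pr{|T| ≥ t | H0} and Pr{H0 | |T| ≥ t} from the slide; the model, the parameter names and the function names are assumptions of this illustration.

```python
import math


def normal_sf(x: float) -> float:
    """Upper tail probability Pr{Z >= x} of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_value(t: float) -> float:
    """Pr{|T| >= t | H0} for T ~ N(0, 1) under H0, with t >= 0."""
    return 2.0 * normal_sf(t)


def q_value(t: float, pi0: float, mu: float) -> float:
    """Pr{H0 | |T| >= t} when T ~ N(mu, 1) under H1 and Pr{H0} = pi0."""
    type_one = p_value(t)                             # Pr{|T| >= t | H0}
    power = normal_sf(t - mu) + normal_sf(t + mu)     # Pr{|T| >= t | H1}
    return pi0 * type_one / (pi0 * type_one + (1.0 - pi0) * power)
```

At t = 0 nothing has been learned, so the q-value equals the prior pi0, while the p-value equals 1; both decrease as t grows.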


q-value minimization in terms of Type I error and power

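Note that

\[
\operatorname*{arg\,min}_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\}
= \operatorname*{arg\,min}_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\}
= \operatorname*{arg\,min}_{\Gamma_\alpha : t \in \Gamma_\alpha} \frac{\Pr\{T \in \Gamma_\alpha \mid H_0\}}{\Pr\{T \in \Gamma_\alpha \mid H_1\}}
= \operatorname*{arg\,min}_{\Gamma_\alpha : t \in \Gamma_\alpha} \frac{G_0(\alpha)}{G_1(\alpha)}
= \frac{G_1(\alpha^*)}{G_1'(\alpha^*)}
\]

where $G_0(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_0\}$ is the Type I error and $G_1(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_1\}$ is the power of the region $\Gamma_\alpha$.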
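The last equality on this slide can be read as a first-order condition. A short sketch, assuming that the regions are indexed by their Type I error, so that $G_0(\alpha) = \alpha$, and that $G_1$ is differentiable (neither assumption is stated on the slide):

```latex
\[
\frac{d}{d\alpha}\,\frac{\alpha}{G_1(\alpha)}
  = \frac{G_1(\alpha) - \alpha\,G_1'(\alpha)}{G_1(\alpha)^2} = 0
\quad\Longrightarrow\quad
\alpha^* = \frac{G_1(\alpha^*)}{G_1'(\alpha^*)} .
\]
```

For a strictly concave $G_1$ with $G_1(0) = 0$ the numerator $G_1(\alpha) - \alpha G_1'(\alpha)$ is positive for every $\alpha > 0$, so there is no interior stationary point and the minimum over the regions containing t is attained at the smallest one, $\alpha = \text{p-value}(t)$.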


Relations between p-value and q-value for concave G1(α)

Figure 1


pFDR transformation of p-value to Γα

This theorem says that, through the pFDR, the space of p-values can be transformed into the space of significance regions if and only if the power function increases more slowly than the Type I error, which is its argument; in other words, if and only if the power function is concave.
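A quick numerical check of the role of concavity, under the assumption that regions are indexed by their Type I error α, so that pFDR(α) = π0 α / (π0 α + π1 G1(α)). The two power functions below are toy choices made for this illustration and do not come from the paper.

```python
def pfdr(alpha: float, pi0: float, power) -> float:
    """pFDR of a region with Type I error alpha and power power(alpha)."""
    return pi0 * alpha / (pi0 * alpha + (1.0 - pi0) * power(alpha))


def is_nondecreasing(values, tol: float = 1e-12) -> bool:
    """True when each value is at least the previous one, up to tol."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


def concave_power(alpha: float) -> float:
    return alpha ** 0.5  # concave in alpha, equals 0 at alpha = 0


def convex_power(alpha: float) -> float:
    return alpha ** 2    # not concave in alpha


grid = [i / 100 for i in range(1, 101)]
concave_curve = [pfdr(a, 0.8, concave_power) for a in grid]
convex_curve = [pfdr(a, 0.8, convex_power) for a in grid]
```

With the concave power function the pFDR increases with α, so the smallest region containing t minimizes it and the ordering by q-value agrees with the ordering by p-value. With the convex one the pFDR decreases in α and that agreement is lost.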


Generalization of Theorem 1

As one can see, Theorem 1 is not valid in either of these settings.


Asymptotic properties of FDR-controlling measures


Asymptotic properties of FDR-controlling measures

Where the following equations define asymptotic frequency-based analogues of the Type I error and the power:

Thus, Theorem 4 says that if $G_0$, $G_1$ and $\pi_0$ can be calculated, then for sufficiently large m they provide good approximations for all three FDR-controlling measures.
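The kind of convergence described here can be tried out with a small simulation. The sketch assumes independent statistics from a two-group normal model, a one-sided region {T ≥ threshold}, and a limit of the form π0 G0 / (π0 G0 + π1 G1), as in the Bayesian interpretation of the pFDR; the model, the seed and all names are illustrative assumptions.

```python
import math
import random


def normal_sf(x: float) -> float:
    """Upper tail probability Pr{Z >= x} of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_false_discovery_proportion(m, pi0, mu, threshold, seed=0):
    """Draw m statistics from the two-group model and return V / R for {T >= threshold}."""
    rng = random.Random(seed)
    v = r = 0
    for _ in range(m):
        is_null = rng.random() < pi0
        t = rng.gauss(0.0 if is_null else mu, 1.0)
        if t >= threshold:
            r += 1
            v += is_null  # counts a false positive when the null is true
    return v / r if r > 0 else float("nan")


def limiting_pfdr(pi0, mu, threshold):
    """pi0 * G0 / (pi0 * G0 + pi1 * G1) for the region {T >= threshold}."""
    g0 = normal_sf(threshold)        # Type I error of the region
    g1 = normal_sf(threshold - mu)   # power of the region
    return pi0 * g0 / (pi0 * g0 + (1.0 - pi0) * g1)
```

For example, comparing `simulate_false_discovery_proportion(20000, 0.8, 2.0, 2.0)` with `limiting_pfdr(0.8, 2.0, 2.0)` shows how close the realized proportion V/R is to its limit for a moderately large m.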


Practical example of such convergence


Relation to classification theory

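FNR

\[
\mathrm{FNR} = E\left\{ \frac{T}{W} \,\middle|\, W > 0 \right\} \Pr\{W > 0\}
\]

and

pFNR

\[
\mathrm{pFNR} = E\left\{ \frac{T}{W} \,\middle|\, W > 0 \right\}
\]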


Bayes misclassification error

\[
BE(\Gamma) = (1 - \lambda) \Pr\{T_i \in \Gamma, H_i = 0\} + \lambda \Pr\{T_i \notin \Gamma, H_i = 1\}
\]

| | Classify Hi as 0 | Classify Hi as 1 |
|---|---|---|
| Null true | 0 | 1 − λ |
| Alternative true | λ | 0 |

Table 2. Outcomes of classification with the corresponding penalties
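To see how the weight λ trades false positives against false negatives, the following sketch evaluates BE(Γ) for a one-sided region Γ = {T ≥ threshold} in a two-group normal model. The model (T ~ N(0, 1) when H = 0, T ~ N(mu, 1) when H = 1, Pr{H = 0} = pi0) and all names are assumptions of this example.

```python
import math


def normal_sf(x: float) -> float:
    """Upper tail probability Pr{Z >= x} of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bayes_error(threshold: float, lam: float, pi0: float, mu: float) -> float:
    """BE(Gamma) for Gamma = {T >= threshold} in the two-group normal model."""
    false_positive = pi0 * normal_sf(threshold)                         # Pr{T in Gamma, H = 0}
    false_negative = (1.0 - pi0) * (1.0 - normal_sf(threshold - mu))    # Pr{T not in Gamma, H = 1}
    return (1.0 - lam) * false_positive + lam * false_negative
```

With λ = 0 only false positives are penalized, so the error shrinks as the threshold grows; with λ = 1 only false negatives count and the opposite holds.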


Bayesian interpretation of pFNR


Trade-off between different mixed error measures

Where the set $B_\lambda$, λ ∈ [0, 1], defines the Bayes rule for the cost matrix given by Table 2:


Practical application to DNA microarrays

Performed steps and achieved results:

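1 $T_i \mid H_i \sim (1 - H_i) F_0 + H_i F_1$;
2 $\Pr\{H_i = 0 \mid T_i = t_i\} = \dfrac{\pi_0 f_0(t_i)}{\pi_0 f_0(t_i) + \pi_1 f_1(t_i)}$ is estimated by $\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}$;
3 $\hat{B}_\lambda = \{t : \widehat{\Pr}\{H = 0 \mid T = t\} \le \lambda\}$;
4 λ is chosen to be 0.10;
5 $\mathrm{pFDR}\{\hat{B}_{0.10}\} = \Pr\{H = 0 \mid T \in \hat{B}_{0.10}\}$;
6 $\widehat{\text{q-value}}(t_i) = \widehat{\Pr}\{H_i = 0 \mid T_i \in \hat{B}_{\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}}\}$.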
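The steps above can be sketched as follows. This is not the estimation procedure used in the paper: the sketch assumes known normal densities f0 = N(0, 1) and f1 = N(mu, 1) and a known pi0 in place of estimates, and it approximates Pr{H = 0 | T ∈ B} by the average posterior null probability of the observed statistics that fall in B. All function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_null(t: float, pi0: float, mu: float) -> float:
    """Step 2: Pr{H = 0 | T = t} = pi0 f0(t) / (pi0 f0(t) + pi1 f1(t))."""
    f0, f1 = normal_pdf(t), normal_pdf(t, mu)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def bayes_region(stats, posteriors, lam):
    """Step 3: observed statistics whose posterior null probability is at most lam."""
    return [t for t, p in zip(stats, posteriors) if p <= lam]


def estimated_q_values(posteriors):
    """Step 6 (plug-in): mean posterior over the region B_lambda with lambda = posteriors[i]."""
    q = []
    for p_i in posteriors:
        inside = [p for p in posteriors if p <= p_i]  # statistics falling in B_{p_i}
        q.append(sum(inside) / len(inside))
    return q
```

A call such as `bayes_region(stats, posteriors, 0.10)` corresponds to steps 3 and 4, and averaging the posteriors of the statistics it returns gives the plug-in analogue of step 5.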


Conclusions


Discussion of stupid (???) stuff

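multiple type I error measure

\[
A_Y(\Gamma_\alpha, \theta_{H_0}) = \Pr\{V > Y\} = 1 - F_{\mathrm{bin}(\Pr(H_1 \mid H_0))}(Y), \qquad Y = \lfloor r_1 N \rfloor
\]

multiple type II error measure

\[
B_Z(\Gamma_\alpha, \theta_{H_1}) = \Pr\{T > Z\} = 1 - F_{\mathrm{bin}(\Pr(H_0 \mid H_1))}(Z), \qquad Z = \lfloor r_2 N \rfloor
\]

Bayesian rule

\[
\{\Gamma_\alpha^*, \theta_{H_0}^*, \theta_{H_1}^*\} = \operatorname*{arg\,min}_{\Gamma_\alpha,\, \theta_{H_0},\, \theta_{H_1}} \left\{ \lambda_1 B_Z(\Gamma_\alpha, \theta_{H_1}) + \lambda_2 A_Y(\Gamma_\alpha, \theta_{H_0}) \right\}
\]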


Discussion of stupid (???) stuff

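multiple p-value

\[
P(t_1, \ldots, t_n)_Y = \inf_{\Gamma_\alpha : \{\tau_1, \ldots, \tau_Y\} \in \Gamma_\alpha} \Pr\{\{\tau_1, \ldots, \tau_Y\} \in \Gamma_\alpha,\ \{t_1, \ldots, t_n\} \setminus \{\tau_1, \ldots, \tau_Y\} \notin \Gamma_\alpha \mid H_0\}, \qquad \{\tau_1, \ldots, \tau_Y\} \subseteq \{t_1, \ldots, t_n\}
\]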


References

J. D. Storey (2003). The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value. The Annals of Statistics 31(6), 2013-2035.


The End.

Thank you for your attention!

