TRANSCRIPT
Presentation of the paper: "The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value", J.D. Storey, The Annals of Statistics, Vol. 31, No. 6 (Dec. 2003), pp. 2013-2035
Aliaksandr Hubin
University of Oslo
29.08.2014
Aliaksandr Hubin (UIO) Bayesian FDR 29.08.2014 1 / 25
Overview
1 Introduction
2 Multiple Hypothesis Testing
3 Error Measurements and Control
4 Bayesian interpretation of pFDR
5 The q-value
6 Dependence of test statistics and asymptotic properties
7 A connection to classification theory
8 An application to DNA microarrays in a Bayesian framework
9 Conclusions
10 Discussion
Introduction
A single hypothesis test aims to minimize the Type II error while keeping the Type I error controlled at some positive level α;
In multiple hypothesis testing, controlling each test individually leads to an increase in the number of both false positives and false negatives;
Measures such as FWER, $\Pr\{V \ge 1\}$, and FDR, $E\{V/R\}$, have been suggested to quantify false positives;
A number of methods to control FWER and/or FDR have been suggested: the Bonferroni method, the Benjamini-Hochberg procedure, etc.;
This is my very first time using LaTeX, and I have tried to play around with different features: please do not judge my formatting too strictly.
Possible outcomes from m hypothesis tests
                   Accept null   Reject null   Total
Null true          U             V             m0
Alternative true   T             S             m1
Total              W             R             m
Table 1
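To make the cells of Table 1 concrete, here is a minimal sketch that counts them from a vector of truth labels and a vector of decisions. The function name `tabulate_outcomes` and the toy data are illustrative assumptions, not part of the paper.

```python
def tabulate_outcomes(is_null, rejected):
    """Count the cells of Table 1 from truth labels and test decisions."""
    pairs = list(zip(is_null, rejected))
    U = sum(n and not r for n, r in pairs)          # true nulls accepted
    V = sum(n and r for n, r in pairs)              # true nulls rejected (false positives)
    T = sum((not n) and (not r) for n, r in pairs)  # false nulls accepted (false negatives)
    S = sum((not n) and r for n, r in pairs)        # false nulls rejected
    return {"U": U, "V": V, "T": T, "S": S,
            "m0": U + V, "m1": T + S, "W": U + T, "R": V + S, "m": len(pairs)}

# Hypothetical toy data: three true nulls, two true alternatives.
is_null = [True, True, True, False, False]
rejected = [False, True, False, True, False]
counts = tabulate_outcomes(is_null, rejected)
print(counts)
```

In practice only W, R and m are observed; U, V, T and S depend on the unknown truth labels, which is why the error measures are defined through expectations.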
List of measures of Type I error level and their drawbacks
Controlled measures
1 $\Pr\{V \ge 1\}$
2 $E\{V/R\}$
3 $E\{V/R \mid R > 0\} \cdot \Pr\{R > 0\}$
4 $E\{V/R \mid R > 0\}$
5 $E\{V\}/E\{R\}$
Drawbacks
1 Significant decrease in the power of the m tests
2 Not defined when R = 0
3 Of little interest in cases when all cases are significant
4 Equals 1 when m = m0, whereas α ∈ (0, 1)
5 Equals 1 when m = m0, whereas α ∈ (0, 1)
The author, however, chooses to control $E\{V/R \mid R > 0\}$ and calls it the pFDR (positive false discovery rate). He argues that such a measure should only be available when at least one rejection occurs. He also claims that it makes sense for the measure to equal one when m = m0; however, he gives neither a practical nor a theoretical reason for that.
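The behaviour at m = m0 can be checked with a small Monte Carlo sketch. Everything here is an assumption made for illustration: independent tests, Uniform(0,1) null p-values, alternative p-values drawn as U**4, and the hypothetical function name `simulate_measures`.

```python
import random

def simulate_measures(m, m0, alpha, n_rep=2000, seed=0):
    """Monte Carlo estimates of measures 1, 3, 4 and 5 for the rule 'reject if p <= alpha'."""
    rng = random.Random(seed)
    sum_v = sum_r = 0
    sum_ratio = 0.0      # sum of V/R over replications with R > 0
    n_r_pos = 0          # replications with R > 0
    n_v_pos = 0          # replications with V >= 1
    for _ in range(n_rep):
        v = sum(rng.random() <= alpha for _ in range(m0))           # false positives
        s = sum(rng.random() ** 4 <= alpha for _ in range(m - m0))  # true positives
        r = v + s
        sum_v += v
        sum_r += r
        n_v_pos += v >= 1
        if r > 0:
            n_r_pos += 1
            sum_ratio += v / r
    return {
        "FWER": n_v_pos / n_rep,
        "FDR": sum_ratio / n_rep,   # E{V/R | R>0} * Pr{R>0}
        "pFDR": sum_ratio / n_r_pos if n_r_pos else float("nan"),
        "E{V}/E{R}": sum_v / sum_r if sum_r else float("nan"),
    }

all_null = simulate_measures(m=20, m0=20, alpha=0.01)
mixed = simulate_measures(m=20, m0=15, alpha=0.01)
print(all_null)
print(mixed)
```

When every null is true, V = R in each replication, so measures 4 and 5 come out as exactly 1 while the FDR coincides with the FWER.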
One should be careful when controlling pFDR by means of the Benjamini and Hochberg procedure
Procedure
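$\hat{k} = \operatorname*{arg\,max}_{1 \le k \le m} \{k : p_{(k)} \le \alpha \tfrac{k}{m}\}$, where $p_{(i)} \le p_{(i+1)}$, $i \in [1, m-1] \cap \mathbb{Z}$
Reject all $H_{0i}$, $i \le \hat{k}$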
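Note that the Benjamini-Hochberg procedure controls FDR (measure 3) at level α; because FDR = pFDR · Pr{R > 0}, pFDR (measure 4) is then only controlled at $\alpha^* = \alpha / \Pr\{R > 0\}$!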
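The step-up rule can be sketched in a few lines. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions; the rule itself is the one stated on this slide, applied to the ordered p-values.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices (into p_values) rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices sorted by p-value
    k_hat = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            k_hat = k    # remember the largest k that satisfies the condition
    return sorted(order[:k_hat])

# Hypothetical p-values from ten tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
print(rejected)
```

Because the largest qualifying k is used, a hypothesis can be rejected even if its own ordered p-value exceeds its threshold, as long as a later one falls below its threshold.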
Bayesian interpretation of pFDR
p-value and q-value definitions
p-value
$\text{p-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{T \in \Gamma_\alpha \mid H_0\} = \Pr\{|T| \ge t \mid H_0\}$
The p-value is the Type I error incurred when rejecting every hypothesis whose statistic is equal to or more extreme than t; in other words, it is the minimal Type I error over all significance regions that reject a statistic with value t.
q-value
$\text{q-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\} = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\} = \mathrm{pFDR}\{|T| \ge t\} = \Pr\{H_0 \mid |T| \ge t\}$
The q-value is the pFDR incurred when rejecting every hypothesis whose statistic is equal to or more extreme than t; in other words, it is the minimal pFDR over all significance regions that reject a statistic with value t.
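To see the two definitions side by side, here is a minimal sketch for a two-component normal mixture. The model is an assumption made only for illustration: T ~ N(0, 1) under H0, T ~ N(mu, 1) under H1, with known prior probability `pi0` of a true null; the function names are hypothetical.

```python
import math

def upper_tail(z):
    """Pr{Z >= z} for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_value(t):
    """Pr{|T| >= t | H0}, assuming T ~ N(0, 1) under the null (t >= 0)."""
    return 2.0 * upper_tail(t)

def power(t, mu):
    """Pr{|T| >= t | H1}, assuming T ~ N(mu, 1) under the alternative (t >= 0)."""
    return upper_tail(t - mu) + upper_tail(t + mu)

def q_value(t, pi0, mu):
    """Pr{H0 | |T| >= t} by Bayes' theorem."""
    num = pi0 * p_value(t)
    return num / (num + (1.0 - pi0) * power(t, mu))

for t in (1.0, 2.0, 3.0):
    print(t, round(p_value(t), 4), round(q_value(t, pi0=0.9, mu=3.0), 4))
```

The p-value conditions on H0 and needs only the null distribution, whereas the q-value is a posterior probability of H0 and therefore also needs the prior and the alternative distribution.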
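q-value minimization in terms of Type I error and power
Note that
$\arg\min_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\} = \arg\min_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\} = \arg\min_{\Gamma_\alpha : t \in \Gamma_\alpha} \frac{\Pr\{T \in \Gamma_\alpha \mid H_0\}}{\Pr\{T \in \Gamma_\alpha \mid H_1\}} = \arg\min_{\Gamma_\alpha : t \in \Gamma_\alpha} \frac{G_0(\alpha)}{G_1(\alpha)} = \frac{G_1(\alpha^*)}{G_1'(\alpha^*)}$
where $G_0(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_0\}$ is the Type I error and $G_1(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_1\}$ is the power.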
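The chain of equalities on this slide can be unpacked as follows. This is a sketch under two assumptions that the slide does not state: the regions are indexed by their Type I error, so that $G_0(\alpha) = \alpha$, and $G_1$ is differentiable with an interior minimizer $\alpha^*$. The symbols $\pi_0$ and $\pi_1 = 1 - \pi_0$ denote the prior probabilities of the null and the alternative.

```latex
% Bayes' theorem: the posterior is increasing in the ratio G_0 / G_1,
% so minimizing the posterior is the same as minimizing the ratio.
\Pr\{H_0 \mid T \in \Gamma_\alpha\}
  = \frac{\pi_0 G_0(\alpha)}{\pi_0 G_0(\alpha) + \pi_1 G_1(\alpha)}
  = \left(1 + \frac{\pi_1}{\pi_0}\,\frac{G_1(\alpha)}{G_0(\alpha)}\right)^{-1}

% First-order condition for the ratio when G_0(\alpha) = \alpha:
\frac{d}{d\alpha}\,\frac{\alpha}{G_1(\alpha)}
  = \frac{G_1(\alpha) - \alpha\,G_1'(\alpha)}{G_1(\alpha)^2} = 0
\quad\Longrightarrow\quad
\alpha^* = \frac{G_1(\alpha^*)}{G_1'(\alpha^*)}
```

The stationarity condition only identifies candidates for the minimizer; whether it is attained in the interior depends on the shape of $G_1$.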
Relations between p-value and q-value for concave G1(α)
Figure 1
pFDR transformation of p-value to Γα
This theorem says that, through pFDR, the space of p-values can be transformed into the space of significance regions if and only if the power function increases more slowly than the Type I error, which is its argument; in other words, if and only if the power function is concave.
Generalization of Theorem 1
As one can see, Theorem 1 does not hold in either of these settings.
Asymptotic properties of FDR-controlling measures
Asymptotic properties of FDR-controlling measures
where the following equations define the asymptotic, frequency-based analogues of the Type I error and the power:
Thus, Theorem 4 says that if G0, G1 and π0 can be calculated, then for sufficiently large m they provide good approximations for all three FDR-controlling measures.
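A small simulation can illustrate this kind of convergence. It is a sketch under assumptions chosen for convenience: independent tests, rejection region {p <= alpha}, Uniform(0,1) p-values under H0 and p = U**4 under H1 (so that G0(α) = α and G1(α) = α^(1/4)), and the limit taken in the Bayes form π0 G0 / (π0 G0 + π1 G1). The function names are hypothetical.

```python
import random

def empirical_fdp(m, pi0, alpha, rng):
    """Simulate m independent tests and return V/R for the region {p <= alpha}."""
    v = r = 0
    for _ in range(m):
        is_null = rng.random() < pi0
        # Assumed p-value models: Uniform(0,1) under H0, U**4 under H1.
        p = rng.random() if is_null else rng.random() ** 4
        if p <= alpha:
            r += 1
            v += is_null
    return v / r if r else float("nan")

def limiting_pfdr(pi0, alpha):
    """pi0*G0 / (pi0*G0 + pi1*G1) for the assumed p-value models."""
    g0 = alpha           # Type I error of {p <= alpha}
    g1 = alpha ** 0.25   # power: Pr{U**4 <= alpha}
    return pi0 * g0 / (pi0 * g0 + (1.0 - pi0) * g1)

rng = random.Random(1)
limit = limiting_pfdr(pi0=0.8, alpha=0.05)
for m in (100, 10_000, 100_000):
    print(m, round(empirical_fdp(m, 0.8, 0.05, rng), 4), "limit:", round(limit, 4))
```

With small m the realized proportion V/R fluctuates noticeably from run to run; the fluctuation shrinks as m grows.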
Practical example of such convergence
Relation to classification theory
FNR
$\mathrm{FNR} = E\{T/W \mid W > 0\} \Pr\{W > 0\}$
and
pFNR
$\mathrm{pFNR} = E\{T/W \mid W > 0\}$
Bayes misclassification error
$BE(\Gamma) = (1 - \lambda) \Pr\{T_i \in \Gamma, H_i = 0\} + \lambda \Pr\{T_i \notin \Gamma, H_i = 1\}$
                   Classify Hi as 0   Classify Hi as 1
Null true          0                  1 − λ
Alternative true   λ                  0
Table 2. Outcomes of classification with the corresponding penalties
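As a numerical check of how the Bayes error ties to the posterior probability of the null, the sketch below minimizes BE over one-sided regions {t >= c}. The setting is assumed purely for illustration: T ~ N(0, 1) under the null, T ~ N(mu, 1) under the alternative, known `pi0`, and a grid search over c; all function names are hypothetical.

```python
import math

def normal_pdf(t, mu):
    return math.exp(-0.5 * (t - mu) ** 2) / math.sqrt(2.0 * math.pi)

def upper_tail(z):
    """Pr{Z >= z} for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bayes_error(c, lam, pi0, mu):
    """BE for the region {t >= c} under the assumed normal mixture."""
    false_pos = pi0 * upper_tail(c)                        # Pr{T >= c, H = 0}
    false_neg = (1.0 - pi0) * (1.0 - upper_tail(c - mu))   # Pr{T < c, H = 1}
    return (1.0 - lam) * false_pos + lam * false_neg

def posterior_null(t, pi0, mu):
    """Pr{H = 0 | T = t} under the assumed normal mixture."""
    f0, f1 = normal_pdf(t, 0.0), normal_pdf(t, mu)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

lam, pi0, mu = 0.10, 0.9, 3.0
grid = [i / 1000.0 for i in range(0, 6001)]   # candidate cut-offs c in [0, 6]
c_best = min(grid, key=lambda c: bayes_error(c, lam, pi0, mu))
print(c_best, posterior_null(c_best, pi0, mu))
```

At the minimizing cut-off the posterior probability of the null is approximately λ, so the best region of this form is {t : Pr{H = 0 | T = t} <= λ}.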
Bayesian interpretation of pFNR
Trade-off between different mixed error measures
where the set $B_\lambda$, $\lambda \in [0, 1]$, defines the Bayes rule for the cost matrix given by Table 3:
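Practical application to DNA microarrays
Performed steps and achieved results:
1 $T_i \mid H_i \sim (1 - H_i) F_0 + H_i F_1$;
2 $\Pr\{H_i = 0 \mid T_i = t_i\} = \frac{\pi_0 f_0(t_i)}{\pi_0 f_0(t_i) + \pi_1 f_1(t_i)}$ is estimated by $\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}$;
3 $\hat{B}_\lambda = \{t : \widehat{\Pr}\{H = 0 \mid T = t\} \le \lambda\}$;
4 λ is chosen to be 0.10;
5 $\mathrm{pFDR}\{\hat{B}_{0.10}\} = \Pr\{H = 0 \mid T \in \hat{B}_{0.10}\}$;
6 $\hat{q}\text{-value}(t_i) = \widehat{\Pr}\{H_i = 0 \mid T_i \in \hat{B}_{\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}}\}$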
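The steps above can be sketched end to end on toy data. This is not the estimator used in the paper: the densities are assumed known (N(0, 1) under the null, N(mu1, 1) under the alternative, prior `pi0`) instead of being estimated, and the region probability in step 6 is approximated by averaging the posterior over the observed statistics that fall in the region. All names and numbers are hypothetical.

```python
import math

def normal_pdf(t, mu):
    return math.exp(-0.5 * (t - mu) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_null(t, pi0, mu1):
    """Step 2: Pr{H = 0 | T = t} for the assumed mixture pi0*N(0,1) + (1-pi0)*N(mu1,1)."""
    f0, f1 = normal_pdf(t, 0.0), normal_pdf(t, mu1)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

def estimate_q_values(stats, pi0, mu1):
    """Steps 3 and 6: for each t_i, average the posterior over the data in B_{post(t_i)}."""
    post = [posterior_null(t, pi0, mu1) for t in stats]
    q = []
    for p_i in post:
        in_region = [p_j for p_j in post if p_j <= p_i]   # data falling in B_{p_i}
        q.append(sum(in_region) / len(in_region))
    return post, q

stats = [-0.3, 0.4, 1.1, 2.5, 3.2, 4.0]   # hypothetical test statistics
post, q = estimate_q_values(stats, pi0=0.9, mu1=3.0)
significant = [t for t, p_i in zip(stats, post) if p_i <= 0.10]   # steps 3-4, lambda = 0.10
for t, p_i, q_i in zip(stats, post, q):
    print(t, round(p_i, 4), round(q_i, 4))
```

The averaging relies on the identity Pr{H = 0 | T ∈ B} = E{Pr{H = 0 | T} | T ∈ B}, so each estimated q-value is at most the corresponding posterior probability.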
Conclusions
Discussion of stupid (???) stuff
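Multiple Type I error measure
$A_Y(\Gamma_\alpha, \theta_{H_0}) = \Pr\{V > Y\} = 1 - F_{\mathrm{bin}(\Pr(H_1 \mid H_0))}(Y)$, $Y = \lfloor r_1 N \rfloor$
Multiple Type II error measure
$B_Z(\Gamma_\alpha, \theta_{H_1}) = \Pr\{T > Z\} = 1 - F_{\mathrm{bin}(\Pr(H_0 \mid H_1))}(Z)$, $Z = \lfloor r_2 N \rfloor$
Bayesian rule
$\{\Gamma^*_\alpha, \theta^*_{H_0}, \theta^*_{H_1}\} = \arg\min_{\Gamma_\alpha, \theta_{H_0}, \theta_{H_1}} \{\lambda_1 B_Z(\Gamma_\alpha, \theta_{H_1}) + \lambda_2 A_Y(\Gamma_\alpha, \theta_{H_0})\}$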
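The tail probabilities of the form 1 − F_bin(⌊rN⌋) can be evaluated directly. The sketch below assumes a Binomial(n, p) count from n independent tests with a common per-test error probability p; the slide does not state the number of trials, so treating it as n is an assumption, and the function names are hypothetical.

```python
from math import comb, floor

def binom_cdf(y, n, p):
    """F_bin(y) = Pr{X <= y} for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(0, y + 1))

def exceedance(n, p, r):
    """Pr{X > floor(r * n)} = 1 - F_bin(floor(r * n))."""
    y = floor(r * n)
    return 1.0 - binom_cdf(y, n, p)

# Hypothetical numbers: 100 tests, per-test error probability 0.05, tolerated share 10%.
a_y = exceedance(n=100, p=0.05, r=0.10)
print(a_y)
```

The same function covers both measures: for A_Y, p stands for the per-test Type I error probability and r for r1; for B_Z, p stands for the per-test Type II error probability and r for r2.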
Discussion of stupid (???) stuff
multiple p-value
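$P(t_1, \dots, t_n)_Y = \inf_{\Gamma_\alpha : \{\tau_1, \dots, \tau_Y\} \in \Gamma_\alpha} \Pr\{\{\tau_1, \dots, \tau_Y\} \in \Gamma_\alpha,\ \{t_1, \dots, t_n\} \setminus \{\tau_1, \dots, \tau_Y\} \notin \Gamma_\alpha \mid H_0\}$, $\{\tau_1, \dots, \tau_Y\} \subseteq \{t_1, \dots, t_n\}$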
References
J.D. Storey (2003)
The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value
The Annals of Statistics 31(6), 2013 – 2035.
The End.
Thank you for your attention!