P-values and statistical tests - University of Dundee
P-values and statistical tests
7. Multiple test corrections
Hand-outs available at http://is.gd/statlec
Marek Gierliński, Division of Computational Biology
Let's perform a test $m$ times

                | H0 true | H0 false | Total
Significant     | $FP$    | $TP$     | $D$
Not significant | $TN$    | $FN$     | $m - D$
Total           | $m_0$   | $m_1$    | $m$

$FP$: false positives; $TP$: true positives; $TN$: true negatives; $FN$: false negatives; $m$: number of tests; $D$: number of discoveries
Family-wise error rate
𝐹𝑊𝐸𝑅 = Pr(𝐹𝑃 ≥ 1)
Probabilities of independent events multiply
Toss two coins
[Tree diagram: two coin tosses, each branch with probability $\frac{1}{2}$]
$P(H \text{ and } H) = P(H) \times P(H) = \frac{1}{4}$
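Probability of either event is $1 - (1 - p)^2$
Toss two coins
[Tree diagram: two coin tosses, each branch with probability $\frac{1}{2}$, each outcome with probability $\frac{1}{4}$]
$P(H \text{ or } H) = ?$
$P(T) = 1 - P(H)$
$P(T \text{ and } T) = P(T) \times P(T) = (1 - P(H))^2$
$P(H \text{ or } H) = 1 - P(T \text{ and } T)$
$P(H \text{ or } H) = 1 - (1 - P(H))^2$
$P(H \text{ or } H) = 1 - (1 - \frac{1}{2})^2 = \frac{3}{4}$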
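The two-coin result can be checked by listing the four equally likely outcomes. This is a minimal sketch in R; the variable names are illustrative and not part of the lecture.

```r
# Enumerate all outcomes of two fair coin tosses (HH, TH, HT, TT)
outcomes <- expand.grid(first = c("H", "T"), second = c("H", "T"))

# Each of the four outcomes has probability 1/4, so a mean over rows is a probability
p_both_heads   <- mean(outcomes$first == "H" & outcomes$second == "H")
p_either_heads <- mean(outcomes$first == "H" | outcomes$second == "H")

# Complement rule from the slide: 1 - P(T and T)
p_complement <- 1 - (1 - 1/2)^2
```

Both `p_either_heads` and `p_complement` describe the same event, which is why the complement rule generalises so easily to more than two trials.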
False positive probability
H0: no effect. Set $\alpha = 0.05$
One test
Probability of having a false positive: $P_1 = \alpha$
Two independent tests
Probability of having at least one false positive in either test: $P_2 = 1 - (1 - \alpha)^2$
$m$ independent tests
Probability of having at least one false positive in any test: $P_m = 1 - (1 - \alpha)^m$
Family-wise error rate (FWER)
Probability of having at least one false positive among $m$ tests; $\alpha = 0.05$
$P_m = 1 - (1 - \alpha)^m$
Jelly beans test, $m = 20$, $\alpha = 0.05$
$P_{20} = 1 - (1 - 0.05)^{20} = 0.64$
[Plot: FWER, with the level $\alpha$ marked]
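The family-wise error rate for independent tests is a one-line function. The sketch below is illustrative; the function name `fwer` is an assumption, not something defined in the lecture.

```r
# Probability of at least one false positive among m independent tests
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

fwer(1)    # a single test: equals alpha
fwer(2)    # two independent tests
fwer(20)   # jelly beans test: about 0.64
```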
Bonferroni limit – to control FWER
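Controlling FWER
We want to make sure that
$FWER \le \alpha'$.
Then, the FWER is controlled at level $\alpha'$.
Bonferroni limit
$\alpha' = \frac{\alpha}{m}$
$P_m = 1 - \left(1 - \frac{\alpha}{m}\right)^m \approx \alpha$
[Plot: FWER, the probability of having at least one false positive among $m$ tests, $P_m = 1 - (1 - \alpha)^m$; $\alpha = 0.05$]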
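To see the effect of the Bonferroni limit, the per-test limit $\alpha/m$ can be plugged into the FWER formula. A minimal sketch, assuming independent tests; the function name is hypothetical.

```r
# FWER when every test uses the Bonferroni limit alpha / m
fwer_bonferroni <- function(m, alpha = 0.05) 1 - (1 - alpha / m)^m

m <- c(1, 20, 1000)
data.frame(
  m           = m,
  uncorrected = 1 - (1 - 0.05)^m,   # grows quickly towards 1
  bonferroni  = fwer_bonferroni(m)  # stays just below alpha
)
```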
Test data (1000 independent experiments)
Negatives
$\mu_0 = 20$ g, $m_0 = 970$
Positives
$\mu_1 = 40$ g, $m_1 = 30$
Random samples, size $n = 5$, from two normal distributions
One sample t-test, H0: 𝜇 = 20 g
[Plot labels: 970 negatives, 30 positives]
No correction

                | H0 true | H0 false | Total
Significant     | 56      | 30       | 86
Not significant | 914     | 0        | 914
Total           | 970     | 30       | 1000
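A data set of this kind can be simulated in a few lines. The sketch below is only an illustration: the standard deviation `sigma` and the seed are assumptions (the slides do not state them), so the counts will not reproduce the table exactly.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n  <- 5                          # sample size per experiment
m0 <- 970                        # experiments where H0 is true (mean 20 g)
m1 <- 30                         # experiments where H0 is false (mean 40 g)
sigma <- 5                       # ASSUMED standard deviation, not given in the slides

truth <- rep(c("H0 true", "H0 false"), c(m0, m1))
mu    <- rep(c(20, 40), c(m0, m1))

# One-sample t-test of H0: mu = 20 g for every experiment
p <- sapply(mu, function(mean_i) t.test(rnorm(n, mean_i, sigma), mu = 20)$p.value)

# Cross-tabulate significance (no correction) against the truth
significant <- p <= 0.05
tab <- table(Significant = significant, Truth = truth)
tab
```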
One sample t-test, H0: 𝜇 = 20 g
No correction

                | H0 true    | H0 false  | Total
Significant     | $FP = 56$  | $TP = 30$ | 86
Not significant | $TN = 914$ | $FN = 0$  | 914
Total           | 970        | 30        | 1000

$FPR = \frac{56}{56 + 914} = 0.058$
$FNR = \frac{0}{0 + 30} = 0$
Family-wise error rate: $FWER = \Pr(FP \ge 1)$
False positive rate: $FPR = \frac{FP}{m_0} = \frac{FP}{FP + TN}$
False negative rate: $FNR = \frac{FN}{m_1} = \frac{FN}{FN + TP}$
Bonferroni limit
         | No correction | Bonferroni
$\alpha$ | 0.05          | $5 \times 10^{-5}$
$FPR$    | 0.058         | 0
$FNR$    | 0             | 0.87

No correction

                | H0 true | H0 false | Total
Significant     | 56      | 30       | 86
Not significant | 914     | 0        | 914
Total           | 970     | 30       | 1000

Bonferroni

                | H0 true | H0 false | Total
Significant     | 0       | 4        | 4
Not significant | 970     | 26       | 996
Total           | 970     | 30       | 1000
Holm-Bonferroni method
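Sort p-values
$p_{(1)}, p_{(2)}, \ldots, p_{(m)}$
Reject (1) if $p_{(1)} \le \frac{\alpha}{m}$
Reject (2) if $p_{(2)} \le \frac{\alpha}{m - 1}$
Reject (3) if $p_{(3)} \le \frac{\alpha}{m - 2}$
...
Stop when $p_{(k)} > \frac{\alpha}{m - k + 1}$

$k$ | $p$   | $\alpha$ | $\frac{\alpha}{m}$ | $\frac{\alpha}{m - k + 1}$
1   | 0.003 | 0.05     | 0.01               | 0.01
2   | 0.005 | 0.05     | 0.01               | 0.0125
3   | 0.012 | 0.05     | 0.01               | 0.017
4   | 0.04  | 0.05     | 0.01               | 0.025
5   | 0.058 | 0.05     | 0.01               | 0.05

Holm-Bonferroni method controls FWER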
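The step-down rule can be traced on the five p-values from the table. This is a sketch with illustrative variable names; the last line cross-checks the manual procedure against R's built-in `p.adjust`.

```r
p     <- c(0.003, 0.005, 0.012, 0.04, 0.058)  # p-values from the table
alpha <- 0.05
m     <- length(p)

p_sorted <- sort(p)
k        <- seq_len(m)
limit    <- alpha / (m - k + 1)   # Holm-Bonferroni limit for the k-th smallest p-value

# Stop at the first k where p_(k) exceeds its limit; reject everything before it
fail     <- which(p_sorted > limit)
n_reject <- if (length(fail) == 0) m else fail[1] - 1

# Cross-check: Holm-adjusted p-values compared directly with alpha
n_reject_builtin <- sum(p.adjust(p, method = "holm") <= alpha)
```

Here the procedure stops at $k = 4$ (0.04 exceeds 0.025), so the first three hypotheses are rejected, whereas the plain Bonferroni limit of 0.01 would reject only two.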
Holm-Bonferroni method
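         | No correction | Bonferroni         | HB
$\alpha$ | 0.05          | $5 \times 10^{-5}$ | $5 \times 10^{-5}$
$FPR$    | 0.058         | 0                  | 0
$FNR$    | 0             | 0.87               | 0.87

Holm-Bonferroni

                | H0 true | H0 false | Total
Significant     | 0       | 4        | 4
Not significant | 970     | 26       | 996
Total           | 970     | 30       | 1000

[Plot: the 30 smallest p-values (out of 1000), with the Bonferroni and Holm-Bonferroni limits]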
False discovery rate
$FDR = \frac{FP}{D}$
False discovery rate
False positive rate
$FPR = \frac{FP}{m_0} = \frac{FP}{FP + TN}$
The fraction of truly non-significant events we falsely marked as significant
$FPR = \frac{56}{970} = 0.058$
False discovery rate
$FDR = \frac{FP}{D} = \frac{FP}{FP + TP}$
The fraction of discoveries that are false
$FDR = \frac{56}{86} = 0.65$
No correction

                | H0 true    | H0 false  | Total
Significant     | $FP = 56$  | $TP = 30$ | 86
Not significant | $TN = 914$ | $FN = 0$  | 914
Total           | 970        | 30        | 1000
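The three rates differ only in their denominators, which is easy to see when they are computed side by side from the uncorrected table. A small sketch in R with illustrative variable names:

```r
# Counts from the "No correction" table
FP <- 56; TP <- 30; TN <- 914; FN <- 0

FPR <- FP / (FP + TN)   # false positives among all true nulls (m0 = 970)
FNR <- FN / (FN + TP)   # false negatives among all true alternatives (m1 = 30)
FDR <- FP / (FP + TP)   # false positives among all discoveries (D = 86)

round(c(FPR = FPR, FNR = FNR, FDR = FDR), 3)
```

A false positive rate of about 6% looks harmless, yet nearly two thirds of the discoveries are false because true nulls vastly outnumber true alternatives.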
Benjamini-Hochberg method
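Sort p-values
$p_{(1)}, p_{(2)}, \ldots, p_{(m)}$
Find the largest $k$, such that
$p_{(k)} \le \frac{k}{m}\alpha$
Reject all null hypotheses for $i = 1, \ldots, k$

$k$ | $p$   | $\alpha$ | $\frac{\alpha}{m}$ | $\frac{\alpha}{m - k + 1}$ | $\frac{k}{m}\alpha$
1   | 0.003 | 0.05     | 0.01               | 0.01                       | 0.01
2   | 0.005 | 0.05     | 0.01               | 0.0125                     | 0.02
3   | 0.012 | 0.05     | 0.01               | 0.017                      | 0.03
4   | 0.038 | 0.05     | 0.01               | 0.025                      | 0.04
5   | 0.058 | 0.05     | 0.01               | 0.05                       | 0.05

Benjamini-Hochberg method controls FDR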
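The Benjamini-Hochberg rule can be applied by hand to the five p-values in the table. A minimal sketch with illustrative variable names; `p.adjust` is used only as a cross-check.

```r
p     <- c(0.003, 0.005, 0.012, 0.038, 0.058)  # p-values from the table
alpha <- 0.05
m     <- length(p)

p_sorted <- sort(p)
limit    <- seq_len(m) / m * alpha   # (k / m) * alpha for k = 1, ..., m

# Largest k with p_(k) <= (k / m) * alpha; reject hypotheses 1..k
below <- which(p_sorted <= limit)
k_max <- if (length(below) == 0) 0 else max(below)

# Cross-check: BH-adjusted p-values compared directly with alpha
n_reject_builtin <- sum(p.adjust(p, method = "BH") <= alpha)
```

With these values $k = 4$, because 0.038 is below 0.04 while 0.058 is above 0.05.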
Benjamini-Hochberg method
         | No correction | Bonferroni         | HB                   | BH
$\alpha$ | 0.05          | $5 \times 10^{-5}$ | $3.7 \times 10^{-5}$ | 0.0011
$FPR$    | 0.058         | 0                  | 0                    | 0.0021
$FNR$    | 0             | 0.87               | 0.87                 | 0.30
$FDR$    | 0.65          | 0                  | 0                    | 0.087

Benjamini-Hochberg

                | H0 true | H0 false | Total
Significant     | 2       | 21       | 23
Not significant | 968     | 9        | 977
Total           | 970     | 30       | 1000

[Plot legend: Bonferroni, Holm-Bonferroni, Benjamini-Hochberg]
Controlling FWER and FDR
Benjamini-Hochberg controls FDR
$FDR = \frac{FP}{FP + TP}$
$FDR$ is a random variable
Controlling FDR - guaranteed
$E[FDR] \le \alpha$
Holm-Bonferroni controls FWER
$FWER = \Pr(FP \ge 1)$
Controlling FWER - guaranteed
$FWER \le \alpha'$
Benjamini-Hochberg procedure controls FDR
Controlling FDR
$E[FDR] \le \alpha$
$E[FDR]$ can be approximated by the mean over many experiments
Bootstrap: generate test data 10,000 times, perform 1000 t-tests for each set and find FDR for BH procedure
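A scaled-down version of this experiment fits in a short script. The sketch below makes several assumptions that are not in the slides: the standard deviation `sigma`, the seed, and only 200 repetitions instead of 10,000 to keep the run time short. The t-test is computed in vectorised form rather than by calling `t.test` 1000 times.

```r
set.seed(42)                 # arbitrary seed
n <- 5; m0 <- 970; m1 <- 30; alpha <- 0.05
sigma <- 5                   # ASSUMED standard deviation, not given in the slides
n_rep <- 200                 # ASSUMED; the slide uses 10,000 repetitions

mu      <- rep(c(20, 40), c(m0, m1))
is_null <- rep(c(TRUE, FALSE), c(m0, m1))

one_fdr <- function() {
  # One column per experiment, n observations each
  x <- matrix(rnorm(n * length(mu), mean = rep(mu, each = n), sd = sigma), nrow = n)
  xbar <- colMeans(x)
  s    <- sqrt(colSums(sweep(x, 2, xbar)^2) / (n - 1))
  # Two-sided one-sample t-test of H0: mu = 20 g
  t_stat <- (xbar - 20) / (s / sqrt(n))
  p <- 2 * pt(-abs(t_stat), df = n - 1)
  significant <- p.adjust(p, method = "BH") <= alpha
  D <- sum(significant)
  if (D == 0) 0 else sum(significant & is_null) / D   # FDR of this data set
}

fdr <- replicate(n_rep, one_fdr())
mean(fdr)   # approximates E[FDR]; should not exceed alpha by more than noise
```

Individual values of `fdr` scatter above and below $\alpha$; the guarantee applies to their mean, not to any single experiment.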
Adjusted p-values
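p-values can be “adjusted”, so they compare directly with $\alpha$, and not $\frac{k}{m}\alpha$
Problem: adjusted p-value does not express any probability
Despite their popularity, I recommend against using adjusted p-values
[Plot: raw p-values compared with $\frac{k}{m}\alpha$; adjusted p-values compared with $\alpha$]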
How to do this in R
# Read generated data
> d = read.table("http://tiny.cc/two_hypotheses", header=TRUE)
> p = d$p
# Holm-Bonferroni procedure
> p.adj = p.adjust(p, "holm")
> p[which(p.adj < 0.05)]
[1] 1.476263e-05 2.662440e-05 3.029839e-05
# Benjamini-Hochberg procedure
> p.adj = p.adjust(p, "BH")
> p[which(p.adj < 0.05)]
 [1] 1.038835e-03 6.670798e-04 1.050547e-03 1.476263e-05 5.271367e-04
 [6] 3.503370e-04 9.664789e-04 1.068863e-03 7.995860e-04 5.404476e-04
[11] 9.681321e-04 1.580069e-04 1.732747e-04 3.159954e-04 2.662440e-05
[16] 4.709732e-04 1.517964e-04 2.873971e-04 3.258726e-04 4.087615e-04
[21] 3.029839e-05 9.320438e-04 1.713309e-04 2.863402e-04 4.082322e-04
Estimating false discovery rate
Control and estimate
Controlling FDR
1. Fix acceptable FDR limit, $\alpha$, beforehand
2. Find a thresholding rule, so that
$E[FDR] \le \alpha$
Estimating FDR
For each p-value, $p_i$, form a point estimate of FDR,
$\widehat{FDR}(p_i)$
P-value distribution
[Histograms of p-values: 100% null; Data set 2: 80% null, 20% alternative]
P-value distribution
[Histograms of p-values: panels labelled "Good" and "Bad!"]
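Definition of $\pi_0$
80% null, 20% alternative
Proportion of null tests
$\pi_0 = \frac{\#\ \text{non-significant tests}}{\#\ \text{all tests}}$
[Histogram of p-values with the level $\pi_0$ marked; H0 and H1 components]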
Storey method
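Storey, J.D., 2002, J R Statist Soc B, 64, 479
Estimate $\pi_0$
Use the histogram for $p > \lambda$ to estimate $\hat{\pi}_0(\lambda)$
Do it for all $0 < \lambda < 1$ and then find the best $\hat{\pi}_0$
[Plot: $\hat{\pi}_0(\lambda)$ as a function of $\lambda$]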
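The idea behind the estimate is that null p-values are uniform, so the part of the histogram above $\lambda$ is almost purely null. The sketch below uses the standard form of Storey's estimator, $\hat{\pi}_0(\lambda) = \#\{p_i > \lambda\} / (m(1 - \lambda))$, on hypothetical p-values; the Beta distribution for the alternatives is an assumption chosen only to concentrate them near zero, and the final step of picking the "best" $\hat{\pi}_0$ is not implemented here.

```r
set.seed(1)                                   # arbitrary seed
m0 <- 8000                                    # null tests (80%)
m1 <- 2000                                    # alternative tests (20%)
# HYPOTHETICAL p-values: uniform nulls plus alternatives concentrated near 0
p <- c(runif(m0), rbeta(m1, shape1 = 0.2, shape2 = 5))

# Storey's estimate of pi0 from the p-values above lambda
pi0_hat <- function(p, lambda) sum(p > lambda) / (length(p) * (1 - lambda))

lambda    <- seq(0.05, 0.95, by = 0.05)
estimates <- sapply(lambda, function(l) pi0_hat(p, l))
round(estimates, 3)                           # should settle close to 0.8
```

Small $\lambda$ overestimates $\pi_0$ because alternatives still contribute, while large $\lambda$ gives a noisier estimate from fewer p-values.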
Point estimate of FDR
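80% null, 20% alternative
Point estimate, $FDR(t)$
Arbitrary limit $t$, every $p_i < t$ is significant. No. of significant tests is
$R(t) = \#\{p_i < t\}$
No. of false positives is
$FP(t) = t \pi_0 m$
Hence,
$FDR(t) = \frac{t \pi_0 m}{R(t)}$
[Histogram of p-values with the limit $t$ and the level $\pi_0$ marked, showing true positives and false positives]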
Storey method
Point estimate of FDR
This is the so-called q-value:
$q(p_i) = \min_{t \ge p_i} FDR(t)$
If monotonic
$q(p_i) = FDR(p_i)$
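The q-value definition translates into a few lines of R. This is a simplified sketch, not the implementation in the `qvalue` package: `q_value` is a hypothetical helper, $\pi_0$ is passed in rather than estimated, and the limit $t$ is evaluated only at the observed p-values with $R(t)$ counting $p_i \le t$.

```r
# q-values from the point estimate FDR(t) = t * pi0 * m / R(t)
q_value <- function(p, pi0 = 1) {
  m        <- length(p)
  ord      <- order(p)
  p_sorted <- p[ord]
  R        <- seq_len(m)                 # number of p-values up to the k-th smallest
  fdr      <- p_sorted * pi0 * m / R     # FDR(t) evaluated at t = p_(k)
  q_sorted <- rev(cummin(rev(fdr)))      # minimum of FDR(t) over all t >= p_(k)
  q        <- numeric(m)
  q[ord]   <- q_sorted                   # back to the original order
  q
}

p <- c(0.02, 0.5, 0.01, 0.03)
q_value(p)              # with pi0 = 1 this matches p.adjust(p, "BH")
q_value(p, pi0 = 0.8)   # a smaller pi0 scales all q-values down
```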
How to do this in R
> library(qvalue)
# Read data set 1
> pvalues = read.table('http://tiny.cc/multi_FDR', header=TRUE)
> p = pvalues$p
# Benjamini-Hochberg limit
> p.adj = p.adjust(p, method='BH')
> sum(p.adj <= 0.05)
[1] 216
# q-values
> qobj = qvalue(p)
> q = qobj$qvalues
> summary(qobj)
pi0: 0.8189884
Cumulative number of significant calls:
          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1    <1
p-value       40    202   611    955  1373 2138 10000
q-value        0      1     1     96   276  483 10000
local FDR      0      1     3     50   141  278  5915
# Diagnostic plots and p-value histogram
> plot(qobj)
> hist(qobj)
Interpretation of q-value
No. ID p-value q-value
... ... ... ...
100 9249 0.000328 0.0266
101 8157 0.000328 0.0266
102 8228 0.000335 0.0269
103 8291 0.000338 0.0269
104 8254 0.000347 0.0272
105 8875 0.000348 0.0272
106 8055 0.000353 0.0273
107 8235 0.000375 0.0284
108 8148 0.000376 0.0284
109 8236 0.000381 0.0284
110 8040 0.000382 0.0284
... ... ... ...
There are 106 tests with $q \le 0.0273$
Expect 2.73% of false positives among these tests
Expect ~3 false positives if you set a limit of $q \le 0.0273$ or $p \le 0.000353$
q-value tells you how many false positives you should expect after choosing a significance limit
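The expected number of false positives follows directly from the q-value limit and the number of tests that pass it. A short sketch using the numbers from the table; the variable names are illustrative.

```r
n_significant <- 106      # tests with q <= 0.0273
q_limit       <- 0.0273   # chosen significance limit on the q-value

# Expected false positives among the significant tests
expected_fp <- q_limit * n_significant
expected_fp               # about 2.9, i.e. roughly 3 tests
```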
Q-values vs Benjamini-Hochberg
[Plot: q-value and BH adjusted p-value; $\hat{\pi}_0$]
When $\hat{\pi}_0 = 1$, both methods give the same result.
For the same FDR, Storey's method provides more significant p-values.
Hence, it is more powerful, especially for small $\hat{\pi}_0$.
But this depends on how good the estimate of $\hat{\pi}_0$ is.
$\hat{\pi}_0$ - estimate of the proportion of null (non-significant) tests
Which multiple-test correction should I use?
False positive
“Discover” effect where there is no effect
Can be tested in follow-up experiments
Not hugely important in small samples
Impossible to manage in large samples
False negative
Missed discovery
Once you've missed it, it's gone
[Diagram: methods placed between false positives and false negatives: No correction, Bonferroni, Benjamini-Hochberg, Storey]
Multiple test procedures: summary
Method | Controls | Advantages | Disadvantages | Recommendation
No correction | FPR | False negatives not inflated | Can result in $FP \gg TP$ | Small samples, when the cost of FN is high
Bonferroni | FWER | None | Lots of false negatives | Do not use
Holm-Bonferroni | FWER | Slightly better than Bonferroni | Lots of false negatives | Appropriate only when you want to guard against any false positives
Benjamini-Hochberg | FDR | Good trade-off between false positives and negatives | On average, $\alpha$ of your positives will be false | Better in large samples
Storey | -- | More powerful than BH, in particular for small $\hat{\pi}_0$ | Depends on a good estimate of $\hat{\pi}_0$ | The best method, gives more insight into FDR

Hand-outs available at http://tiny.cc/statlec
Benjamini-Hochberg method
Theoretical null distribution: uniform p-values
$p_{(k)} = \frac{k}{m}$
Generated data null distribution
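That sorted null p-values lie close to the line $k/m$ can be seen with simulated uniform p-values. A minimal sketch; the seed and the number of tests are assumptions, and the plotting call is left commented out.

```r
set.seed(1)                       # arbitrary seed
m <- 1000                         # number of null tests

p_null   <- sort(runif(m))        # sorted p-values when every H0 is true
expected <- seq_len(m) / m        # theoretical line p_(k) = k / m

# Largest vertical distance between the simulated and theoretical values
max_dev <- max(abs(p_null - expected))
max_dev

# plot(seq_len(m), p_null, type = "l"); lines(seq_len(m), expected, lty = 2)
```

The Benjamini-Hochberg limit $\frac{k}{m}\alpha$ is this same line scaled by $\alpha$, which is why purely null data rarely fall below it.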
Benjamini-Hochberg method
[Plot: sorted p-values for null data (1000 H0) and test data (970 H0, 20 H1), with lines labelled $\frac{k}{m}$, $\frac{k}{m}\alpha$, 100% FDR and 5% FDR]
Benjamini-Hochberg method
[Plot: p-values for null data (1000 H0) and test data (970 H0, 20 H1), with lines labelled $\frac{k}{m}$ and $\frac{k}{m}\alpha$]