day6 testing proportions - sjspielman.github.io · hypothesis testing frameworks t-testscompare...
TRANSCRIPT
TestingproportionsBIO5312FALL2017
STEPHANIE J. SPIELMAN,PHD
EstimationAnestimatorisastatistic(~formula)forestimatingaparameterAgoodestimatorisunbiased◦ Theexpectedvalue(expectation)oftheestimatorshouldequaltheparameterbeingestimated
◦ Meanofthesamplingdistributionofthestatisticshouldequaltheparameterbeingestimated
Agoodestimatorisconsistent◦ IncreasingthesamplesizeproducesanestimatewithsmallerSE
Agoodestimatorisefficient◦ HasthesmallestSEamonganyestimatoryoucouldhavechosen
Weareusuallyinterestedinpointestimate,SE,andCINormally-distributedvariable◦ 𝜇" = �̅�
◦ 𝜎() = ∑ (,-.,̅)01-234.5
◦ Knownσ◦ SE= 6
4�
◦ 95%CI=�̅� ± 𝑍:.:)<𝑆𝐸◦ Unknownσ◦ SE= ?
4�
◦ 95%CI=�̅� ± 𝑡:.:)<𝑆𝐸
Hypothesistestingframeworkst-tests comparemeansforcontinuousquantitativedata
Todaywewilllearntoanalyzediscretecountdata("proportions"):◦ Binomialtest◦ 𝝌2 goodness-of-fit◦ Contingencytableanalysis◦ 𝝌2 association/homogeneityandFisherexacttest
Binomialtest𝑃 𝑘𝑠𝑢𝑐𝑐𝑒𝑠𝑠𝑒𝑠 = 4
H 𝑝H 1 − 𝑝 (4.H) = 4
H 𝑝H𝑞(4.H)
◦ Binomialcoefficient: 4H = 4!H! 4.H !
Hypothesistest:◦ H0 :Therelativefrequencyofsuccessintheunderlyingpopulationisp0◦ HA :Therelativefrequencyofsuccessintheunderlyingpopulationisnotp0◦ HA :Therelativefrequencyofsuccessintheunderlyingpopulationis>/<p0
Nullproportionofsuccessestotestagainst
Binomialtestassumption:BInSconditionsaresatisfied
Binaryoutcomes
Independenttrials(outcomesdonotinfluenceeachother)
n isfixedbeforethetrialsbegin
Sameprobabilityofsuccess,p,foralltrials
Binomialtest:ExampleInacertainspeciesofwasp,eachwasphasa30%chanceofbeingmale.Icollect12wasps,ofwhich5aremale.Doesmysampleshowevidencethat30%ofwaspsaremale?Useα=0.05.
Inotherwords,istheobservedsuccessproportion5/12(41.67%)consistentwithapopulationwhoseprobabilityofsuccessis0.3?
VerifyingassumptionsBinaryoutcomes:Maleorfemale
Independenttrials:Waspsexdoesnotinfluencesexofotherwasps
n isfixedbeforethetrialsbegin: Icollect12wasps
Sameprobabilityofsuccess,p,foralltrials: P(male)=0.3foreverywasp
PerformingthebinomialtestMysample:◦ p=5/12=0.417◦ n=12◦ X=5 WegenerallysayXinsteadofkwhenperforminghypothesistests,byconvention
H0 :Theprobabilityofbeingamalewaspisp0 =0.3
HA:Theprobabilityofbeingamalewaspdiffersfromp0 =0.3
ThePMFforwaspsex
0.014
0.071
0.17
0.240.23
0.16
0.079
0.029
0.0078 0.0015 0.00019 1.5e−05 5.3e−070.00
0.05
0.10
0.15
0.20
0.25
0 1 2 3 4 5 6 7 8 9 10 11 12Number of males (successes)
Prob
abilit
y m
ass
Thesamplingdistributionforthebinomialteststatisticisbinomial:Thisiseffectivelyournull.
PerformingthetestRecall,theP-valueistheprobabilityofobtainingaresultasextremeormore◦ Therefore,P-valueisP(numberofsuccesses>=5)
𝑃(𝑋 ≥ 5) = 5)< 0.3<0.7(5).<) + 5)
U 0.3U0.7(5).U) + ⋯+ 5)5) 0.3
5)0.7(5).5))
p0 =0.3n=12X=5
0.014
0.071
0.17
0.240.23
0.16
0.079
0.029
0.0078 0.0015 0.00019 1.5e−05 5.3e−070.00
0.05
0.10
0.15
0.20
0.25
0 1 2 3 4 5 6 7 8 9 10 11 12Number of males (successes)
Prob
abilit
y m
ass
> 1 – pbinom(4, 12, 0.3)[1] 0.2673445
Conclusions,round1OurP-valueof0.276ismuchgreaterthanα.Thereforewefailtorejectthenullhypothesisandwehavenoevidencethatthepopulationproportionofmalescorrespondingtooursamplediffersfrom0.3.
NotesonbinomialtestsComputingtwo-sidedP-valuesisnon-trivial◦ Binomialdistributionsymmetriconlywhenp=0.5
> binom.test(5, 12, 0.3)Exact binomial test
data: 5 and 12number of successes = 5, number of trials = 12, p-value = 0.3614alternative hypothesis: true probability of success is not equal to 0.395 percent confidence interval:0.1516522 0.7233303
sample estimates:probability of success
0.4166667
Thisisnot0.276*2!
Computingthebinomialstandarderror𝑺𝑬𝒑" = 𝒑" 𝟏 − 𝒑" /𝒏�
= 𝟎.𝟒𝟏𝟕(𝟏.𝟎.𝟒𝟏𝟕)𝟏𝟐
�= 0.142
Whatisthisvalue?1. Thestandarddeviationofthesamplingdistributionoftheprobabilityofsuccess2. Quantifiestheprecisionof�̂�,ourestimateofthepopulationprob.ofsuccess
ComputingthebinomialconfidenceintervalClassically,weusetheWaldmethod◦ Note:Only"precise"whennisnotverylarge(>0.8)orsmall(<0.2)
◦ 𝒑" istheestimatedproportionofsuccess,X/n=0.417◦ 𝒁𝟎.𝟎𝟐𝟓 is1.96
◦ 𝑺𝑬𝒑" = 𝒑"(𝟏.𝒑")
𝒏�
= 𝟎.𝟒𝟏𝟕(𝟏.𝟎.𝟒𝟏𝟕)𝟏𝟐
�=0.142
𝒑" − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑" < 𝒑 < 𝒑" + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑"
CalculatingthebinomialCI𝒑" − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑" < 𝒑 < 𝒑" + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑"
0.417– 0.278<p<0.417+0.278à 0.417± 0.278> binom.test(5, 12, 0.3)Exact binomial test
data: 5 and 12number of successes = 5, number of trials = 12, p-value = 0.3614alternative hypothesis: true probability of success is not equal to 0.395 percent confidence interval:0.1516522 0.7233303
sample estimates:probability of success
0.4166667
Rusesamoreexactmethod,theClopper-Pearsoninterval
FinalconclusionsOurP-valueof0.276ismuchgreaterthanα.Thereforewefailtorejectthenullhypothesisandwehavenoevidencethatthepopulationproportionofmalescorrespondingtooursamplediffersfrom0.3.
Ourestimatedproportionofsuccessis0.417withSE=0.142anda95%CIof0.417± 0.278.
Pause:Binomialexercise
Use𝟀2Goodness-of-fittestifwedonothavebinaryoutcomesGoodness-of-fittestasksifobservedproportionsareequaltoanullproportion
df =(numberofcategories)– 1– (numberofparametersestimatedfromdata)0forgoodness-of-fittest
Example:Arebabiesbornwiththesamefrequencyeverydayoftheweek?
Freq
uenc
y
Sun.Mon.
Tue.W
ed.Thu.
Fri. Sat.
70
60
50
40
30
20
10
0
Dayin1999 #birthsSunday 33Monday 41Tuesday 63Wednesday 63Thursday 47Friday 56Saturday 47
H0 :Theprobabilityofbirthwasthesameeverydayoftheweekin1999.
HA:Theprobabilityofbirthwasnotthesameeverydayoftheweekin1999.
Teststatistic𝜒) = ∑ #ij?klmkn-.#k,okpqkn- 0
#k,okpqkn-�r
Day# Observed
births #daysin1999 Expectedprop # ExpectedbirthsSunday 33 52 52/365 =0.142 0.142*52=49.863Monday 41 52 0.142 49.863Tuesday 63 52 0.142 49.863Wednesday 63 52 0.142 49.863Thursday 47 52 0.142 49.863Friday 56 53 0.145 50.822Saturday 47 52 0.142 49.863
Total 350 365 1 1
Calculatingtheteststatisticanddf
Day # Observedbirths # ExpectedbirthsSunday 33 0.142*52=49.863Monday 41 49.863Tuesday 63 49.863Wednesday 63 49.863Thursday 47 49.863Friday 56 50.822Saturday 47 49.863
Total 350 1
𝜒) = s#𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑r − #𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑r )
#𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑r
�
r
= (yy.z{.|Uy)0
z{.|Uy+(z5.z{.|Uy)
0
z{.|Uy+(Uy.z{.|Uy)
0
z{.|Uy+(Uy.z{.|Uy)
0
z{.|Uy+(z}.z{.|Uy)
0
z{.|Uy+(<U.<:.|)))
0
<:.|))+(z}.z{.|Uy)
0
z{.|Uy
=15.05
df =#categories – 1=7– 1=6
OurcategoricalvariableisDaysofweekIthassevencategories
Reportsandconclusions
At0.0199,we reject the nullhypothesis that are births are equally distributedacross days in1999.We haveevidence that frequency of births differs acrossdays.
Prob
abili
ty d
ensi
ty
0.160.140.120.100.080.060.040.02
00
26
!5 10 15
!2 = 15.05
20
> 1 - pchisq(15.05, 6) [1] 0.01987137
Noteson𝟀2Goodness-of-fittestAssumptionsforall𝟀2 tests◦ Randomlysampleddatafrompopulation◦ Twoormorecategoriesofacategoricalvariable(dataiscounts)◦ Expected frequenciesmustbe>=1◦ Nomorethan20%ofexpectedfrequenciesare<5
Wetakeonly>=teststatisticforP-value◦ Generaltoall𝟀2tests
𝟀2goodness-of-fitinR#### Prepare data: Observed counts and expected proportions ####> births <- c(33,41,63,63,47,56,47)> expected <- c(52,52,52,52,52,53,52)> expected <- expected/sum(expected)> expected[1] 0.1424658 0.1424658 0.1424658 0.1424658 0.1424658 0.1452055 0.1424658
> chisq.test(births, p = expected)
Chi-squared test for given probabilities
data: birthsX-squared = 15.057, df = 6, p-value = 0.01982
BinomialispreferredfortwogroupsTempleUniversitystudentsare52%female,48%male.DoesthisclassreflecttheTemplestudentpopulation?
Wehave19students:7femalesand12males.
BinomialP-valuesaremoreprecise> binom.test(7, 19, 0.52)
Exact binomial test
data: 7 and 19number of successes = 7, number of trials = 19, p-value = 0.251alternative hypothesis: true probability of success is not equal to 0.5295 percent confidence interval:0.1628859 0.6164221
sample estimates:probability of success
0.3684211
> chisq.test(c(7,12), p = c(0.52, 0.48))
Chi-squared test for given probabilities
data: c(7, 12)X-squared = 1.749, df = 1, p-value = 0.186
Pause:Goodnessoffitexercise
ContingencytableanalysisTestforanassociationbetweentwo(ormore)categoricalvariables◦ Areheartattacksmorelikelyforpeoplewhotakeaspirindaily?◦ Aresmokersmorelikelytodrinkthannon-smokers?
Twoflavors:◦ 𝟀2testforindependence(orhomogeneity)◦ Fisher'sExacttest
Contingencytablesshowassociatedcountsfortwo+categoricalvariables
Takes dailyaspirin Nodailyaspirin
Heartattack 75 62
Noheartattack 108 71
Example:𝟀2testforindependence/associationLifecycleofR.ondatrae
Uninfectedfrog Infectedfrog
Eatenbybird 1 47
Noteatenbybird 49 44
2variables:Eaten (2categoriesyes/no)Infected (2categoriesyes/no)
Example:𝟀2testforindependence
Uninfected frog Infectedfrog TOTAL
Eatenbybird 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
H0 :Infectionandbeingeatenareindependent
HA:Infectionandbeingeatenarenotindependent
Computingtheteststatistic
𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,�0
#k,okpqkn~,��l
�p
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
Underthenullhypothesis,thevariablesareindependent.ExpectedcalculationsemployP[AandB]=P[A]xP[B]
P[eatenanduninfected]=P[eaten]xP[uninfected]=48/141x50/141=0.1207
Expectedcount=P[eatenanduninfected] xtotal=17.02… =(row/total)x(column/total)x(total)
Performingthetest
𝜒) = s s#𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑l,p − #𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑l,p
)
#𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑l,p
�
l
�
p
= (5.5}.:))0
5}.:)+(zz.y:.{)
0
y:.{+(z{.yy.y)
0
yy.y+(z}.U:.))
0
U:.)=31.9
df =(#r– 1)(#c– 1)=(2– 1)(2– 1)=1
Uninfected Infected
Eaten 117.02 4430.9
Noteaten 4933.3 4760.2
1 – pchisq(31.9, 1)[1] 1.623172e-08
Werejectthenullhypothesis(P<<α)thatinfectionandbeingeatenareindependent.Wehaveevidencethatbeinginfectedwiththistrematodeisassociatedwithbeingeatenbyabird.
PerformingthetestinR> data.table <- rbind(c(1,49), c(44,47)) > data.table
[,1] [,2][1,] 1 49[2,] 44 47
> chisq.test(data.table)Pearson's Chi-squared test with Yates' continuity correctiondata: data.tableX-squared = 29.809, df = 1, p-value = 4.768e-08
> chisq.test(data.table, correct=FALSE)Pearson's Chi-squared testdata: data.tableX-squared = 31.906, df = 1, p-value = 1.618e-08
Thisiswhatwecalculatedonthelastslide("Rascalculator").Differencesarefromusingroundedexpectedcounts.
Yatescontinuitycorrection
𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,�0
#k,okpqkn~,��l
�p
𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,� .:.<0
#k,okpqkn~,��l
�p
Withoutcorrection
Yatescontinuitycorrection
DecreasestheteststatisticandincreasestheP-value
OddsTheodds ofsuccessaretheprobabilityofsuccessdividedbyfailure
𝑂 = o5.o
Theoddsofbeingeatenwhileinfected
𝑂 = �[k�qk4�4nr4�kpqkn]5.�[k�qk4�4nr4�kpqkn]
= z}/{55.z}/{5
= 1.07
𝑂 = �[k�qk4�4nr4�kpqkn]�[4iqk�qk4�4nr4�kpqkn]
= z}zz= 1.07
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
Oddsratio,for2x2tablesTheodds ratioistheoddsofsuccessinonegroupdividedbyoddsofsuccessinasecondgroup
𝑂𝑅 = o3 (5.o3)⁄o0 (5.o0)⁄
Interpretation◦ OR=1:Oddsofsuccessisthesameforeithergroup◦ OR<1:Oddsofsuccessingroup2arehigherthangroup1◦ OR>1:Oddsofsuccessingroup1arehigherthangroup2
ORsquantifythedeviationfromnullin2x2contingencytabletests.
Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?
𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =4744 = 1.07
𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =149 = 0.02
𝑶𝑹 = 𝟏. 𝟎𝟕𝟎. 𝟎𝟐 = 𝟓𝟐. 𝟑 Infectedfrogshave52.3 theoddsofbeingeaten
comparedtouninfectedfrogs.
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?
𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =4744 = 1.07
𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =149 = 0.02
Infected frogshave52.3 theoddsofbeingeatencomparedtouninfected frogs.
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
𝑶𝑹 = 𝟏. 𝟎𝟕𝟎. 𝟎𝟐 = 𝟓𝟐. 𝟑
Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?
𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] = 1.07
𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]
𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] = 0.02
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
𝑂5 = 𝑃[𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑒𝑎𝑡𝑒𝑛]𝑃[𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑒𝑎𝑡𝑒𝑛] =
471 = 47
𝑂) = 𝑃[𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑢𝑛𝑒𝑎𝑡𝑒𝑛]𝑃[𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑢𝑛𝑒𝑎𝑡𝑒𝑛] =
4449 = 0.899
𝑶𝑹 = 𝟒𝟕
𝟎. 𝟖𝟗𝟗 = 𝟓𝟐. 𝟑Infectedfrogshave52.3theoddsofbeingeatencomparedtouneaten frogs.
Eatenfrogshave52.3theoddsofbeinginfectedcomparedtouneatenfrogs.
TherearetwowaystocalculateOROnewillbe>1(52.3)andonewillbe<1(1/52.3 =0.019)◦ Wegenerallyusethe>1option◦ Convinceyourselfthatthisistrue.
Funfact:𝑶𝑹 = 𝒂∗𝒅𝒃∗𝒄
= 𝟏∗𝟒𝟒𝟒𝟗∗𝟒𝟕
= 𝟎. 𝟎𝟏𝟗
Oftenwereportlogodds =ln(OR) > log(52.3) [1] 3.956996
Uninfected Infected TOTAL
Eaten a 1 c 47 48
Noteaten b 49 d 44 93
TOTAL 50 91 141
CalculatingtheORstandarderror
𝑆𝐸 ln 𝑂𝑅 = 5�+ 5
j+ 5
p+ 5
n�
𝑆𝐸 ln 𝑂𝑅 = 55+ 5
z{+ 5
z}+ 5
zz� = 1.03
blah blah2
blob a c
blob1 b d
Uninfected Infected
Eaten 1 47
Noteaten 49 44
CalculatingthelogoddsCI
𝒍𝒏(𝑶𝑹� ) − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝑶𝑹� < 𝒍𝒏(𝑶𝑹) < 𝒍𝒏(𝑶𝑹)� + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝑶𝑹�
3.96– (1.96*1.03)<𝒍𝒏(𝑶𝑹)<3.96+(1.96*1.03)à 3.96± 2.02
Conclusions,withlogoddsWerejectthenullhypothesis(P<<α)thatinfectionandbeingeatenareindependent.Wehaveevidencethatbeinginfectedwiththistrematodeisassociatedwithbeingeatenbyabird.
Furthermore,frogsthatareeatenaremorelikelytobeinfectedcomparedtouneatenfrogs,withalogoddsratioof3.96andlogoddsCI of1.94– 5.98.
𝟀2testforhomogeneityIndependence:measuretwoproperties fromonesetofsubjects◦ Wemeasuredeaten andinfection forfrogs
Homogeneity:measureoneproperty ontwosetsofsubjectsfromdifferentpopulations◦ Measureeffectofmedicine insampleofcancerindividualsandsampleofhealthyindividuals
Example:testofhomogeneityDrug Placebo
Cancer 75 62
Healthy 108 71
H0 :Theprobabilitythatsymptomsimproveisthesameforbothcancerandhealthygroups.HA:Theprobabilitythatsymptomsimprovediffersbetweencancerandhealthygroups.
Inpracticalterms,thisusestheexactsameprocedureasatestforindependence.
Fisher'sExacttestMoreexactthan𝟀2andusedforlow-counttablesComputetheexactprobabilityofobservingtablewithcounts:
Fisher'stestcomputesthisvalueforallpossibletables withthesamerow/columntotals(margins)ComputesP-valuebysummingprobabilitiesfortableswithasextremeormorecountdistributions
blah blah2
blob a c
blob1 b d
𝑃 𝑎, 𝑏, 𝑐, 𝑑 =𝑎 + 𝑏 ! 𝑐 + 𝑑 ! 𝑎 + 𝑐 ! 𝑏 + 𝑑 !
𝑛! 𝑎! 𝑏! 𝑐! 𝑑!
Fisher'sexacttest
Uninfected Infected TOTAL
Eaten 1 47 48
Noteaten 49 44 93
TOTAL 50 91 141
> chisq.test(data.table, correct=FALSE)Pearson's Chi-squared test
data: data.tableX-squared = 31.906, df = 1, p-value = 1.618e-08
> fisher.test(data.table)Fisher's Exact Test for Count Data
data: data.tablep-value = 8.37e-10alternative hypothesis: true odds ratio is not equal to 195 percent confidence interval:0.0005344122 0.1417331275
sample estimates:odds ratio0.02222648
ExactP-value
OurOR=52.3,or0.019.Slightdifferencesareexpectedbecausefisher.test()usesML
ApproximateP-value
Relativerisk:It'snottheORCommonlymeasuredinepidemiologicalstudies
Relativeriskistheprobabilityofanevent(ie disease)inanexposedgroup,relativetounexposedgroup◦ RR=P(eventwhenexposed)/P(eventwhennotexposed)
RelativeriskexampleLungcancer
Nolungcancer
Smoker 525 450
Non-smoker
32 621
RR=P(eventwhenexposed)/P(eventwhennotexposed)
RRofcancerduetosmokingexposure:=P(cancer|smoker)/P(cancer|notsmoker)=[525/(525+450)]/[32/(32+621)]
=10.99
à Smokershavea10.99timeshigherriskthandonon-smokerstodeveloplungcancer.
Liveexercise:Calculatetheoddsratioforasmokerdevelopingcancerrelativetoanon-smoker.
TheOddsRatioLungcancer
Nolungcancer
Smoker 525 450
Non-smoker
32 621
𝑂5 = 𝑃[𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑐𝑎𝑛𝑐𝑒𝑟]
𝑃[𝑛𝑜𝑛 − 𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑐𝑎𝑛𝑐𝑒𝑟] =52532
𝑂) = 𝑃[𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑛𝑜𝑐𝑎𝑛𝑐𝑒𝑟]𝑃[𝑛𝑜𝑛 − 𝑠𝑚𝑜𝑘𝑒𝑟𝑛𝑜𝑐𝑎𝑛𝑐𝑒𝑟] =
450621
𝑂𝑅 = 525/32450/621 = 𝟐𝟐. 𝟔𝟒
à Smokershave22.64timestheoddsofgettinglungcancerthannon-smokers.
What'sthepracticaldifference?Oddsratiosmeasuretheextentofassociationbetweenvariables.◦ Itistheratiooftwoodds(ratioofprob event:prob non-event)
Relativeriskisthemoreintuitivequantitythatwe"understand"◦ Itistheratiooftwoprobabilities(prob event)
RecaponestimationNormally-distributed variable◦ 𝜇" = �̅�◦ 𝜎() = 𝑠)
◦ Knownσ
◦ 𝑆𝐸,̅ =64�
◦ 95%CI=�̅� ± 𝑍:.:)<𝑆𝐸◦ Unknownσ
◦ 𝑆𝐸,̅ =?4�
◦ 95%CI=�̅� ± 𝑡:.:)<𝑆𝐸
Binomially-distributedvariable
◦ �̂� = H4
◦ 𝑆𝐸o( = �̂�(1 − �̂�)/𝑛�
◦ 95%CI=�̂� ± 𝑍:.:)<𝑆𝐸o(
Log-Oddsratio
◦ 𝑙𝑜𝑔𝑂𝑅� = ln o(3 5.o(3⁄o(0 5.o(0⁄
◦ 𝑆𝐸¤i¥¦§� = 5�+ 5
j+ 5
p+ 5
n�
◦ 95%CI=𝑙𝑜𝑔𝑂𝑅� ± 𝑍:.:)<𝑆𝐸o(
Chooseyourownadventure,sofar
(Orfisher'sexacttest)