Contingency Tables
1. Explain χ² Test of Independence
2. Measure of Association
Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts
2x2 Tables
• Each variable has 2 levels
– Explanatory Variable – Groups (typically based on demographics, exposure)
– Response Variable – Outcome (typically presence or absence of a characteristic)
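2x2 Tables - Notation
               | Outcome Present   Outcome Absent | Group Total
---------------+----------------------------------+------------
Group 1        |      n11               n12       |    n1.
Group 2        |      n21               n22       |    n2.
---------------+----------------------------------+------------
Outcome Total  |      n.1               n.2       |    n..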
χ² Test of Independence
• 1. Shows If a Relationship Exists Between 2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality
• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5
• 3. Uses Two-Way Contingency Table
χ² Test of Independence Contingency Table
• 1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables
              |  House Location  |
House Style   |  Urban    Rural  |  Total
--------------+------------------+-------
Split-Level   |    63       49   |   112
Ranch         |    15       33   |    48
--------------+------------------+-------
Total         |    78       82   |   160
Rows: levels of variable 1 (House Style). Columns: levels of variable 2 (House Location).
χ² Test of Independence Hypotheses & Statistic
• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)
• 2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})}$$
where $n_{ij}$ is the observed count and $E(n_{ij})$ is the expected count in cell $(i, j)$
• Degrees of Freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns
χ² Test of Independence Expected Counts
• 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
• 2. Compute Marginal Probabilities & Multiply for Joint Probability
• 3. Expected Count Is Sample Size Times Joint Probability
Expected Count Example
              |     Location     |
House Style   |  Urban    Rural  |  Total
--------------+------------------+-------
Split-Level   |    63       49   |   112
Ranch         |    15       33   |    48
--------------+------------------+-------
Total         |    78       82   |   160
Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
Expected Count Calculation
Expected count = (Row total)(Column total) / Sample size
              |          House Location          |
              |     Urban      |      Rural      |
House Style   |  Obs.    Exp.  |   Obs.    Exp.  |  Total
--------------+----------------+-----------------+-------
Split-Level   |   63     54.6  |    49     57.4  |   112
Ranch         |   15     23.4  |    33     24.6  |    48
--------------+----------------+-----------------+-------
Total         |   78     78    |    82     82    |   160
Split-Level, Urban: 112·78/160 = 54.6
Split-Level, Rural: 112·82/160 = 57.4
Ranch, Urban: 48·78/160 = 23.4
Ranch, Rural: 48·82/160 = 24.6
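The row-total-times-column-total rule can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides; the function name `expected_counts` is an illustrative choice, and the data are the house style by location counts from the table.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Split-Level, Ranch. Columns: Urban, Rural.
house = [[63, 49], [15, 33]]
expected = expected_counts(house)

for style, row in zip(["Split-Level", "Ranch"], expected):
    print(style, [round(e, 1) for e in row])
```

Each row of expected counts sums to the observed row total, and each column to the observed column total, which is a quick sanity check on any hand calculation.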
χ² Test of Independence Example
• You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?
            |    Diet Pepsi    |
Diet Coke   |    No      Yes   |  Total
------------+------------------+-------
No          |    84       32   |   116
Yes         |    48      122   |   170
------------+------------------+-------
Total       |   132      154   |   286
χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): χ² = 3.841 (reject H0 if the test statistic exceeds 3.841; upper-tail area α = .05)
Expected counts: E(nij) ≥ 5 in all cells
            |            Diet Pepsi            |
            |      No        |       Yes       |
Diet Coke   |  Obs.    Exp.  |   Obs.    Exp.  |  Total
------------+----------------+-----------------+-------
No          |   84     53.5  |    32     62.5  |   116
Yes         |   48     78.5  |   122     91.5  |   170
------------+----------------+-----------------+-------
Total       |  132    132    |   154    154    |   286
No, No: 116·132/286 = 53.5
No, Yes: 116·154/286 = 62.5
Yes, No: 170·132/286 = 78.5
Yes, Yes: 170·154/286 = 91.5
Test Statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})} = \frac{\left[ n_{11} - E(n_{11}) \right]^2}{E(n_{11})} + \frac{\left[ n_{12} - E(n_{12}) \right]^2}{E(n_{12})} + \cdots + \frac{\left[ n_{22} - E(n_{22}) \right]^2}{E(n_{22})}$$
$$\chi^2 = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
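The solution steps can be reproduced in Python as a rough check. This sketch is not from the slides: the function name `chi_square_independence` is illustrative, and the critical value 3.841 is simply copied from the slide rather than computed. Because the code keeps unrounded expected counts, it gives a statistic of about 54.15 instead of the 54.29 obtained from expected counts rounded to one decimal; the decision is the same.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Rows: Diet Coke No/Yes. Columns: Diet Pepsi No/Yes.
diet = [[84, 32], [48, 122]]
CRITICAL_VALUE = 3.841  # chi-square critical value for df = 1, alpha = .05 (from the slide)

chi2, df = chi_square_independence(diet)
reject = chi2 > CRITICAL_VALUE
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
```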
Siskel and Ebert
(each cell shows the observed count with the expected count beneath it)
           |              Ebert              |
    Siskel |      Con       Mix       Pro    |    Total
-----------+---------------------------------+----------
       Con |       24         8        13    |       45
           |     11.8       8.4      24.8    |     45.0
-----------+---------------------------------+----------
       Mix |        8        13        11    |       32
           |      8.4       6.0      17.6    |     32.0
-----------+---------------------------------+----------
       Pro |       10         9        64    |       83
           |     21.8      15.6      45.6    |     83.0
-----------+---------------------------------+----------
     Total |       42        30        88    |      160
           |     42.0      30.0      88.0    |    160.0
• Pearson chi2(4) = 45.3569 p < 0.001
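To see which cells drive the Siskel and Ebert result, the statistic can be broken into per-cell contributions. The sketch below is an illustration, not the software that produced the output above; `cell_contributions` is a hypothetical helper name. The p-value line uses the closed-form upper-tail probability of a chi-square distribution with 4 degrees of freedom, exp(-x/2)(1 + x/2), which applies only to df = 4.

```python
import math


def cell_contributions(observed):
    """Per-cell terms (observed - expected)^2 / expected of the Pearson statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(n_ij - r * c / n) ** 2 / (r * c / n) for n_ij, c in zip(row, col_totals)]
        for row, r in zip(observed, row_totals)
    ]


# Rows: Siskel Con/Mix/Pro. Columns: Ebert Con/Mix/Pro.
siskel_ebert = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]

contrib = cell_contributions(siskel_ebert)
chi2 = sum(sum(row) for row in contrib)

# Upper-tail probability for df = 4 only (closed form for this even df).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi2(4) = {chi2:.4f}, p = {p_value:.2e}")
for label, row in zip(["Con", "Mix", "Pro"], contrib):
    print(label, [round(x, 2) for x in row])
```

The largest contributions come from the diagonal cells (both critics Con, both Mix, both Pro), where the observed agreement exceeds what independence would predict.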
Yates's Statistic
• Method of testing for association for 2x2 tables when the sample size is moderate (total observations between 6 and 25)
$$\chi^2 = \sum_{i} \sum_{j} \frac{\left( \left| O_{ij} - e_{ij} \right| - 0.5 \right)^2}{e_{ij}}$$
where $O_{ij}$ is the observed count and $e_{ij}$ is the expected count in cell $(i, j)$
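A short Python sketch of the continuity-corrected statistic follows. The slides give no worked example for this slide, so the 2x2 table below is hypothetical data (24 observations, chosen to fall in the stated 6 to 25 range), and the function name `yates_chi_square` is illustrative.

```python
def yates_chi_square(observed):
    """Chi-square with Yates's continuity correction: sum of (|O - e| - 0.5)^2 / e."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            stat += (abs(o_ij - e_ij) - 0.5) ** 2 / e_ij
    return stat


# Hypothetical 2x2 table (not from the slides), n = 24.
small_table = [[8, 4], [3, 9]]
corrected = yates_chi_square(small_table)
print(f"Yates-corrected chi2 = {corrected:.4f}")
```

Subtracting 0.5 from each absolute deviation shrinks the statistic, which makes the test more conservative than the uncorrected version on the same table.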
Measures of Association
– Relative Risk
– Odds Ratio
– Absolute Risk
Relative Risk
• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with characteristic from groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
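Relative Risk
• Estimated Relative Risk:
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$
95% Confidence Interval for Population Relative Risk:
$$\left( RR \, e^{-1.96\sqrt{v}}, \; RR \, e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$$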
Relative Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
Example - Coccidioidomycosis and TNF-antagonists
• Research Question: Risk of developing Coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor (TNF) versus Patients not receiving TNF (all patients arthritic)
        |   COC    No COC  |  Total
--------+------------------+-------
TNF     |     7      240   |   247
Other   |     4      734   |   738
--------+------------------+-------
Total   |    11      974   |   985
Source: Bergstrom, et al (2004)
Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$
$$95\% \; CI: \left( 5.24 \, e^{-1.96\sqrt{.3874}}, \; 5.24 \, e^{1.96\sqrt{.3874}} \right) = (1.55, 17.76)$$
Entire CI above 1 ⇒ Conclude higher risk if on TNF
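The relative risk and its interval can be recomputed with a small Python function. This is a sketch under the formulas given on the Relative Risk slides; the function name `relative_risk_ci` and its argument names are illustrative. Because it keeps unrounded proportions, its output differs slightly in the second decimal from a hand calculation with proportions rounded to four places.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Relative risk of group 1 vs group 2 with a confidence interval (z = 1.96 gives 95%)."""
    p1 = n11 / n1_total  # proportion with the outcome in group 1
    p2 = n21 / n2_total  # proportion with the outcome in group 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21  # estimated variance of ln(RR)
    factor = math.exp(z * math.sqrt(v))
    return rr, rr / factor, rr * factor


# Group 1: TNF (7 of 247 with COC). Group 2: other therapy (4 of 738 with COC).
rr, lower, upper = relative_risk_ci(7, 247, 4, 738)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```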
Odds Ratio
• Odds of an event is the probability it occurs divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
$$odds_1 = \frac{n_{11} / n_{1.}}{n_{12} / n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$
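Odds Ratio
• Estimated Odds Ratio:
$$OR = \frac{odds_1}{odds_2} = \frac{n_{11} / n_{12}}{n_{21} / n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}$$
95% Confidence Interval for Population Odds Ratio
$$\left( OR \, e^{-1.96\sqrt{v}}, \; OR \, e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$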
Odds Ratio
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to cases with respect to demographic factors
                 |  GBM Present   GBM Absent  |  Total
-----------------+----------------------------+-------
NSAID User       |       32          138      |   170
NSAID Non-User   |      105          263      |   368
-----------------+----------------------------+-------
Total            |      137          401      |   538
Source: Sivak-Sears, et al (2004)
Example - NSAIDs and GBM
$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$$
$$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$
$$95\% \; CI: \left( 0.58 \, e^{-1.96\sqrt{0.0518}}, \; 0.58 \, e^{1.96\sqrt{0.0518}} \right) = (0.37, 0.91)$$
Interval is entirely below 1, NSAID use appears to be lower among cases than controls
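The same calculation is easy to script. The sketch below follows the odds ratio formulas from the slides; the function name `odds_ratio_ci` and its arguments are illustrative choices.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Odds ratio for a 2x2 table with a confidence interval (z = 1.96 gives 95%)."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22  # estimated variance of ln(OR)
    factor = math.exp(z * math.sqrt(v))
    return odds_ratio, odds_ratio / factor, odds_ratio * factor


# Rows: NSAID user, NSAID non-user. Columns: GBM present, GBM absent.
or_hat, lower, upper = odds_ratio_ci(32, 138, 105, 263)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The odds ratio is unchanged if rows and columns are swapped, which is why it remains usable in a case-control design where group sizes are fixed by outcome rather than by exposure.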
Absolute Risk
• Difference Between Proportions of outcomes with an outcome characteristic for 2 groups
• Sample proportions with characteristic from groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Absolute Risk
Estimated Absolute Risk:
$$AR = \hat{\pi}_1 - \hat{\pi}_2$$
95% Confidence Interval for Population Absolute Risk
$$AR \pm 1.96 \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}}}$$
Absolute Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is positive
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is negative
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 0
Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$AR = .0283 - .0054 = .0229$$
$$95\% \; CI: .0229 \pm 1.96 \sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016, 0.0442)$$
Interval is entirely positive, TNF is associated with higher risk
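A Python sketch of the absolute risk interval, following the formula from the Absolute Risk slides, is given below. The function name `absolute_risk_ci` is illustrative, and the code uses unrounded proportions, so the last digit may differ from a hand calculation.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in proportions (group 1 minus group 2) with a confidence interval."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, ar - z * se, ar + z * se


# Group 1: TNF (7 of 247 with COC). Group 2: other therapy (4 of 738 with COC).
ar, lower, upper = absolute_risk_ci(7, 247, 4, 738)
print(f"AR = {ar:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval is symmetric around the estimate, so its lower and upper limits are the estimate minus and plus the same margin of about 0.0213.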
Ordinal Explanatory and Response Variables
• Pearson’s Chi-square test can be used to test associations among ordinal variables, but more powerful methods exist
• When theories exist that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives from independence:
– Gamma
– Kendall’s τb
Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one individual scores “higher” on both ordered variables than the other individual
• Discordant Pairs - Pairs of individuals where one individual scores “higher” on one ordered variable and the other individual scores “lower” on the other
• C = # Concordant Pairs, D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D
Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)
• Sick Days (0, 1-6, ≥ 7)
• Concordant Pairs - Pairs of respondents where one scores higher on both alcohol risk and sick days than the other
• Discordant Pairs - Pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days
Source: Hermansson, et al (2003)
Example - Alcohol Use and Sick Days
ALCOHOL * SICKDAYS Crosstabulation (Count)
                         |           SICKDAYS            |
ALCOHOL                  |  0 days   1-6 days   7+ days  |  Total
-------------------------+-------------------------------+-------
Without Risk             |    347       113       145    |   605
Hardly any Risk          |    154        63        56    |   273
Some-Considerable Risk   |     52        25        34    |   111
-------------------------+-------------------------------+-------
Total                    |    553       201       235    |   989
• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells “Southeast” of theirs
• Discordant Pairs: Each individual in a given cell is discordant with each individual in cells “Southwest” of theirs
Example - Alcohol Use and Sick Days
$$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$$
$$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$$
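The "Southeast" and "Southwest" counting rule translates directly into nested loops. The sketch below is one way to write it, assuming rows and columns are both listed from lowest to highest category; the function name `concordant_discordant` is illustrative.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows = len(table)
    n_cols = len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                # Cells below and to the right ("Southeast") are concordant.
                concordant += table[i][j] * sum(table[k][j + 1:])
                # Cells below and to the left ("Southwest") are discordant.
                discordant += table[i][j] * sum(table[k][:j])
    return concordant, discordant


# Rows: alcohol risk (Without, Hardly any, Some-Considerable).
# Columns: sick days (0, 1-6, 7+).
alcohol = [[347, 113, 145], [154, 63, 56], [52, 25, 34]]
C, D = concordant_discordant(alcohol)
print(f"C = {C}, D = {D}")
```

Pairs that fall in the same row or the same column are tied on one variable and are counted in neither C nor D.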
Measures of Association
• Goodman and Kruskal’s Gamma:
$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$
• Kendall’s τb:
$$\hat{\tau}_b = \frac{C - D}{0.5 \sqrt{\left( n^2 - \sum_i n_{i.}^2 \right) \left( n^2 - \sum_j n_{.j}^2 \right)}}$$
When there’s no association between the ordinal variables, the population-based values of these measures are 0. Statistical software packages provide these tests.
Example - Alcohol Use and Sick Days
$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$
Symmetric Measures
                                       |  Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
---------------------------------------+------------------------------------------------------------
Ordinal by Ordinal   Kendall's tau-b   |   .035          .030               1.187           .235
                     Gamma             |   .062          .052               1.187           .235
N of Valid Cases                       |    989
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
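Both point estimates in the output can be recovered from the pair counts and the marginal totals. The sketch below takes C, D and the margins as given from the alcohol use example; it assumes the tau-b denominator 0.5·sqrt((n² − Σ row totals²)(n² − Σ column totals²)), and it does not attempt the standard errors or significance tests, which are left to statistical software.

```python
import math

# Pair counts and marginal totals from the alcohol use and sick days example.
C, D = 83164, 73496
row_totals = [605, 273, 111]  # alcohol risk categories
col_totals = [553, 201, 235]  # sick day categories
n = sum(row_totals)

# Goodman and Kruskal's gamma ignores tied pairs entirely.
gamma = (C - D) / (C + D)

# Kendall's tau-b adjusts the denominator for ties on each variable.
row_term = n ** 2 - sum(r ** 2 for r in row_totals)
col_term = n ** 2 - sum(c ** 2 for c in col_totals)
tau_b = (C - D) / (0.5 * math.sqrt(row_term * col_term))

print(f"gamma = {gamma:.4f}, tau-b = {tau_b:.4f}")
```

Tau-b is smaller in magnitude than gamma here because its denominator also counts pairs tied on only one of the two variables.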