Contingency Tables

1. Explain χ² Test of Independence

2. Measure of Association

Contingency Tables

• Tables representing all combinations of levels of explanatory and response variables

• Numbers in table represent Counts of the number of cases in each cell

• Row and column totals are called Marginal counts

2x2 Tables

• Each variable has 2 levels
– Explanatory Variable – Groups (Typically based on demographics, exposure)
– Response Variable – Outcome (Typically presence or absence of a characteristic)

2x2 Tables - Notation

                 Outcome Present   Outcome Absent   Group Total
Group 1          n11               n12              n1.
Group 2          n21               n22              n2.
Outcome Total    n.1               n.2              n..

χ² Test of Independence

• 1. Shows If a Relationship Exists Between 2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality

• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5

• 3. Uses Two-Way Contingency Table

χ² Test of Independence Contingency Table

• 1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables

                  House Location
House Style    Urban   Rural   Total
Split-Level       63      49     112
Ranch             15      33      48
Total             78      82     160

Rows: Levels of variable 1 (House Style). Columns: Levels of variable 2 (House Location).

χ² Test of Independence Hypotheses & Statistic

• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)

• 2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

where n_ij is the Observed count and E(n_ij) is the Expected count in the cell at row i, column j.

• Degrees of Freedom: (r - 1)(c - 1), where r = number of Rows and c = number of Columns

χ² Test of Independence Expected Counts

• 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities

• 2. Compute Marginal Probabilities & Multiply for Joint Probability

• 3. Expected Count Is Sample Size Times Joint Probability

Expected Count Example

                  Location
House Style    Urban   Rural   Total
Split-Level       63      49     112
Ranch             15      33      48
Total             78      82     160

Marginal probability (Split-Level) = 112/160

Marginal probability (Urban) = 78/160

Joint probability = (112/160)(78/160)

Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation

$$\text{Expected count} = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$$

                  House Location
                  Urban          Rural
House Style    Obs.   Exp.    Obs.   Exp.   Total
Split-Level      63   54.6      49   57.4     112
Ranch            15   23.4      33   24.6      48
Total            78   78        82   82       160

Split-Level, Urban: 112·78/160 = 54.6
Split-Level, Rural: 112·82/160 = 57.4
Ranch, Urban: 48·78/160 = 23.4
Ranch, Rural: 48·82/160 = 24.6
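The row-total times column-total rule can be checked with a few lines of Python. This is only a minimal sketch: the function name `expected_counts` is a hypothetical helper introduced for illustration, not something from the slides, and it assumes the table is given as a list of rows of observed counts.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# House style (rows: Split-Level, Ranch) by location (columns: Urban, Rural)
observed = [[63, 49], [15, 33]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 1) for e in row])
```

Run on the house-style table, this should reproduce the expected counts 54.6, 57.4, 23.4, and 24.6 shown in the slide.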

χ² Test of Independence Example

• You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?

               Diet Pepsi
Diet Coke     No    Yes   Total
No            84     32     116
Yes           48    122     170
Total        132    154     286

χ² Test of Independence Solution

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 - 1)(2 - 1) = 1

• Critical Value(s): χ² = 3.841 (Reject H0 if χ² > 3.841)

Expected counts, with E(n_ij) ≥ 5 in all cells:

               Diet Pepsi
               No             Yes
Diet Coke    Obs.   Exp.    Obs.   Exp.   Total
No             84   53.5      32   62.5     116
Yes            48   78.5     122   91.5     170
Total         132   132      154   154      286

No, No: 116·132/286 = 53.5
No, Yes: 116·154/286 = 62.5
Yes, No: 170·132/286 = 78.5
Yes, Yes: 170·154/286 = 91.5

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = \frac{\left[n_{11} - E(n_{11})\right]^2}{E(n_{11})} + \frac{\left[n_{12} - E(n_{12})\right]^2}{E(n_{12})} + \cdots + \frac{\left[n_{22} - E(n_{22})\right]^2}{E(n_{22})}$$

$$\chi^2 = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$

Decision: Reject at α = .05

Conclusion: There is evidence of a relationship
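The solution steps can be traced in code. The sketch below is an illustration under stated assumptions, not the slide author's method: `chi_square_statistic` is a hypothetical helper, the critical value 3.841 is taken from the slide rather than computed, and the p-value line relies on the identity that a χ² variable with 1 degree of freedom has upper-tail probability erfc(√(x/2)), so it is valid only for df = 1.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Diet Coke (rows: No, Yes) by Diet Pepsi (columns: No, Yes)
observed = [[84, 32], [48, 122]]
stat, df = chi_square_statistic(observed)

CRITICAL_VALUE = 3.841          # chi-square critical value for df = 1, alpha = .05 (from the slide)
reject = stat > CRITICAL_VALUE

# Upper-tail probability; this closed form holds only when df == 1
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```

With unrounded expected counts the statistic comes out near 54.15; the slide's 54.29 results from rounding the expected counts to one decimal before squaring. Either value is far above 3.841, so the decision is the same.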

Siskel and Ebert

           |              Ebert              |
    Siskel |       Con        Mix        Pro |    Total
-----------+---------------------------------+----------
       Con |        24          8         13 |       45
       Mix |         8         13         11 |       32
       Pro |        10          9         64 |       83
-----------+---------------------------------+----------
     Total |        42         30         88 |      160

Siskel and Ebert

Each cell shows the observed count (top) and the expected count (bottom).

           |              Ebert              |
    Siskel |       Con        Mix        Pro |    Total
-----------+---------------------------------+----------
       Con |        24          8         13 |       45
           |      11.8        8.4       24.8 |     45.0
-----------+---------------------------------+----------
       Mix |         8         13         11 |       32
           |       8.4        6.0       17.6 |     32.0
-----------+---------------------------------+----------
       Pro |        10          9         64 |       83
           |      21.8       15.6       45.6 |     83.0
-----------+---------------------------------+----------
     Total |        42         30         88 |      160
           |      42.0       30.0       88.0 |    160.0

• Pearson chi2(4) = 45.3569 p < 0.001
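The reported p < 0.001 can be sanity-checked without statistical tables. For an even number of degrees of freedom the χ² upper-tail probability has a closed form, exp(-x/2) · Σ (x/2)^i / i! for i = 0 to df/2 - 1. The sketch below applies it to the reported statistic; `chi2_sf_even_df` is a hypothetical helper name, and the shortcut does not apply to odd degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive, even df")
    half = x / 2
    term = 1.0      # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as reported for the Siskel and Ebert table
p_value = chi2_sf_even_df(45.3569, 4)
print(f"p-value = {p_value:.2e}")
```

The result is many orders of magnitude below 0.001, consistent with the reported output.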

Yates’s Statistic

• Method of testing for association for 2x2 tables when sample size is moderate (total observations between 6 – 25)

$$\chi^2 = \sum_{i}\sum_{j} \frac{\left(\left|O_{ij} - e_{ij}\right| - 0.5\right)^2}{e_{ij}}$$

where O_ij is the observed count and e_ij is the expected count in cell (i, j).
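A short sketch of the continuity-corrected statistic for a 2x2 table follows. The function name `yates_chi_square` and the small table are hypothetical, made up purely to illustrate the arithmetic; the code applies the 0.5 correction to every cell exactly as in the formula, whereas some software limits the correction so it never exceeds |O - e|.

```python
def yates_chi_square(table):
    """Continuity-corrected chi-square statistic for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat


# Hypothetical 2x2 table with 20 observations (illustration only, not from the slides)
hypothetical = [[3, 7], [8, 2]]
corrected = yates_chi_square(hypothetical)
print(f"Yates-corrected chi-square = {corrected:.4f}")
```

For this table every expected count is 5.5 or 4.5 and every |O - e| is 2.5, so the corrected statistic is smaller than the uncorrected one, which is the intended conservative effect.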


Measures of Association

– Relative Risk
– Odds Ratio
– Absolute Risk

Relative Risk

• Ratio of the probability that the outcome characteristic is present for one group, relative to the other

• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$

• Estimated Relative Risk:

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$

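95% Confidence Interval for Population Relative Risk:

$$\left(RR \cdot e^{-1.96\sqrt{v}},\; RR \cdot e^{1.96\sqrt{v}}\right)$$

$$e = 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$$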


• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1

Example - Coccidioidomycosis and TNF-antagonists

• Research Question: Is there a risk of developing coccidioidomycosis associated with arthritis therapy?

• Groups: Patients receiving tumor necrosis factor (TNF) antagonists versus patients not receiving them (all patients arthritic)

          COC   No COC   Total
TNF         7      240     247
Other       4      734     738
Total      11      974     985

Source: Bergstrom, et al (2004)


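• Group 1: Patients on TNF

• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24$$

$$v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$

$$95\%\ CI: \left(5.24\,e^{-1.96\sqrt{.3874}},\; 5.24\,e^{1.96\sqrt{.3874}}\right) = (1.55,\ 17.76)$$

Entire CI above 1 ⇒ Conclude higher risk if on TNF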
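The relative risk and its 95% interval can be reproduced with a few lines of Python. This is a sketch using the formulas from the slides; `relative_risk_ci` and its parameter names are hypothetical, chosen here only for illustration.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk and its confidence interval (log-scale method)."""
    p1 = n11 / n1_total            # sample proportion with outcome, group 1
    p2 = n21 / n2_total            # sample proportion with outcome, group 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    margin = z * math.sqrt(v)
    return rr, rr * math.exp(-margin), rr * math.exp(margin)


# Group 1: on TNF (7 of 247 with COC); group 2: not on TNF (4 of 738 with COC)
rr, lower, upper = relative_risk_ci(7, 247, 4, 738)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the code keeps full precision for the proportions, its results differ slightly in the second decimal from the slide, which rounds the proportions to .0283 and .0054 first. The interval still lies entirely above 1.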

Odds Ratio

• Odds of an event is the probability it occurs divided by the probability it does not occur

• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2

• Sample odds of the outcome for each group:

$$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$

• Estimated Odds Ratio:

$$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$

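95% Confidence Interval for Population Odds Ratio

$$\left(OR \cdot e^{-1.96\sqrt{v}},\; OR \cdot e^{1.96\sqrt{v}}\right)$$

$$e = 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$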



• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1

Example - NSAIDs and GBM

• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to cases with respect to demographic factors

                  GBM Present   GBM Absent   Total
NSAID User                 32          138     170
NSAID Non-User            105          263     368
Total                     137          401     538

Source: Sivak-Sears, et al (2004)

$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$$

$$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$

$$95\%\ CI: \left(0.58\,e^{-1.96\sqrt{0.0518}},\; 0.58\,e^{1.96\sqrt{0.0518}}\right) = (0.37,\ 0.91)$$

Interval is entirely below 1, NSAID use appears to be lower among cases than controls
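The same calculation is easy to script. The sketch below follows the odds-ratio formulas from the slides; `odds_ratio_ci` is a hypothetical helper name, and the arguments are the four cell counts of the 2x2 table in the order n11, n12, n21, n22.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio and its confidence interval (log-scale method)."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    margin = z * math.sqrt(v)
    return odds_ratio, odds_ratio * math.exp(-margin), odds_ratio * math.exp(margin)


# Rows: NSAID user, NSAID non-user; columns: GBM present, GBM absent
odds_ratio, lower, upper = odds_ratio_ci(32, 138, 105, 263)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For the NSAID table this should give an odds ratio of about 0.58 with an interval of roughly (0.37, 0.91), in line with the worked example.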

Absolute Risk

• Difference between the proportions with the outcome characteristic in the 2 groups

• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$

• Estimated Absolute Risk:

$$AR = \hat{\pi}_1 - \hat{\pi}_2$$

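95% Confidence Interval for Population Absolute Risk

$$AR \pm 1.96\sqrt{\frac{\hat{\pi}_1\left(1 - \hat{\pi}_1\right)}{n_{1.}} + \frac{\hat{\pi}_2\left(1 - \hat{\pi}_2\right)}{n_{2.}}}$$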



• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is positive
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is negative
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 0


• Group 1: Patients on TNF

• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$AR = .0283 - .0054 = .0229$$

$$95\%\ CI: .0229 \pm 1.96\sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016,\ 0.0442)$$

Interval is entirely positive, TNF is associated with higher risk
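A quick numerical check of the absolute risk interval is sketched below. The helper name `absolute_risk_ci` is hypothetical; the formula is the difference of proportions plus or minus 1.96 standard errors, as defined in the slides.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in sample proportions and its confidence interval."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    std_error = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    margin = z * std_error
    return ar, ar - margin, ar + margin


# Group 1: on TNF (7 of 247 with COC); group 2: not on TNF (4 of 738 with COC)
ar, lower, upper = absolute_risk_ci(7, 247, 4, 738)
print(f"AR = {ar:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The margin works out to about .0213, so the interval runs from roughly 0.0016 to roughly 0.044 and is entirely positive.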

Ordinal Explanatory and Response Variables

• Pearson’s Chi-square test can be used to test associations among ordinal variables, but more powerful methods exist

• When theories exist that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives from independence:
– Gamma
– Kendall’s τb

Concordant and Discordant Pairs

• Concordant Pairs - Pairs of individuals where one individual scores “higher” on both ordered variables than the other individual

• Discordant Pairs - Pairs of individuals where one individual scores “higher” on one ordered variable and “lower” on the other, relative to the other individual

• C = # Concordant Pairs, D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D

Example - Alcohol Use and Sick Days

• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)

• Sick Days (0, 1-6, ≥ 7)

• Concordant Pairs - Pairs of respondents where one scores higher on both alcohol risk and sick days than the other

• Discordant Pairs - Pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days

Source: Hermansson, et al (2003)


ALCOHOL * SICKDAYS Crosstabulation

Count
                                     SICKDAYS
ALCOHOL                    0 days   1-6 days   7+ days   Total
Without Risk                  347        113       145     605
Hardly any Risk               154         63        56     273
Some-Considerable Risk         52         25        34     111
Total                         553        201       235     989

• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells “Southeast” of theirs

• Discordant Pairs: Each individual in a given cell is discordant with each individual in cells “Southwest” of theirs


$$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$$

$$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$$
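The "Southeast" and "Southwest" counting rule translates directly into nested loops. The sketch below is one straightforward way to do it, not an optimized one; `count_pairs` is a hypothetical helper, and it assumes rows and columns are both listed from the lowest to the highest category.

```python
def count_pairs(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a lower row
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:       # "Southeast": higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:     # "Southwest": higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


# Rows: alcohol risk (without, hardly any, some-considerable)
# Columns: sick days (0, 1-6, 7+)
alcohol_by_sickdays = [[347, 113, 145], [154, 63, 56], [52, 25, 34]]
C, D = count_pairs(alcohol_by_sickdays)
print(f"C = {C}, D = {D}")
```

Pairs tied on either variable (same row or same column) are counted in neither total, which matches the hand calculation.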

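Measures of Association

• Goodman and Kruskal’s Gamma:

$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$

• Kendall’s τb:

$$\hat{\tau}_b = \frac{C - D}{0.5\sqrt{\left(n^2 - \sum_i n_{i.}^2\right)\left(n^2 - \sum_j n_{.j}^2\right)}}$$

where n = n.. is the total sample size, n_i. are the row totals, and n_.j are the column totals.

When there’s no association between the ordinal variables, the population-based values of these measures are 0. Statistical software packages provide these tests.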


$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$
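Both ordinal measures can be computed from the pair counts and the marginal totals of the alcohol-by-sick-days table. The sketch below hard-codes C, D, and the margins from the example; the variable names are illustrative, and the tau-b line uses the standard form in which the tie-corrected pair counts for rows and columns are (n² − Σ n_i.²)/2 and (n² − Σ n_.j²)/2.

```python
import math

# Pair counts and marginal totals from the alcohol-by-sick-days example
C, D = 83164, 73496
row_totals = [605, 273, 111]     # alcohol risk categories
col_totals = [553, 201, 235]     # sick-day categories
n = sum(row_totals)              # 989 respondents

# Goodman and Kruskal's gamma
gamma = (C - D) / (C + D)

# Kendall's tau-b, which also adjusts for pairs tied on rows or columns
row_term = n ** 2 - sum(r ** 2 for r in row_totals)
col_term = n ** 2 - sum(c ** 2 for c in col_totals)
tau_b = (C - D) / (0.5 * math.sqrt(row_term * col_term))

print(f"gamma = {gamma:.4f}, tau-b = {tau_b:.4f}")
```

The values come out close to .062 for gamma and .035 for tau-b. Tau-b is smaller in magnitude because its denominator also includes tied pairs.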

Symmetric Measures

                                        Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b     .035                   .030          1.187           .235
                     Gamma               .062                   .052          1.187           .235
N of Valid Cases                          989

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
