1
Measuring Association
The contents in this chapter are from Chapter 19 of the textbook.
The crimjust.sav data will be used:
cjsrate: RATE JOB DONE: CJ SYSTEM OVERALL
judgrate: RATE JOB DONE: VT'S JUDGES OVERALL
proscrat: RATE JOB DONE: VT'S PROSECUTORS
2
Measures of association
RATE JOB DONE: CJ SYSTEM OVERALL
Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Excellent | 19 | 3.2 | 3.3 | 3.3
Valid: Good | 257 | 42.8 | 44.2 | 47.4
Valid: Only fair | 241 | 40.1 | 41.4 | 88.8
Valid: Poor | 65 | 10.8 | 11.2 | 100.0
Valid: Total | 582 | 96.8 | 100.0
Missing: Not sure/don't know | 19 | 3.2
Total | 601 | 100.0
Survey respondents rated the criminal justice system in Vermont.
Only 3.2% of the 601 respondents rated the system as Excellent.
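To make the three percentage columns concrete, here is a minimal Python sketch (an illustration, not the textbook's or SPSS's own procedure): Percent divides by all 601 respondents, Valid Percent divides by the 582 valid answers, and Cumulative Percent accumulates the valid percentages. The function name `frequency_table` is hypothetical.

```python
# Counts from the cjsrate frequency table; the 19 "Not sure/don't know"
# answers are missing values and are excluded from Valid Percent.
counts = {"Excellent": 19, "Good": 257, "Only fair": 241, "Poor": 65}
missing = 19


def frequency_table(counts, missing):
    """Return (category, count, percent, valid percent, cumulative percent) rows."""
    valid_total = sum(counts.values())   # 582 valid answers
    grand_total = valid_total + missing  # 601 respondents
    rows = []
    cumulative = 0.0
    for category, n in counts.items():
        percent = 100 * n / grand_total
        valid_percent = 100 * n / valid_total
        cumulative += valid_percent
        rows.append((category, n, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


table = frequency_table(counts, missing)
for row in table:
    print(row)
```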
3
Measures of association
RATE JOB DONE: VT'S JUDGES OVERALL * RATE JOB DONE: CJ SYSTEM OVERALL Crosstabulation
(rows: RATE JOB DONE: VT'S JUDGES OVERALL; columns: RATE JOB DONE: CJ SYSTEM OVERALL; each cell shows Count and % within RATE JOB DONE: VT'S JUDGES OVERALL)
Judges rating | Excellent | Good | Only fair | Poor | Total
Excellent | 10 (27.0%) | 20 (54.1%) | 7 (18.9%) | 0 (.0%) | 37 (100.0%)
Good | 6 (2.2%) | 172 (62.3%) | 88 (31.9%) | 10 (3.6%) | 276 (100.0%)
Only fair | 3 (1.5%) | 54 (27.7%) | 117 (60.0%) | 21 (10.8%) | 195 (100.0%)
Poor | 0 (.0%) | 4 (7.7%) | 17 (32.7%) | 31 (59.6%) | 52 (100.0%)
Total | 19 (3.4%) | 250 (44.6%) | 229 (40.9%) | 62 (11.1%) | 560 (100.0%)
Of the people who rated the judges as Excellent, 27% also rated the system as Excellent.
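The "% within" percentages divide each cell by its row total. A short Python sketch under that reading (the helper `row_percentages` is a hypothetical name) reproduces the percentages in the judges-by-system table:

```python
# Judges (rows) by overall CJ system (columns), from the crosstabulation above.
categories = ["Excellent", "Good", "Only fair", "Poor"]
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def row_percentages(table):
    """Percentage of each row's total that falls in each column."""
    return [[round(100 * n / sum(row), 1) for n in row] for row in table]


pct = row_percentages(judges_by_system)
# System ratings among the people who rated the judges as Excellent.
print(dict(zip(categories, pct[0])))
```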
4
Measures of association
RATE JOB DONE: VT'S PROSECUTORS * RATE JOB DONE: CJ SYSTEM OVERALL Crosstabulation
(rows: RATE JOB DONE: VT'S PROSECUTORS; columns: RATE JOB DONE: CJ SYSTEM OVERALL; each cell shows Count and % within RATE JOB DONE: VT'S PROSECUTORS)
Prosecutors rating | Excellent | Good | Only fair | Poor | Total
Excellent | 8 (34.8%) | 11 (47.8%) | 3 (13.0%) | 1 (4.3%) | 23 (100.0%)
Good | 7 (2.5%) | 192 (68.8%) | 71 (25.4%) | 9 (3.2%) | 279 (100.0%)
Only fair | 1 (.5%) | 40 (19.4%) | 141 (68.4%) | 24 (11.7%) | 206 (100.0%)
Poor | 0 (.0%) | 4 (9.8%) | 10 (24.4%) | 27 (65.9%) | 41 (100.0%)
Total | 16 (2.9%) | 247 (45.0%) | 225 (41.0%) | 61 (11.1%) | 549 (100.0%)
Almost 69% (68.8%) of the people who rated the prosecutors as Good also rated the system as Good.
5
Measures of association
The word "related" can have many different meanings.
A perfect relationship is one in which every person gave the same rating to the overall system as to a particular component.
It is imperfect relationships that can be quantified in many different ways.
We need measures of association to do this. In general, their absolute values range from 0 to 1.
6
Measures of association-Lambda
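Let n_1 be the number of cases misclassified in situation 1 (predicting without the independent variable), and let n_2 be the number misclassified in situation 2 (predicting with the help of the independent variable).
The measure of association lambda is defined by
$$\lambda = \frac{n_1 - n_2}{n_1}$$
Let us see an example in the data crimjust.sav.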
7
Measures of association-Lambda
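For the table on p. 3, we predict the rating of the overall system (cjsrate) from the rating of the judges (judgrate).
Situation 1: If we predict Good, the most frequent overall-system rating, for everyone, the number misclassified is 19 + 229 + 62 = 310.
Situation 2: Consider the rule: for each category of the independent variable, predict the category of the dependent variable that occurs most frequently. By the use of this rule, the misclassified counts are
Excellent: 17 = 10 + 7; Good: 104 = 6 + 88 + 10; Only fair: 78 = 3 + 54 + 21; Poor: 21 = 4 + 17.
The total misclassified is 220 = 17 + 104 + 78 + 21, so
$$\lambda = \frac{310 - 220}{310} = \frac{90}{310} = .290$$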
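The two situations can be sketched in a few lines of Python, assuming the table is stored with the independent variable (judgrate) as rows and the dependent variable (cjsrate) as columns; `lambda_for_columns` is a hypothetical helper name, not an SPSS function:

```python
# Judges (rows, independent) by overall CJ system (columns, dependent).
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def lambda_for_columns(table):
    """Lambda for predicting the column variable from the row variable.

    Returns (n1, n2, lambda), the misclassified counts in situations 1 and 2.
    """
    n = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    # Situation 1: predict the most frequent column category for everyone.
    n1 = n - max(column_totals)
    # Situation 2: within each row, predict that row's most frequent column category.
    n2 = sum(sum(row) - max(row) for row in table)
    return n1, n2, (n1 - n2) / n1


n1, n2, lam = lambda_for_columns(judges_by_system)
print(n1, n2, round(lam, 3))
```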
8
Measures of association-Lambda
The value of lambda ranges from 0 to 1. Lambda = 1 indicates that you make no prediction errors. Lambda = 0 means that the independent variable is of no help in prediction.
Two different lambdas: lambda is not a symmetric measure. Its value depends on which variable you predict from which.
9
Measures of association-Lambda
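Two different lambdas: We have calculated the lambda for predicting cjsrate from judgrate. To calculate the lambda for predicting judgrate from cjsrate, we have (284 - 230)/284 = 0.19.
The symmetric lambda pools the errors from both directions:
$$\lambda_{\text{symmetric}} = \frac{(310 - 220) + (284 - 230)}{310 + 284} = \frac{144}{594} = .242$$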
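Both directional lambdas and the symmetric lambda can be sketched together in Python, assuming the same table layout (judges as rows, overall system as columns); predicting the row variable is done by transposing the table. The helper names are hypothetical:

```python
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def misclassified(table):
    """(n1, n2) when predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    n1 = n - max(sum(column) for column in zip(*table))
    n2 = sum(sum(row) - max(row) for row in table)
    return n1, n2


def all_lambdas(table):
    """Return (lambda for the column variable, lambda for the row variable, symmetric)."""
    c1, c2 = misclassified(table)                                 # predict cjsrate
    r1, r2 = misclassified([list(col) for col in zip(*table)])    # predict judgrate
    symmetric = ((c1 - c2) + (r1 - r2)) / (c1 + r1)
    return (c1 - c2) / c1, (r1 - r2) / r1, symmetric


lam_cjs, lam_judges, lam_sym = all_lambdas(judges_by_system)
print(round(lam_cjs, 3), round(lam_judges, 3), round(lam_sym, 3))
```

These values agree with the Lambda rows of the SPSS Directional Measures output on this table.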
10
Measures of association-Lambda
Directional Measures (Nominal by Nominal)
Measure | Dependent variable | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Lambda | Symmetric | .242 | .041 | 5.341 | .000
Lambda | RATE JOB DONE: VT'S JUDGES OVERALL Dependent | .190 | .051 | 3.370 | .001
Lambda | RATE JOB DONE: CJ SYSTEM OVERALL Dependent | .290 | .039 | 6.503 | .000
Goodman and Kruskal tau | RATE JOB DONE: VT'S JUDGES OVERALL Dependent | .136 | .022 | | .000(c)
Goodman and Kruskal tau | RATE JOB DONE: CJ SYSTEM OVERALL Dependent | .143 | .022 | | .000(c)
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on chi-square approximation.
11
Measures of association For Ordinal Variable
In the past discussion we did not use the order information: Excellent > Good > Only fair > Poor.
If judges' ratings increase as overall ratings increase, you can say that the two variables have a positive relationship. Similarly, if one rating decreases as the other increases, the two variables have a negative relationship.
12
Measures of association For Ordinal Variable
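Concordant and discordant pairs: A pair of cases is concordant if the case with the larger value on one variable also has the larger value on the other variable. A pair of cases is discordant if the value of one variable for a case is larger than the value for the other case but the direction is reversed for the second variable. A pair with the same value on one or both variables is tied and is neither concordant nor discordant.
Case | cjsrate | judgrate
Case 1 | 1 | 2
Case 2 | 2 | 3
Case 3 | 3 | 2
Here cases 1 and 2 are concordant, cases 2 and 3 are discordant, and cases 1 and 3 are tied on judgrate.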
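The pair classification can be checked with a small Python sketch; the function `classify` is a hypothetical helper that treats pairs tied on either variable as neither concordant nor discordant:

```python
from itertools import combinations

# (cjsrate, judgrate) for the three example cases.
cases = {"Case 1": (1, 2), "Case 2": (2, 3), "Case 3": (3, 2)}


def classify(a, b):
    """Classify a pair of (x, y) cases as concordant, discordant, or tied."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    if dx == 0 or dy == 0:
        return "tied"  # same value on at least one variable
    return "concordant" if dx * dy > 0 else "discordant"


pairs = {(p, q): classify(cases[p], cases[q]) for p, q in combinations(cases, 2)}
for pair, kind in pairs.items():
    print(pair, kind)
```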
13
Measures of association For Ordinal Variable
Concordant and discordant pairs: Let P be the number of concordant pairs and Q be the number of discordant pairs among all distinct pairs of observations.
Goodman and Kruskal's gamma is defined by
$$\gamma = \frac{P - Q}{P + Q}$$
Symmetric Measures
Ordinal by Ordinal | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Gamma | .680 | .040 | 12.778 | .000
N of Valid Cases: 560
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
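For a crosstabulation, P and Q can be counted cell by cell: every case in cell (i, j) forms a concordant pair with every case that is in a later row and a later column, and a discordant pair with every case in a later row and an earlier column. A minimal Python sketch under that approach (the helper `concordance_counts` is a hypothetical name), applied to the judges-by-system table:

```python
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def concordance_counts(table):
    """Return (P, Q) for a table whose rows and columns share the same ordering."""
    n_rows, n_cols = len(table), len(table[0])
    P = Q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only cases in a later row
                for l in range(n_cols):
                    if l > j:    # later on both variables: concordant
                        P += table[i][j] * table[k][l]
                    elif l < j:  # later row but earlier column: discordant
                        Q += table[i][j] * table[k][l]
    return P, Q


P, Q = concordance_counts(judges_by_system)
gamma = (P - Q) / (P + Q)
print(P, Q, round(gamma, 3))
```

Pairs in the same row or the same column are ties and are skipped, which is why they do not enter gamma.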
14
Measures of association For Ordinal Variable
A positive gamma tells you that there are more like (concordant) pairs of cases than unlike (discordant) pairs.
There is a positive relationship between the two sets of ratings. As judges’ ratings increase, so do ratings of the overall system.
If two variables are independent, the value of gamma is 0. However, a gamma of 0, like a lambda of 0, does not necessarily mean independence.
15
Measures of association For Ordinal Variable
Kendall's tau-b: A measure that attempts to normalize P - Q by considering ties on each variable in a pair separately.
Tau-b is defined by
$$\tau_b = \frac{P - Q}{\sqrt{(P + Q + T_X)(P + Q + T_Y)}}$$
where T_X is the number of ties involving only the first variable and T_Y is the number of ties involving only the second variable.
Tau-b can attain the values 1 and -1 only for tables that have the same number of rows and columns.
16
Measures of association For Ordinal Variable
Kendall's tau-c: Another measure that attempts to normalize P - Q is tau-c. It is defined by
$$\tau_c = \frac{2m(P - Q)}{N^2(m - 1)}$$
where m is the smaller of the number of rows and columns and N is the number of cases.
Unfortunately, there is no simple proportional-reduction-in-error interpretation of tau-c either.
17
Measures of association For Ordinal Variable
Symmetric Measures
Ordinal by Ordinal | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Kendall's tau-b | .462 | .033 | 12.778 | .000
Kendall's tau-c | .383 | .030 | 12.778 | .000
N of Valid Cases: 560
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
The table above gives tau-b and tau-c between cjsrate and judgrate.
There is no simple interpretation for these values. Tau-b is a commonly used measure of association.
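The tau-b and tau-c formulas can be sketched in Python, assuming judgrate is X (rows) and cjsrate is Y (columns). T_X counts pairs in the same row but different columns, and T_Y counts pairs in the same column but different rows; the helper names are hypothetical:

```python
from math import sqrt

judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def n_pairs(m):
    """Number of distinct pairs among m cases."""
    return m * (m - 1) // 2


def pair_counts(table):
    """Return (P, Q, T_X, T_Y) for an ordered table with X as rows and Y as columns."""
    n_rows, n_cols = len(table), len(table[0])
    P = Q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        P += table[i][j] * table[k][l]
                    elif l < j:
                        Q += table[i][j] * table[k][l]
    tied_both = sum(n_pairs(c) for row in table for c in row)
    t_x = sum(n_pairs(sum(row)) for row in table) - tied_both      # tied on X only
    t_y = sum(n_pairs(sum(col)) for col in zip(*table)) - tied_both  # tied on Y only
    return P, Q, t_x, t_y


P, Q, T_X, T_Y = pair_counts(judges_by_system)
N = sum(map(sum, judges_by_system))
m = min(len(judges_by_system), len(judges_by_system[0]))

tau_b = (P - Q) / sqrt((P + Q + T_X) * (P + Q + T_Y))
tau_c = 2 * m * (P - Q) / (N ** 2 * (m - 1))
print(round(tau_b, 3), round(tau_c, 3))
```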
18
Measures of association For Ordinal Variable
There are more measures of association: Somers' d (p. 428) and Cohen's kappa (pp. 429-430).
How can you decide what measure of association to use? No single measure of association is best for all situations.
19
Directional Measures
Ordinal by Ordinal: Somers' d | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Symmetric | .462 | .033 | 12.778 | .000
RATE JOB DONE: VT'S JUDGES OVERALL Dependent | .464 | .034 | 12.778 | .000
RATE JOB DONE: CJ SYSTEM OVERALL Dependent | .461 | .033 | 12.778 | .000
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
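Since the definition of Somers' d is only cited by page here, the sketch below assumes the standard form: for a given dependent variable, d = (P - Q)/(P + Q + ties on the dependent variable only), and the symmetric version divides P - Q by the average of the two directional denominators. It reuses the judges (rows) by system (columns) table; helper names are hypothetical:

```python
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]


def n_pairs(m):
    return m * (m - 1) // 2


def pair_counts(table):
    """Return (P, Q, ties on rows only, ties on columns only)."""
    n_rows, n_cols = len(table), len(table[0])
    P = Q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        P += table[i][j] * table[k][l]
                    elif l < j:
                        Q += table[i][j] * table[k][l]
    tied_both = sum(n_pairs(c) for row in table for c in row)
    t_rows = sum(n_pairs(sum(row)) for row in table) - tied_both
    t_cols = sum(n_pairs(sum(col)) for col in zip(*table)) - tied_both
    return P, Q, t_rows, t_cols


P, Q, t_rows, t_cols = pair_counts(judges_by_system)
denom_rows_dep = P + Q + t_rows  # judgrate (rows) is the dependent variable
denom_cols_dep = P + Q + t_cols  # cjsrate (columns) is the dependent variable

d_judges = (P - Q) / denom_rows_dep
d_system = (P - Q) / denom_cols_dep
d_symmetric = (P - Q) / ((denom_rows_dep + denom_cols_dep) / 2)
print(round(d_symmetric, 3), round(d_judges, 3), round(d_system, 3))
```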
Correlation-based Measures
20
RATE JOB DONE: VT'S JUDGES OVERALL * RATE JOB DONE: VT'S PROSECUTORS Crosstabulation
(rows: RATE JOB DONE: VT'S JUDGES OVERALL; columns: RATE JOB DONE: VT'S PROSECUTORS; each cell shows Count and % of Total)
Judges rating | Excellent | Good | Only fair | Poor | Total
Excellent | 13 (2.4%) | 17 (3.1%) | 4 (.7%) | 1 (.2%) | 35 (6.4%)
Good | 5 (.9%) | 192 (35.4%) | 63 (11.6%) | 6 (1.1%) | 266 (49.0%)
Only fair | 3 (.6%) | 59 (10.9%) | 117 (21.5%) | 12 (2.2%) | 191 (35.2%)
Poor | 2 (.4%) | 8 (1.5%) | 20 (3.7%) | 21 (3.9%) | 51 (9.4%)
Total | 23 (4.2%) | 276 (50.8%) | 204 (37.6%) | 40 (7.4%) | 543 (100.0%)
Symmetric Measures
Measure of Agreement | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Kappa | .395 | .033 | 12.634 | .000
N of Valid Cases: 543
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
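Kappa is only cited by page here, so the sketch below assumes the standard definition of Cohen's kappa: kappa = (p_o - p_e)/(1 - p_e), where p_o is the proportion of cases on the diagonal (judges and prosecutors given the same rating) and p_e is the diagonal proportion expected from the marginal totals alone. The helper name `cohens_kappa` is hypothetical:

```python
# Judges (rows) by prosecutors (columns), from the crosstabulation above.
judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]


def cohens_kappa(table):
    """Cohen's kappa for a square table of agreement counts."""
    n = sum(map(sum, table))
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance if the two ratings were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(judges_by_prosecutors)
print(round(kappa, 3))
```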
21
Correlation-based Measures
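Coefficient of correlation: When two variables are numerical, the Pearson correlation coefficient has been widely used. It measures the strength of the linear relationship between two numerical variables.
$$r = \frac{\operatorname{cov}(X, Y)}{s_X s_Y}$$
where
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$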
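The formula can be applied directly to a crosstabulation by treating each cell count as that many cases. The sketch below assumes the categories are scored 1 = Excellent, 2 = Good, 3 = Only fair, 4 = Poor (an assumption about the coding; any equally spaced scores give the same r, because r is unchanged by linear rescaling). It uses the judges-by-prosecutors table, and `pearson_from_table` is a hypothetical helper:

```python
from math import sqrt

judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]


def pearson_from_table(table):
    """Pearson r for an r x c table whose categories are scored 1, 2, 3, ..."""
    n = sum(map(sum, table))
    x_bar = sum((i + 1) * cnt for i, row in enumerate(table) for cnt in row) / n
    y_bar = sum((j + 1) * cnt for row in table for j, cnt in enumerate(row)) / n
    cov = sxx = syy = 0.0
    for i, row in enumerate(table):
        for j, cnt in enumerate(row):
            dx, dy = (i + 1) - x_bar, (j + 1) - y_bar
            cov += cnt * dx * dy
            sxx += cnt * dx * dx
            syy += cnt * dy * dy
    # Divide by n - 1 to get cov(X, Y), s_X^2 and s_Y^2 as in the formula.
    cov, sxx, syy = cov / (n - 1), sxx / (n - 1), syy / (n - 1)
    return cov / (sqrt(sxx) * sqrt(syy))


r = pearson_from_table(judges_by_prosecutors)
print(round(r, 3))
```

Under this scoring the result agrees with the Pearson's R value of .497 that SPSS reports for this table (N = 543).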
22
Correlation-based Measures
Definition of Spearman rank correlation
23
An example of rank correlation
24
Correlation-based Measures
Symmetric Measures
Measure | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Interval by Interval: Pearson's R | .497 | .042 | 13.330 | .000(c)
Ordinal by Ordinal: Spearman Correlation | .492 | .039 | 13.128 | .000(c)
N of Valid Cases: 543
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
The Spearman correlation coefficient is a nonparametric measure of association: it is computed from the ranks of the observations rather than from their actual values.
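The slide's own definition of the Spearman coefficient did not survive, so the sketch below uses one standard formulation: Pearson's r computed on the ranks, with all cases in the same category given the average (mid) rank of that category. It is applied to the judges-by-prosecutors table; the helpers `midranks` and `spearman_from_table` are hypothetical names:

```python
from math import sqrt

judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]


def midranks(totals):
    """Average rank shared by all cases in each ordered category."""
    ranks, start = [], 0
    for t in totals:
        ranks.append(start + (t + 1) / 2)  # mean of ranks start+1 .. start+t
        start += t
    return ranks


def spearman_from_table(table):
    """Pearson correlation of the midranks of the row and column variables."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    rx, ry = midranks(row_tot), midranks(col_tot)
    n = sum(row_tot)
    mean = (n + 1) / 2  # average of the ranks 1..n
    sxy = sum(table[i][j] * (rx[i] - mean) * (ry[j] - mean)
              for i in range(len(rx)) for j in range(len(ry)))
    sxx = sum(row_tot[i] * (rx[i] - mean) ** 2 for i in range(len(rx)))
    syy = sum(col_tot[j] * (ry[j] - mean) ** 2 for j in range(len(ry)))
    return sxy / sqrt(sxx * syy)


rs = spearman_from_table(judges_by_prosecutors)
print(round(rs, 3))
```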
25
Measure based on the chi-square statistic
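Since the chi-square test of independence is often used when analyzing crosstabulations, there are a variety of measures of association that are based on the chi-square statistic.
The chi-square statistic itself is not a good measure of association, because its value depends on the sample size as well as on the strength of the relationship: for the same cell proportions, doubling the number of cases doubles the chi-square value. Several modifications have been proposed.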
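The sample-size problem can be seen with the left-handedness table from the example that follows. The Python sketch below doubles every count, which doubles chi-square, while phi = sqrt(chi-square/N), one well-known modification for 2 x 2 tables (shown as an illustration, not necessarily the textbook's choice), stays the same. The helper `chi_square` is a hypothetical name:

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


hands = [[12, 108], [24, 156]]                     # rows: female, male; columns: left, right
doubled = [[2 * x for x in row] for row in hands]  # same proportions, twice the cases

chi2, chi2_doubled = chi_square(hands), chi_square(doubled)
phi = sqrt(chi2 / 300)
phi_doubled = sqrt(chi2_doubled / 600)
print(round(chi2, 4), round(chi2_doubled, 4), round(phi, 4), round(phi_doubled, 4))
```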
26
Contingency Table Example
Left-Handed vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female
There are 2 categories for each variable, so this is called a 2 x 2 table.
Suppose we examine a sample of size 300.
27
Contingency Table Example
Sample results organized in a contingency table:
Gender | Hand Preference: Left | Hand Preference: Right | Total
Female | 12 | 108 | 120
Male | 24 | 156 | 180
Total | 36 | 264 | 300
120 females, 12 were left-handed
180 males, 24 were left-handed
Sample size = n = 300
28
χ² Test for the Difference Between Two Proportions
If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.
The two proportions above should be the same as the proportion of left-handed people overall.
H0: π1 = π2 (the proportion of females who are left-handed is equal to the proportion of males who are left-handed)
H1: π1 ≠ π2 (the two proportions are not the same; hand preference is not independent of gender)
29
The Chi-Square Test Statistic
The Chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
f_o = observed frequency in a particular cell
f_e = expected frequency in a particular cell if H0 is true
χ² for the K x L case has (K - 1)(L - 1) degrees of freedom.
(Assumed: each cell in the contingency table has an expected frequency of at least 5.)
30
Decision Rule
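Decision Rule: If χ² > χ²_U, reject H0; otherwise, do not reject H0.
For a 2 x 2 table, the χ² test statistic approximately follows a chi-squared distribution with one degree of freedom.
[Figure: chi-square density starting at 0, with the "Do not reject H0" region below the critical value χ²_U and the "Reject H0" region above it]
Critical values: http://www.statsoft.com/textbook/stathome.html?sttable.html&1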
31
Computing the Average Proportion
The average proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$$
Here: 120 females, 12 were left-handed; 180 males, 24 were left-handed, so
$$\bar{p} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$
i.e., the proportion of left-handers overall is 0.12, that is, 12%.
32
Finding Expected Frequencies
To obtain the expected frequency for left-handed females, multiply the average proportion left-handed (p) by the total number of females.
To obtain the expected frequency for left-handed males, multiply the average proportion left-handed (p) by the total number of males.
If the two proportions are equal, then
P(Left-Handed | Female) = P(Left-Handed | Male) = .12
i.e., we would expect (.12)(120) = 14.4 females and (.12)(180) = 21.6 males to be left-handed.
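The same arithmetic as a short Python sketch (the dictionary names are illustrative only); the right-handed expected counts use 1 - p for the proportion right-handed:

```python
left = {"Female": 12, "Male": 24}      # observed left-handers
totals = {"Female": 120, "Male": 180}  # group sizes

p_bar = sum(left.values()) / sum(totals.values())  # 36 / 300
expected_left = {g: p_bar * n for g, n in totals.items()}
expected_right = {g: (1 - p_bar) * n for g, n in totals.items()}
print(p_bar, expected_left, expected_right)
```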
33
Observed vs. Expected Frequencies
Gender | Hand Preference: Left | Hand Preference: Right | Total
Female | Observed = 12, Expected = 14.4 | Observed = 108, Expected = 105.6 | 120
Male | Observed = 24, Expected = 21.6 | Observed = 156, Expected = 158.4 | 180
Total | 36 | 264 | 300
34
The Chi-Square Test Statistic
The test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
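The cell-by-cell computation can be sketched in Python. As an assumption of this sketch, each expected count is computed as (row total)(column total)/n, which for this table gives the same values as multiplying the average proportion by each group total. The helper name is hypothetical:

```python
observed = [[12, 108], [24, 156]]  # rows: Female, Male; columns: Left, Right


def chi_square_test_statistic(observed):
    """Return (chi-square, degrees of freedom, list of (f_o, f_e, contribution))."""
    n = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    cells = []
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count if H0 is true
            cells.append((f_o, f_e, (f_o - f_e) ** 2 / f_e))
    chi2 = sum(contribution for _, _, contribution in cells)
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (K - 1)(L - 1)
    return chi2, df, cells


chi2, df, cells = chi_square_test_statistic(observed)
for f_o, f_e, contribution in cells:
    print(f_o, round(f_e, 1), round(contribution, 4))
print(round(chi2, 4), df)
```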
35
Decision Rule
Decision Rule: If χ² > 3.841, reject H0; otherwise, do not reject H0.
The test statistic is χ² = 0.7576, and the critical value χ²_U with 1 d.f. is 3.841.
Here, χ² = 0.7576 < χ²_U = 3.841, so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at α = 0.05.
[Figure: chi-square density starting at 0, with the "Do not reject H0" region below χ²_U = 3.841 and the "Reject H0" region above it]
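As a cross-check on the critical value, a chi-square variable with 1 d.f. is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x/2)). The Python sketch below uses only the standard library `math.erfc`; the helper name is hypothetical, and the formula holds only for 1 degree of freedom:

```python
from math import erfc, sqrt


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 d.f. > x), using the fact that Z**2 has this distribution."""
    return erfc(sqrt(x / 2))


chi2 = 0.7576
critical = 3.841
p_value = chi2_upper_tail_df1(chi2)
reject = chi2 > critical
print(f"p-value = {p_value:.3f}, reject H0: {reject}")
```

The p-value is well above 0.05, which matches the decision not to reject H0, and the tail area beyond 3.841 comes out at about 0.05.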