
Upload: gabriella-kelly

Post on 12-Jan-2016



1

Measuring Association

The contents in this chapter are from Chapter 19 of the textbook.

The crimjust.sav data will be used. The variables are:
cjsrate: RATE JOB DONE: CJ SYSTEM OVERALL
judgrate: RATE JOB DONE: VT'S JUDGES OVERALL
proscrat: RATE JOB DONE: VT'S PROSECUTORS


2

Measures of association

RATE JOB DONE: CJ SYSTEM OVERALL

                                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Excellent                      19       3.2             3.3                  3.3
          Good                          257      42.8            44.2                 47.4
          Only fair                     241      40.1            41.4                 88.8
          Poor                           65      10.8            11.2                100.0
          Total                         582      96.8           100.0
Missing   Not sure/don't know            19       3.2
Total                                   601     100.0

Survey respondents rated the criminal justice system in Vermont.

Only 3.2% of the 601 respondents rated the system as Excellent.
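The three percentage columns in the frequency table use different bases. Percent divides by all 601 respondents. Valid Percent divides by the 582 usable answers. Cumulative Percent is a running total of Valid Percent. The following is a minimal sketch that recomputes them from the counts. The dictionary of counts is typed in by hand from the table and is not read from crimjust.sav.

```python
# Frequencies of cjsrate copied from the table above (valid answers only).
valid_counts = {"Excellent": 19, "Good": 257, "Only fair": 241, "Poor": 65}
missing = 19  # "Not sure/don't know"

n_valid = sum(valid_counts.values())  # respondents with a usable rating
n_total = n_valid + missing           # everyone surveyed

rows = []
cumulative = 0.0
for category, count in valid_counts.items():
    percent = 100 * count / n_total        # base: all respondents
    valid_percent = 100 * count / n_valid  # base: valid answers only
    cumulative += valid_percent            # running total of valid percents
    rows.append((category, count, round(percent, 1),
                 round(valid_percent, 1), round(cumulative, 1)))

for row in rows:
    print(*row)
print("Missing", missing, round(100 * missing / n_total, 1))
```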


3

Measures of association

RATE JOB DONE: VT'S JUDGES OVERALL * RATE JOB DONE: CJ SYSTEM OVERALL Crosstabulation
Rows: RATE JOB DONE: VT'S JUDGES OVERALL; columns: RATE JOB DONE: CJ SYSTEM OVERALL.
Each cell: Count (% within RATE JOB DONE: VT'S JUDGES OVERALL).

             Excellent    Good          Only fair     Poor         Total
Excellent    10 (27.0%)   20 (54.1%)    7 (18.9%)     0 (.0%)      37 (100.0%)
Good         6 (2.2%)     172 (62.3%)   88 (31.9%)    10 (3.6%)    276 (100.0%)
Only fair    3 (1.5%)     54 (27.7%)    117 (60.0%)   21 (10.8%)   195 (100.0%)
Poor         0 (.0%)      4 (7.7%)      17 (32.7%)    31 (59.6%)   52 (100.0%)
Total        19 (3.4%)    250 (44.6%)   229 (40.9%)   62 (11.1%)   560 (100.0%)

Of the people who rated judges as Excellent, 27% also rated the system as Excellent.
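Each "% within RATE JOB DONE: VT'S JUDGES OVERALL" entry is a cell count divided by its row total. For example, 10/37 = 27.0%. The following is a small sketch that reproduces those row percentages from the counts. The list layout (rows = judgrate, columns = cjsrate, in the order Excellent, Good, Only fair, Poor) is an assumption of this example.

```python
# Counts from the judges-by-system crosstabulation (rows: judgrate, columns: cjsrate).
labels = ["Excellent", "Good", "Only fair", "Poor"]
table = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def row_percentages(counts):
    """Express each cell as a percentage of its own row total."""
    return [[100 * cell / sum(row) for cell in row] for row in counts]

pct = row_percentages(table)
for label, row in zip(labels, pct):
    print(f"{label:<10}", " ".join(f"{p:5.1f}%" for p in row))
```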


4

Measures of association

RATE JOB DONE: VT'S PROSECUTORS * RATE JOB DONE: CJ SYSTEM OVERALL Crosstabulation
Rows: RATE JOB DONE: VT'S PROSECUTORS; columns: RATE JOB DONE: CJ SYSTEM OVERALL.
Each cell: Count (% within RATE JOB DONE: VT'S PROSECUTORS).

             Excellent    Good          Only fair     Poor         Total
Excellent    8 (34.8%)    11 (47.8%)    3 (13.0%)     1 (4.3%)     23 (100.0%)
Good         7 (2.5%)     192 (68.8%)   71 (25.4%)    9 (3.2%)     279 (100.0%)
Only fair    1 (.5%)      40 (19.4%)    141 (68.4%)   24 (11.7%)   206 (100.0%)
Poor         0 (.0%)      4 (9.8%)      10 (24.4%)    27 (65.9%)   41 (100.0%)
Total        16 (2.9%)    247 (45.0%)   225 (41.0%)   61 (11.1%)   549 (100.0%)

Almost 69% (68.8%) of the people who rated prosecutors as Good also rated the system as Good.


5

Measures of association

The word "related" can have many different meanings.

A perfect relationship is one in which all people gave the same rating to the overall system as to a particular component.

Imperfect relationships, by contrast, can be quantified in many different ways.

We need measures of association. In general, their absolute values range from 0 to 1.


6

Measures of association-Lambda

Let $n_1$ be the number of cases misclassified in Situation 1, and let $n_2$ be the number of cases misclassified in Situation 2.

The measure of association lambda is defined by

$$\lambda = \frac{n_1 - n_2}{n_1}$$

Let us see an example in data crimjust.sav.


7

Measures of association-Lambda

For the table on p.3, we predict the rating of the overall system (cjsrate).

Situation 1: If we predict Good for everyone, the number misclassified is 19 + 229 + 62 = 310.

Situation 2: Consider the following rule. For each category of the independent variable, predict the category of the dependent variable that occurs most frequently. Using this rule, the misclassified counts are:

Excellent: 17 = 10 + 7; Good: 104 = 6 + 88 + 10; Only fair: 78 = 3 + 54 + 21; Poor: 21 = 4 + 17.

The total misclassified is 220 = 17 + 104 + 78 + 21, so

$$\lambda = \frac{310 - 220}{310} = \frac{90}{310} = .290$$
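The two prediction rules above translate directly into code. Situation 1 errs on every case outside the largest column total. Situation 2 errs on every case outside each row's largest cell. The following is a minimal sketch that assumes the table is a list of rows with the predicted variable (cjsrate) in the columns. The function name is illustrative only.

```python
# judgrate (rows) by cjsrate (columns), categories Excellent, Good, Only fair, Poor.
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def lambda_predict_columns(table):
    """Goodman and Kruskal's lambda when the column variable is being predicted."""
    n = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    # Situation 1: ignore the row variable and predict the modal column for everyone.
    n1 = n - max(column_totals)
    # Situation 2: within each row, predict that row's modal column.
    n2 = sum(sum(row) - max(row) for row in table)
    return n1, n2, (n1 - n2) / n1

n1, n2, lam = lambda_predict_columns(judges_by_system)
print(n1, n2, f"{lam:.3f}")  # 310 220 0.290
```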


8

Measures of association-Lambda

The value of $\lambda$ ranges from 0 to 1. A value of $\lambda = 1$ indicates that you make no prediction errors when you use the independent variable. A value of $\lambda = 0$ means that the independent variable is of no help in prediction.

There are two different lambdas because lambda is not a symmetric measure. Its value depends on which variable you predict from which.


9

Measures of association-Lambda

Two different lambdas

We have calculated the lambda for predicting cjsrate. To calculate the lambda for predicting judgrate, we have (284 - 230)/284 = 0.19. The symmetric lambda is defined by

$$\lambda_{\text{symmetric}} = \frac{(310 - 220) + (284 - 230)}{310 + 284} = \frac{144}{594} = 0.242$$
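The symmetric lambda pools the error reductions from both directions. It adds the errors saved in each direction and divides by the total baseline errors. The following sketch computes both directional lambdas and the pooled value from the judges-by-system counts. To predict judgrate instead of cjsrate, it simply transposes the table. The helper name `errors` is hypothetical.

```python
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def errors(table):
    """(n1, n2) for predicting the column variable without and with the row variable."""
    n = sum(map(sum, table))
    n1 = n - max(sum(col) for col in zip(*table))
    n2 = sum(sum(row) - max(row) for row in table)
    return n1, n2

# Predicting cjsrate (columns) from judgrate (rows).
a1, a2 = errors(judges_by_system)
# Transposing swaps the roles: predicting judgrate from cjsrate.
b1, b2 = errors([list(col) for col in zip(*judges_by_system)])

lambda_cjsrate = (a1 - a2) / a1
lambda_judgrate = (b1 - b2) / b1
lambda_symmetric = ((a1 - a2) + (b1 - b2)) / (a1 + b1)
print(f"{lambda_cjsrate:.3f} {lambda_judgrate:.3f} {lambda_symmetric:.3f}")
# 0.290 0.190 0.242
```

These match the .290, .190 and .242 that SPSS reports for lambda.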


10

Measures of association-Lambda

Directional Measures (Nominal by Nominal)
judgrate = RATE JOB DONE: VT'S JUDGES OVERALL; cjsrate = RATE JOB DONE: CJ SYSTEM OVERALL

Measure                   Direction            Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Lambda                    Symmetric            .242    .041                   5.341          .000
Lambda                    judgrate Dependent   .190    .051                   3.370          .001
Lambda                    cjsrate Dependent    .290    .039                   6.503          .000
Goodman and Kruskal tau   judgrate Dependent   .136    .022                                  .000(c)
Goodman and Kruskal tau   cjsrate Dependent    .143    .022                                  .000(c)

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on chi-square approximation.


11

Measures of association For Ordinal Variable

In the past discussion we did not use the order information: Excellent > Good > Only fair > Poor.

If judges’ ratings increase as overall ratings increase, you can say that the two variables have a positive relationship. Similarly, we can define a negative relationship.


12

Measures of association For Ordinal Variable

Concordant and discordant pairs

A pair of cases is discordant if the value of one variable for one case is larger than the value for the other case, but the direction is reversed for the second variable. A pair of cases is concordant if both variables are ordered in the same direction. Pairs tied on either variable are neither concordant nor discordant.

          cjsrate   judgrate
Case 1       1         2
Case 2       2         3
Case 3       3         2
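The following short sketch applies the definitions to the three cases in the table. A pair is concordant when the differences on the two variables have the same sign, discordant when they have opposite signs, and tied when either difference is zero. The function name `classify` is illustrative.

```python
from itertools import combinations

# The three cases from the slide: (cjsrate, judgrate).
cases = {"Case 1": (1, 2), "Case 2": (2, 3), "Case 3": (3, 2)}

def classify(a, b):
    """Label a pair of cases as concordant, discordant, or tied."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    if dx == 0 or dy == 0:
        return "tied"
    return "concordant" if (dx > 0) == (dy > 0) else "discordant"

results = {(i, j): classify(cases[i], cases[j]) for i, j in combinations(cases, 2)}
for pair, kind in results.items():
    print(pair, kind)
```

Cases 1 and 2 are concordant. Cases 2 and 3 are discordant. Cases 1 and 3 share the same judgrate, so that pair is a tie.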


13

Measures of association For Ordinal Variable

Concordant and discordant pairs

Let P be the number of concordant pairs and Q be the number of discordant pairs among all distinct pairs of observations.

Goodman and Kruskal's gamma is defined by

$$\text{Gamma} = \frac{P - Q}{P + Q}$$

Symmetric Measures

                           Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal Gamma   .680    .040                   12.778         .000
N of Valid Cases           560

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
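With grouped data, P and Q can be counted directly from the crosstabulation without listing individual cases. Every case in cell (i, j) is concordant with every case below and to the right, and discordant with every case below and to the left. The following is a minimal sketch, assuming the judges-by-system counts from p.3 with both variables in the order Excellent, Good, Only fair, Poor.

```python
judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def count_pairs(table):
    """Return (P, Q): concordant and discordant pairs in an ordered two-way table."""
    P = Q = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # the other case has a higher row category
                for l in range(n_cols):
                    if l > j:    # ...and a higher column category: concordant
                        P += table[i][j] * table[k][l]
                    elif l < j:  # ...but a lower column category: discordant
                        Q += table[i][j] * table[k][l]
                    # l == j is a tie on the column variable and is not counted
    return P, Q

P, Q = count_pairs(judges_by_system)
gamma = (P - Q) / (P + Q)
print(P, Q, f"{gamma:.3f}")  # 55661 10596 0.680
```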


14

Measures of association For Ordinal Variable

A positive gamma tells you that there are more like (concordant) pairs of cases than unlike (discordant) pairs.

There is a positive relationship between the two sets of ratings. As judges’ ratings increase, so do ratings of the overall system.

If two variables are independent, the value of gamma is 0. However, a gamma of 0, like a lambda of 0, does not necessarily mean independence.


15

Measures of association For Ordinal Variable

Kendall’s Tau-b

Tau-b is a measure that attempts to normalize P - Q by considering ties on each variable in a pair separately. It is defined by

$$\tau_b = \frac{P - Q}{\sqrt{(P + Q + T_X)(P + Q + T_Y)}}$$

where $T_X$ is the number of ties involving only the first variable and $T_Y$ is the number of ties involving only the second variable.

Tau-b can take the values 1 and -1 only for tables that have the same number of rows and columns.


16

Measures of association For Ordinal Variable

Kendall’s Tau-c

Tau-c is another measure that attempts to normalize P - Q. It is defined by

$$\tau_c = \frac{2m(P - Q)}{N^2(m - 1)}$$

where m is the smaller of the number of rows and columns and N is the number of cases.

Unfortunately, tau-c has no simple proportional-reduction-of-error interpretation either.


17

Measures of association For Ordinal Variable

The following are the tau-b and tau-c results for cjsrate and judgrate.

Symmetric Measures

                                     Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal Kendall's tau-b   .462    .033                   12.778         .000
Ordinal by Ordinal Kendall's tau-c   .383    .030                   12.778         .000
N of Valid Cases                     560

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

There is no simple interpretation for these values. Tau-b is a commonly used measure of association.
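Both formulas need only P, Q, the two tie counts, N and m. The following sketch derives the tie counts from the table:
- Pairs within the same row tie on the first variable.
- Pairs within the same column tie on the second variable.
- Pairs within the same cell tie on both, so they are removed from each count to leave ties "involving only" one variable.

The sketch assumes judgrate is the first (row) variable, as in the crosstabulation on p.3.

```python
from math import comb, sqrt

judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def pair_summary(table):
    """Return P, Q, T_X (ties only on rows) and T_Y (ties only on columns)."""
    P = Q = 0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            for lower in table[i + 1:]:
                P += x * sum(lower[j + 1:])  # higher on both variables
                Q += x * sum(lower[:j])      # higher row, lower column
    both = sum(comb(x, 2) for row in table for x in row)  # tied on both variables
    t_x = sum(comb(sum(row), 2) for row in table) - both
    t_y = sum(comb(sum(col), 2) for col in zip(*table)) - both
    return P, Q, t_x, t_y

P, Q, t_x, t_y = pair_summary(judges_by_system)
N = sum(map(sum, judges_by_system))
m = min(len(judges_by_system), len(judges_by_system[0]))

tau_b = (P - Q) / sqrt((P + Q + t_x) * (P + Q + t_y))
tau_c = 2 * m * (P - Q) / (N ** 2 * (m - 1))
print(f"{tau_b:.3f} {tau_c:.3f}")  # 0.462 0.383
```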


18

Measures of association For Ordinal Variable

There are more measures of association, such as Somers' d (p. 428) and Cohen's kappa (pp. 429-430).

How can you decide which measure of association to use? No single measure of association is best for all situations.


19

Correlation-based Measures

Directional Measures
judgrate = RATE JOB DONE: VT'S JUDGES OVERALL; cjsrate = RATE JOB DONE: CJ SYSTEM OVERALL

                               Direction            Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal Somers' d   Symmetric            .462    .033                   12.778         .000
Ordinal by Ordinal Somers' d   judgrate Dependent   .464    .034                   12.778         .000
Ordinal by Ordinal Somers' d   cjsrate Dependent    .461    .033                   12.778         .000

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
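Somers' d is an asymmetric relative of tau-b. When Y is the dependent variable, it divides P - Q by P + Q + T_Y, so only ties on the dependent variable enter the denominator. The symmetric version divides by the average of the two directional denominators. The following sketch uses those standard definitions on the judges-by-system counts and reproduces the three values in the table. The helper is a condensed, illustrative version of a pair counter.

```python
from math import comb

judges_by_system = [
    [10, 20, 7, 0],
    [6, 172, 88, 10],
    [3, 54, 117, 21],
    [0, 4, 17, 31],
]

def pair_summary(table):
    """P, Q, ties only on rows (judgrate), ties only on columns (cjsrate)."""
    P = Q = 0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            for lower in table[i + 1:]:
                P += x * sum(lower[j + 1:])
                Q += x * sum(lower[:j])
    both = sum(comb(x, 2) for row in table for x in row)
    t_rows = sum(comb(sum(row), 2) for row in table) - both
    t_cols = sum(comb(sum(col), 2) for col in zip(*table)) - both
    return P, Q, t_rows, t_cols

P, Q, t_rows, t_cols = pair_summary(judges_by_system)
d_judgrate = (P - Q) / (P + Q + t_rows)  # judgrate dependent
d_cjsrate = (P - Q) / (P + Q + t_cols)   # cjsrate dependent
d_symmetric = (P - Q) / ((2 * (P + Q) + t_rows + t_cols) / 2)
print(f"{d_symmetric:.3f} {d_judgrate:.3f} {d_cjsrate:.3f}")  # 0.462 0.464 0.461
```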


20

RATE JOB DONE: VT'S JUDGES OVERALL * RATE JOB DONE: VT'S PROSECUTORS Crosstabulation
Rows: RATE JOB DONE: VT'S JUDGES OVERALL; columns: RATE JOB DONE: VT'S PROSECUTORS.
Each cell: Count (% of Total).

             Excellent    Good          Only fair     Poor         Total
Excellent    13 (2.4%)    17 (3.1%)     4 (.7%)       1 (.2%)      35 (6.4%)
Good         5 (.9%)      192 (35.4%)   63 (11.6%)    6 (1.1%)     266 (49.0%)
Only fair    3 (.6%)      59 (10.9%)    117 (21.5%)   12 (2.2%)    191 (35.2%)
Poor         2 (.4%)      8 (1.5%)      20 (3.7%)     21 (3.9%)    51 (9.4%)
Total        23 (4.2%)    276 (50.8%)   204 (37.6%)   40 (7.4%)    543 (100.0%)

Symmetric Measures

                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement Kappa   .395    .033                   12.634         .000
N of Valid Cases             543

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
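Cohen's kappa compares the observed agreement with the agreement expected by chance. Observed agreement is the share of cases on the diagonal, where judges and prosecutors received the same rating. Chance agreement is computed from the row and column totals. The following is a minimal sketch using the standard formula, kappa = (p_o - p_e) / (1 - p_e), applied to the counts above. The function name is illustrative.

```python
# judgrate (rows) by proscrat (columns), from the crosstabulation above.
judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]

def cohens_kappa(table):
    """Agreement beyond chance for a square table with matching categories."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(len(table))) / n  # share on the diagonal
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

kappa = cohens_kappa(judges_by_prosecutors)
print(f"{kappa:.3f}")  # 0.395
```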


21

Correlation-based Measures

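Coefficient of correlation

When two variables are numerical, the Pearson correlation coefficient is widely used. It measures the strength of the linear relationship between two numerical variables:

$$r = \frac{\operatorname{cov}(X, Y)}{s_X s_Y}$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$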
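The following sketch applies the formula literally. It expands the judges-by-prosecutors crosstabulation (N = 543) into one (X, Y) pair per case, then divides the covariance by the product of the standard deviations, each with n - 1 in the denominator. Coding the categories as 1 = Excellent through 4 = Poor is an assumption of this example. With those codes the result should be close to the .497 that SPSS reports as Pearson's R for these two ratings.

```python
from math import sqrt

# judgrate (rows) by proscrat (columns); categories coded 1..4 (assumed coding).
judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]

# Expand the table into one (x, y) observation per case.
x, y = [], []
for i, row in enumerate(judges_by_prosecutors, start=1):
    for j, count in enumerate(row, start=1):
        x += [i] * count
        y += [j] * count

def pearson_r(x, y):
    """r = cov(X, Y) / (s_X * s_Y), all with n - 1 in the denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

r = pearson_r(x, y)
print(len(x), f"{r:.3f}")  # 543 0.497
```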


22

Correlation-based Measures

Definition of Spearman rank correlation


23

An example of rank correlation



24

Correlation-based Measures

Symmetric Measures

                                          Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Interval by Interval Pearson's R          .497    .042                   13.330         .000(c)
Ordinal by Ordinal Spearman Correlation   .492    .039                   13.128         .000(c)
N of Valid Cases                          543

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

The Spearman correlation coefficient is a nonparametric measure of association.
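The Spearman coefficient is the Pearson correlation computed on ranks instead of raw values. Tied values each receive the average of the ranks they occupy. That matters here, because the 543 cases fall into only four categories per variable. The following sketch ranks the expanded judges-by-prosecutors data this way and correlates the ranks. It assumes the same 1 to 4 category coding as the Pearson example. Only the ordering of the codes matters for the ranks.

```python
from math import sqrt

judges_by_prosecutors = [
    [13, 17, 4, 1],
    [5, 192, 63, 6],
    [3, 59, 117, 12],
    [2, 8, 20, 21],
]

x, y = [], []
for i, row in enumerate(judges_by_prosecutors, start=1):
    for j, count in enumerate(row, start=1):
        x += [i] * count
        y += [j] * count

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks

def pearson_r(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)

rho = pearson_r(average_ranks(x), average_ranks(y))
print(f"{rho:.3f}")  # 0.492
```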


25

Measure based on the chi-square statistic

Since the chi-square test of independence is often used when analyzing crosstabulations, a variety of measures of association are based on the chi-square statistic.

The chi-square statistic itself is not a good measure of association, because its value depends on the sample size as well as on the strength of the relationship. Several modifications have been proposed.


26

Contingency Table Example

Left-Handed vs. Gender

Dominant Hand: Left vs. Right

Gender: Male vs. Female

There are two categories for each variable, so this is called a 2 x 2 table.

Suppose we examine a sample of size 300.


27

Contingency Table Example

Sample results organized in a contingency table:

(continued)

                Hand Preference
Gender          Left    Right    Total
Female            12      108      120
Male              24      156      180
Total             36      264      300

Sample size = n = 300:
120 females, 12 were left handed
180 males, 24 were left handed


28

χ² Test for the Difference Between Two Proportions

If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.

The two proportions above should also be the same as the proportion of left-handed people overall.

H0: π1 = π2 (the proportion of females who are left handed is equal to the proportion of males who are left handed)
H1: π1 ≠ π2 (the two proportions are not the same; hand preference is not independent of gender)


29

The Chi-Square Test Statistic

The chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:
$f_o$ = observed frequency in a particular cell
$f_e$ = expected frequency in a particular cell if H0 is true

χ² for the K x L case has (K - 1)(L - 1) degrees of freedom.

(Assumed: each cell in the contingency table has an expected frequency of at least 5.)


30

Decision Rule

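Decision Rule: If $\chi^2 > \chi^2_U$, reject H0; otherwise, do not reject H0.

The χ² test statistic approximately follows a chi-squared distribution with one degree of freedom (for a 2 x 2 table).

[Figure: chi-squared distribution starting at 0; the region below $\chi^2_U$ is labeled "Do not reject H0" and the region above it "Reject H0".]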

http://www.statsoft.com/textbook/stathome.html?sttable.html&1


31

Computing the Average Proportion

Here: 120 females, 12 were left handed
180 males, 24 were left handed

The average proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$

i.e., the proportion of left handers overall is 0.12, that is, 12%.


32

Finding Expected Frequencies

To obtain the expected frequency for left handed females, multiply the average proportion left handed (p) by the total number of females

To obtain the expected frequency for left handed males, multiply the average proportion left handed (p) by the total number of males

If the two proportions are equal, then

P(Left Handed | Female) = P(Left Handed | Male) = .12

i.e., we would expect (.12)(120) = 14.4 females to be left handed and (.12)(180) = 21.6 males to be left handed.


33

Observed vs. Expected Frequencies

                Hand Preference
Gender          Left                             Right                              Total
Female          Observed = 12, Expected = 14.4   Observed = 108, Expected = 105.6   120
Male            Observed = 24, Expected = 21.6   Observed = 156, Expected = 158.4   180
Total           36                               264                                300


34

The Chi-Square Test Statistic

                Hand Preference
Gender          Left                             Right                              Total
Female          Observed = 12, Expected = 14.4   Observed = 108, Expected = 105.6   120
Male            Observed = 24, Expected = 21.6   Observed = 156, Expected = 158.4   180
Total           36                               264                                300

The test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
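The same computation can be written in a few lines. The following sketch obtains each expected frequency as row total x column total / n. For the Left column, that equals the average proportion 36/300 = .12 times the group size, exactly as on the earlier slides. It then sums (f_o - f_e)² / f_e over the four cells. The variable names are illustrative.

```python
# Hand preference by gender (rows: Female, Male; columns: Left, Right).
observed = [[12, 108], [24, 156]]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under H0: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(f"{chi_square:.4f}")  # 0.7576
```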


35

Decision Rule

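Decision Rule: If $\chi^2 > 3.841$, reject H0; otherwise, do not reject H0.

The test statistic is $\chi^2 = 0.7576$, and with 1 d.f., $\chi^2_U = 3.841$.

Here, $\chi^2 = 0.7576 < \chi^2_U = 3.841$, so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at $\alpha = 0.05$.

[Figure: chi-squared distribution starting at 0; the region below $\chi^2_U = 3.841$ is labeled "Do not reject H0" and the region above it "Reject H0".]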
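The critical value 3.841 and a p-value can be checked with the Python standard library. This works only for one degree of freedom: a chi-squared variable with 1 d.f. is the square of a standard normal. The critical value is therefore the squared two-sided 5% normal cutoff, and the upper-tail probability is erfc(sqrt(x/2)). For tables with more degrees of freedom you would need a chi-squared quantile function, which this sketch does not provide.

```python
from math import erfc, sqrt
from statistics import NormalDist

chi_square = 0.7576  # statistic computed above for the hand-preference table
alpha = 0.05

# Valid for 1 degree of freedom only: chi-squared(1) = Z**2 for standard normal Z.
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # upper-tail critical value
p_value = erfc(sqrt(chi_square / 2))                 # P(chi-squared(1) > chi_square)

decision = "reject H0" if chi_square > critical else "do not reject H0"
print(f"critical = {critical:.3f}, p = {p_value:.3f}: {decision}")
```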


36

Measure based on the chi-square statistic

Since the chi-square test of independence is often used when analyzing crosstabulations, a variety of measures of association are based on the chi-square statistic.

The chi-square statistic itself is not a good measure of association, because its value depends on the sample size as well as on the strength of the relationship. Several modifications have been proposed.
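One reason the raw chi-square statistic makes a poor measure of association is that it grows with the number of cases. The following sketch multiplies every cell of the hand-preference table by 10. The 10x table is hypothetical: it keeps all proportions, and therefore the strength of the relationship, unchanged. The statistic also grows tenfold, which is enough to cross the 3.841 cutoff even though nothing about the association has changed.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    n = sum(map(sum, observed))
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )

hand = [[12, 108], [24, 156]]
# Hypothetical table: identical proportions, ten times as many cases.
hand_times_10 = [[10 * x for x in row] for row in hand]

small = chi_square(hand)
large = chi_square(hand_times_10)
print(f"{small:.4f} {large:.4f}")  # 0.7576 7.5758
```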