76197074 ken black qa 5th chapter 12 solution


Upload: jennifer-clement

Post on 08-Nov-2014


Chapter 12: Analysis of Categorical Data

LEARNING OBJECTIVES

This chapter presents several nonparametric statistics that can be used to analyze data enabling you to:

1. Understand the chi-square goodness-of-fit test and how to use it.

2. Analyze data using the chi-square test of independence.

CHAPTER TEACHING STRATEGY

Chapter 12 covers the two most prevalent chi-square tests: the chi-square goodness-of-fit test and the chi-square test of independence. These two techniques are important because they give the statistician a tool that is particularly useful for analyzing nominal data (even though the categories of a variable can sometimes be measured at an ordinal or higher level). It should be emphasized that in many instances in business research the data gathered merely identify categories. For example, in segmenting the marketplace (consumers or industrial users), information is gathered regarding gender, income level, geographical location, political affiliation, religious preference, ethnicity, occupation, size of company, type of industry, etc. On these variables, the measurement is often a tally of the frequency of occurrence of individuals, items, or companies in each category. The subject of the research is given no "score" or "measurement" other than a 0/1 for being a member or not of a given category. These two chi-square tests are well suited to analyzing such data.


The chi-square goodness-of-fit test examines the categories of one variable to determine if the distribution of observed occurrences matches some expected or theoretical distribution of occurrences. It can be used to determine if some standard or previously known distribution of proportions is the same as some observed distribution of proportions. It can also be used to validate the theoretical distribution of occurrences of phenomena such as random arrivals that are often assumed to be Poisson distributed. You will note that the degrees of freedom, k - 1 for a given set of expected values or for the uniform distribution, change to k - 2 for an expected Poisson distribution and to k - 3 for an expected normal distribution. To conduct a chi-square goodness-of-fit test to analyze an expected Poisson distribution, the value of lambda must be estimated from the observed data. This causes the loss of an additional degree of freedom. With the normal distribution, both the mean and standard deviation of the expected distribution are estimated from the observed values causing the loss of two additional degrees of freedom from the k - 1 value.
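The degrees-of-freedom rule described above can be written as a one-line calculation. The following is a minimal Python sketch; the function name `gof_degrees_of_freedom` and its parameter names are illustrative assumptions, not part of the text.

```python
def gof_degrees_of_freedom(k, estimated_params=0):
    """Degrees of freedom for a chi-square goodness-of-fit test.

    k: number of categories (after any collapsing of categories).
    estimated_params: number of parameters estimated from the observed
        data: 0 for given or uniform expected values, 1 for a Poisson
        lambda, 2 for a normal mean and standard deviation.
    """
    df = k - 1 - estimated_params
    if df < 1:
        raise ValueError("not enough categories for this test")
    return df


print(gof_degrees_of_freedom(6))      # given expected values: k - 1
print(gof_degrees_of_freedom(3, 1))   # Poisson: k - 2
print(gof_degrees_of_freedom(7, 2))   # normal: k - 3
```

Passing the number of estimated parameters explicitly keeps the three cases (k - 1, k - 2, k - 3) in one place.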

The chi-square test of independence is used to compare the observed frequencies along the categories of two independent variables to expected values to determine if the two variables are independent or not. Of course, if the variables are not independent, they are dependent or related. This allows business researchers to reach some conclusions about such questions as: is smoking independent of gender or is type of housing preferred independent of geographic region? The chi-square test of independence is often used as a tool for preliminary analysis of data gathered in exploratory research where the researcher has little idea of what variables seem to be related to what variables, and the data are nominal. This test is particularly useful with demographic type data.
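The expected values used in the test of independence come from the row and column totals: each expected frequency is (row total)(column total)/(grand total). A minimal Python sketch of that step follows; the function name and the 2 x 2 counts are made up for illustration and do not come from the text.

```python
def expected_frequencies(table):
    """Return expected counts under independence for a table of observed counts.

    Each expected count is (row total)(column total) / (grand total).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (made-up numbers).
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)
```

The expected table has the same row and column totals as the observed table, which is a quick check on the arithmetic.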

A word of warning is appropriate here. When an expected frequency is small, the observed chi-square value can be inordinately large thus yielding an increased possibility of committing a Type I error. The research on this problem has yielded varying results with some authors indicating that expected values as low as two or three are acceptable and other researchers demanding that expected values be ten or more. In this text, we have settled on the fairly widespread accepted criterion of five or more.
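One common way to honor the five-or-more criterion with ordered categories is to fold a small end category into its neighbor, as the solutions later in this chapter do. The sketch below assumes that approach; the function name `collapse_small_tails` is hypothetical, and the counts are those of Problem 12.3.

```python
def collapse_small_tails(observed, expected, minimum=5.0):
    """Merge end categories into their neighbors until each end
    category has an expected frequency of at least `minimum`.

    Only the first and last categories are examined, which suits
    ordered categories such as counts or class intervals.
    """
    obs, exp_counts = list(observed), list(expected)
    # Fold the last category into the one before it while it is too small.
    while len(exp_counts) > 1 and exp_counts[-1] < minimum:
        last_exp, last_obs = exp_counts.pop(), obs.pop()
        exp_counts[-1] += last_exp
        obs[-1] += last_obs
    # Fold the first category into the one after it while it is too small.
    while len(exp_counts) > 1 and exp_counts[0] < minimum:
        first_exp, first_obs = exp_counts.pop(0), obs.pop(0)
        exp_counts[0] += first_exp
        obs[0] += first_obs
    return obs, exp_counts


# Observed and expected frequencies from Problem 12.3 (categories 0, 1, 2, 3 or more).
obs, exp_counts = collapse_small_tails([28, 17, 11, 5],
                                       [24.803, 22.320, 10.047, 3.831])
print(obs, exp_counts)
```

After collapsing, the number of categories k used for the degrees of freedom is the collapsed count, not the original one.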


CHAPTER OUTLINE

12.1 Chi-Square Goodness-of-Fit Test
       Testing a Population Proportion Using the Chi-Square Goodness-of-Fit Test as an Alternative Technique to the z Test

12.2 Contingency Analysis: Chi-Square Test of Independence

KEY TERMS

Categorical Data
Chi-Square Distribution
Chi-Square Goodness-of-Fit Test
Chi-Square Test of Independence
Contingency Analysis
Contingency Table


SOLUTIONS TO THE ODD-NUMBERED PROBLEMS IN CHAPTER 12

12.1
     fo     fe     (fo - fe)^2 / fe
     53     68          3.309
     37     42          0.595
     32     33          0.030
     28     22          1.636
     18     10          6.400
     15      8          6.125
                       18.095

Ho: The observed distribution is the same as the expected distribution.

Ha: The observed distribution is not the same as the expected distribution.

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 18.095$

df = k - 1 = 6 - 1 = 5, α = .05

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 18.095 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the null hypothesis.

The observed frequencies are not distributed the same as the expected frequencies.
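The arithmetic in Problem 12.1 can be checked with a few lines of Python. This is a sketch only: the function name `chi_square_statistic` is an assumption, and the critical value is simply the table value quoted in the solution rather than something the code computes.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Observed and expected frequencies from Problem 12.1.
observed = [53, 37, 32, 28, 18, 15]
expected = [68, 42, 33, 22, 10, 8]
critical_value = 11.0705  # table value for alpha = .05, df = 5

chi_sq = chi_square_statistic(observed, expected)
reject_null = chi_sq > critical_value
print(chi_sq, reject_null)
```

The unrounded sum differs from 18.095 only in the third decimal place, because the solution adds terms that were each rounded to three decimals.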


12.2
     fo     fe     (fo - fe)^2 / fe
     19     18          0.056
     17     18          0.056
     14     18          0.889
     18     18          0.000
     19     18          0.056
     21     18          0.500
     18     18          0.000
     18     18          0.000
     Σ fo = 144     Σ fe = 144     1.557

Ho: The observed frequencies are uniformly distributed.

Ha: The observed frequencies are not uniformly distributed.

$\bar{x} = \frac{\sum f_o}{k} = \frac{144}{8} = 18$

In this uniform distribution, each fe = 18

df = k - 1 = 8 - 1 = 7, α = .01

$\chi^2_{.01,7} = 18.4753$

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.557$

Since the observed $\chi^2 = 1.557 < \chi^2_{.01,7} = 18.4753$, the decision is to fail to reject the null hypothesis.

There is no reason to conclude that the frequencies are not uniformly distributed.


12.3
     Number     fo     (Number)(fo)
        0       28          0
        1       17         17
        2       11         22
        3        5         15
                61         54

Ho: The frequency distribution is Poisson.
Ha: The frequency distribution is not Poisson.

$\lambda = \frac{54}{61} = 0.9$

     Number     Expected Probability     Expected Frequency
        0             .4066                   24.803
        1             .3659                   22.320
        2             .1647                   10.047
       ≥3             .0628                    3.831

Since fe for ≥3 is less than 5, collapse categories 2 and ≥3:

     Number     fo      fe        (fo - fe)^2 / fe
        0       28     24.803          0.412
        1       17     22.320          1.268
       ≥2       16     13.878          0.324
                61     61.001          2.004

df = k - 2 = 3 - 2 = 1, α = .05

$\chi^2_{.05,1} = 3.8415$

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.004$

Since the observed $\chi^2 = 2.004 < \chi^2_{.05,1} = 3.8415$, the decision is to fail to reject the null hypothesis.

There is insufficient evidence to reject the hypothesis that the distribution is Poisson, so the working conclusion is that the distribution is Poisson distributed.
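The expected Poisson probabilities in Problem 12.3 can be reproduced from the estimated λ. The sketch below uses only the Python standard library; the helper name `poisson_pmf` is an assumption, and λ is taken as 0.9, the rounded value used in the solution.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of exactly x occurrences."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 0.9   # 54 / 61 rounded to one decimal, as in the solution
n = 61      # total number of observations

probs = [poisson_pmf(x, lam) for x in range(3)]   # P(0), P(1), P(2)
probs.append(1.0 - sum(probs))                    # P(3 or more)
expected = [n * p for p in probs]

for number, (p, fe) in enumerate(zip(probs, expected)):
    print(number, round(p, 4), round(fe, 3))
```

Small differences from the tabled figures in the last decimal place are rounding effects: the solution works from probabilities already rounded to four decimals.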


12.4

     Category     f (observed)     Midpt. M      fM         fM^2
     10-20              6             15           90       1,350
     20-30             14             25          350       8,750
     30-40             29             35        1,015      35,525
     40-50             38             45        1,710      76,950
     50-60             25             55        1,375      75,625
     60-70             10             65          650      42,250
     70-80              7             75          525      39,375
     n = Σ f = 129          Σ fM = 5,715     Σ fM^2 = 279,825

$\bar{x} = \frac{\sum fM}{\sum f} = \frac{5,715}{129} = 44.3$

$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{279,825 - \frac{(5,715)^2}{129}}{128}} = 14.43$

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For Category 10-20: $z = \frac{10 - 44.3}{14.43} = -2.38$ (prob. .4913), $z = \frac{20 - 44.3}{14.43} = -1.68$ (prob. .4535); expected prob. = .4913 - .4535 = .0378

For Category 20-30: for x = 20, z = -1.68 (prob. .4535), $z = \frac{30 - 44.3}{14.43} = -0.99$ (prob. .3389); expected prob. = .4535 - .3389 = .1146

For Category 30-40: for x = 30, z = -0.99 (prob. .3389), $z = \frac{40 - 44.3}{14.43} = -0.30$ (prob. .1179); expected prob. = .3389 - .1179 = .2210

For Category 40-50: for x = 40, z = -0.30 (prob. .1179), $z = \frac{50 - 44.3}{14.43} = 0.40$ (prob. .1554); expected prob. = .1179 + .1554 = .2733

For Category 50-60: $z = \frac{60 - 44.3}{14.43} = 1.09$ (prob. .3621), for x = 50, z = 0.40 (prob. .1554); expected prob. = .3621 - .1554 = .2067

For Category 60-70: $z = \frac{70 - 44.3}{14.43} = 1.78$ (prob. .4625), for x = 60, z = 1.09 (prob. .3621); expected prob. = .4625 - .3621 = .1004

For Category 70-80: $z = \frac{80 - 44.3}{14.43} = 2.47$ (prob. .4932), for x = 70, z = 1.78 (prob. .4625); expected prob. = .4932 - .4625 = .0307

For x < 10:

Probability between 10 and the mean, 44.3, = (.0378 + .1146 + .2210 + .1179) = .4913. Probability < 10 = .5000 - .4913 = .0087

For x > 80:

Probability between 80 and the mean, 44.3, = (.0307 + .1004 + .2067 + .1554) = .4932. Probability > 80 = .5000 - .4932 = .0068

     Category     Prob      Expected frequency
     < 10         .0087     .0087(129) =  1.12
     10-20        .0378     .0378(129) =  4.88
     20-30        .1146                  14.78
     30-40        .2210                  28.51
     40-50        .2733                  35.26
     50-60        .2067                  26.66
     60-70        .1004                  12.95
     70-80        .0307                   3.96
     > 80         .0068                   0.88

Because the expected frequencies at both ends are small, category < 10 is folded into 10-20 and category > 80 into 70-80.

     Category     fo      fe       (fo - fe)^2 / fe
     10-20         6      6.00          .000
     20-30        14     14.78          .041
     30-40        29     28.51          .008
     40-50        38     35.26          .213
     50-60        25     26.66          .103
     60-70        10     12.95          .672
     70-80         7      4.84          .964
                                       2.001

Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$

df = k - 3 = 7 - 3 = 4, α = .05

$\chi^2_{.05,4} = 9.4877$

Since the observed $\chi^2 = 2.001 < \chi^2_{.05,4} = 9.4877$, the decision is to fail to reject the null hypothesis. There is not enough evidence to declare that the observed frequencies are not normally distributed.
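The grouped-data mean and standard deviation that drive the expected normal probabilities in Problem 12.4 can be checked in Python. This is a sketch under stated assumptions: the function names are hypothetical, and the normal probability is computed from `math.erf` rather than read from a z table, so interval probabilities can differ slightly from the tabled values, which use z rounded to two decimals.

```python
import math


def grouped_mean_and_sd(midpoints, frequencies):
    """Sample mean and standard deviation from class midpoints and frequencies."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints))
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Class midpoints and observed frequencies from Problem 12.4.
midpoints = [15, 25, 35, 45, 55, 65, 75]
frequencies = [6, 14, 29, 38, 25, 10, 7]

mean, sd = grouped_mean_and_sd(midpoints, frequencies)
p_10_20 = normal_cdf(20, mean, sd) - normal_cdf(10, mean, sd)
print(round(mean, 1), round(sd, 2), round(p_10_20, 4))
```

Multiplying each interval probability by n = 129 gives the expected frequencies used in the chi-square table.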


12.5
     Definition                fo     Exp. Prop.     fe                    (fo - fe)^2 / fe
     Happiness                 42        .39         227(.39) = 88.53           24.46
     Sales/Profit              95        .12         227(.12) = 27.24          168.55
     Helping Others            27        .18                    40.86            4.70
     Achievement/Challenge     63        .31                    70.37            0.77
                              227                                              198.48

Ho: The observed frequencies are distributed the same as the expected frequencies.

Ha: The observed frequencies are not distributed the same as the expected frequencies.

Observed $\chi^2 = 198.48$

df = k - 1 = 4 - 1 = 3, α = .05

$\chi^2_{.05,3} = 7.8147$

Since the observed $\chi^2 = 198.48 > \chi^2_{.05,3} = 7.8147$, the decision is to reject the null hypothesis.

The observed frequencies for men are not distributed the same as the expected frequencies, which are based on the responses of women.


12.6
     Age       fo     Prop. from survey     fe                      (fo - fe)^2 / fe
     10-14     22          .09              (.09)(212) = 19.08           0.45
     15-19     50          .23              (.23)(212) = 48.76           0.03
     20-24     43          .22                           46.64           0.28
     25-29     29          .14                           29.68           0.02
     30-34     19          .10                           21.20           0.23
     ≥ 35      49          .22                           46.64           0.12
              212                                                        1.13

Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies.

Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies.

α = .01, df = k - 1 = 6 - 1 = 5

$\chi^2_{.01,5} = 15.0863$

The observed $\chi^2 = 1.13$

Since the observed $\chi^2 = 1.13 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject the null hypothesis.

There is not enough evidence to declare that the distribution of observed frequencies is different from the distribution of expected frequencies.


12.7
     Age       fo      M       fM         fM^2
     10-20     16      15       240        3,600
     20-30     44      25     1,100       27,500
     30-40     61      35     2,135       74,725
     40-50     56      45     2,520      113,400
     50-60     35      55     1,925      105,875
     60-70     19      65     1,235       80,275
              231          Σ fM = 9,155     Σ fM^2 = 405,375

$\bar{x} = \frac{\sum fM}{n} = \frac{9,155}{231} = 39.63$

$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{405,375 - \frac{(9,155)^2}{231}}{230}} = 13.6$

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For Category 10-20: $z = \frac{10 - 39.63}{13.6} = -2.18$ (prob. .4854), $z = \frac{20 - 39.63}{13.6} = -1.44$ (prob. .4251); expected prob. = .4854 - .4251 = .0603

For Category 20-30: for x = 20, z = -1.44 (prob. .4251), $z = \frac{30 - 39.63}{13.6} = -0.71$ (prob. .2611); expected prob. = .4251 - .2611 = .1640

For Category 30-40: for x = 30, z = -0.71 (prob. .2611), $z = \frac{40 - 39.63}{13.6} = 0.03$ (prob. .0120); expected prob. = .2611 + .0120 = .2731

For Category 40-50: $z = \frac{50 - 39.63}{13.6} = 0.76$ (prob. .2764), for x = 40, z = 0.03 (prob. .0120); expected prob. = .2764 - .0120 = .2644

For Category 50-60: $z = \frac{60 - 39.63}{13.6} = 1.50$ (prob. .4332), for x = 50, z = 0.76 (prob. .2764); expected prob. = .4332 - .2764 = .1568

For Category 60-70: $z = \frac{70 - 39.63}{13.6} = 2.23$ (prob. .4871), for x = 60, z = 1.50 (prob. .4332); expected prob. = .4871 - .4332 = .0539

For < 10:
Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854
Probability < 10 = .5000 - .4854 = .0146

For > 70:
Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871
Probability > 70 = .5000 - .4871 = .0129

     Age       Probability     fe
     < 10        .0146         (.0146)(231) =  3.37
     10-20       .0603         (.0603)(231) = 13.93
     20-30       .1640                        37.88
     30-40       .2731                        63.09
     40-50       .2644                        61.08
     50-60       .1568                        36.22
     60-70       .0539                        12.45
     > 70        .0129                         2.98

The expected frequencies for categories < 10 and > 70 are less than 5. Collapse < 10 into 10-20 and > 70 into 60-70.

     Age       fo      fe       (fo - fe)^2 / fe
     10-20     16     17.30          0.10
     20-30     44     37.88          0.99
     30-40     61     63.09          0.07
     40-50     56     61.08          0.42
     50-60     35     36.22          0.04
     60-70     19     15.43          0.83
                                     2.45

df = k - 3 = 6 - 3 = 3, α = .05

$\chi^2_{.05,3} = 7.8147$

Observed $\chi^2 = 2.45$

Since the observed $\chi^2 = 2.45 < \chi^2_{.05,3} = 7.8147$, the decision is to fail to reject the null hypothesis.

There is no reason to reject that the observed frequencies are normally distributed.


12.8
     Number        f       (f)(number)
     0             18           0
     1             28          28
     2             47          94
     3             21          63
     4             16          64
     5             11          55
     6 or more      9          54
     Σ f = 150          Σ f(number) = 358

$\lambda = \frac{\sum f \cdot (\text{number})}{\sum f} = \frac{358}{150} = 2.4$

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

     Number        Probability     fe
     0               .0907         (.0907)(150) = 13.61
     1               .2177         (.2177)(150) = 32.66
     2               .2613                        39.20
     3               .2090                        31.35
     4               .1254                        18.81
     5               .0602                         9.03
     6 or more       .0358                         5.36

     fo      fe       (fo - fe)^2 / fe
     18     13.61          1.42
     28     32.66          0.66
     47     39.20          1.55
     21     31.35          3.42
     16     18.81          0.42
     11      9.03          0.43
      9      5.36          2.47
                          10.37

The observed $\chi^2 = 10.37$

α = .01, df = k - 2 = 7 - 2 = 5, $\chi^2_{.01,5} = 15.0863$

Since the observed $\chi^2 = 10.37 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequencies are Poisson distributed.


12.9   Ho: p = .28     n = 270     x = 62
       Ha: p ≠ .28

                             fo      fe                     (fo - fe)^2 / fe
     Spend More              62      270(.28) =  75.6           2.44656
     Don't Spend More       208      270(.72) = 194.4           0.95144
     Total                  270                 270.0           3.39800

The observed value of $\chi^2$ is 3.398

α = .05 and α/2 = .025, df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.02389$

Since the observed $\chi^2 = 3.398 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to reject the null hypothesis.
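The chapter outline describes this use of the goodness-of-fit test as an alternative to the z test for a population proportion. With two categories the chi-square statistic equals the square of the z statistic, which the following Python sketch checks numerically for the data of Problem 12.9. The variable names are illustrative assumptions.

```python
import math

# Data from Problem 12.9.
n, x, p0 = 270, 62, 0.28

# Chi-square goodness-of-fit statistic with two categories.
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# z statistic for a single population proportion.
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(round(chi_sq, 3), round(z ** 2, 3))
```

The identity follows because (x - np)^2/(np) + (x - np)^2/(nq) = (x - np)^2/(npq), which is z squared.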

12.10   Ho: p = .30     n = 180     x = 42
        Ha: p < .30

                          fo      fe                   (fo - fe)^2 / fe
     Provide              42      180(.30) =  54           2.6666
     Don't Provide       138      180(.70) = 126           1.1429
     Total               180                 180           3.8095

The observed value of $\chi^2$ is 3.8095

α = .05, df = k - 1 = 2 - 1 = 1

$\chi^2_{.05,1} = 3.8415$

Since the observed $\chi^2 = 3.8095 < \chi^2_{.05,1} = 3.8415$, the decision is to fail to reject the null hypothesis.


12.11
                         Variable Two
     Variable One        203     326  |  529
                          68     110  |  178
                         271     436  |  707

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(529)(271)}{707} = 202.77$     $e_{12} = \frac{(529)(436)}{707} = 326.23$

$e_{21} = \frac{(271)(178)}{707} = 68.23$     $e_{22} = \frac{(436)(178)}{707} = 109.77$

Observed frequencies with expected frequencies in parentheses:

                         Variable Two
     Variable One        203 (202.77)     326 (326.23)  |  529
                          68  (68.23)     110 (109.77)  |  178
                         271              436           |  707

$\chi^2 = \frac{(203 - 202.77)^2}{202.77} + \frac{(326 - 326.23)^2}{326.23} + \frac{(68 - 68.23)^2}{68.23} + \frac{(110 - 109.77)^2}{109.77}$

= .00 + .00 + .00 + .00 = 0.00

α = .01, df = (c-1)(r-1) = (2-1)(2-1) = 1

$\chi^2_{.01,1} = 6.6349$

Since the observed $\chi^2 = 0.00 < \chi^2_{.01,1} = 6.6349$, the decision is to fail to reject the null hypothesis.

Variable One is independent of Variable Two.


12.12
                         Variable Two
     Variable One         24     13     47     58  |  142
                          93     59    187    244  |  583
                         117     72    234    302  |  725

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(142)(117)}{725} = 22.92$     $e_{12} = \frac{(142)(72)}{725} = 14.10$

$e_{13} = \frac{(142)(234)}{725} = 45.83$     $e_{14} = \frac{(142)(302)}{725} = 59.15$

$e_{21} = \frac{(583)(117)}{725} = 94.08$     $e_{22} = \frac{(583)(72)}{725} = 57.90$

$e_{23} = \frac{(583)(234)}{725} = 188.17$     $e_{24} = \frac{(583)(302)}{725} = 242.85$

Observed frequencies with expected frequencies in parentheses:

                         Variable Two
     Variable One         24 (22.92)     13 (14.10)      47 (45.83)      58 (59.15)   |  142
                          93 (94.08)     59 (57.90)     187 (188.17)    244 (242.85)  |  583
                         117             72             234             302           |  725

$\chi^2 = \frac{(24 - 22.92)^2}{22.92} + \frac{(13 - 14.10)^2}{14.10} + \frac{(47 - 45.83)^2}{45.83} + \frac{(58 - 59.15)^2}{59.15}$
$+ \frac{(93 - 94.08)^2}{94.08} + \frac{(59 - 57.90)^2}{57.90} + \frac{(187 - 188.17)^2}{188.17} + \frac{(244 - 242.85)^2}{242.85}$

= .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24

α = .01, df = (c-1)(r-1) = (4-1)(2-1) = 3, $\chi^2_{.01,3} = 11.3449$

Since the observed $\chi^2 = 0.24 < \chi^2_{.01,3} = 11.3449$, the decision is to fail to reject the null hypothesis.

Variable One is independent of Variable Two.

12.13
                                 Social Class
     Number of Children     Lower    Middle    Upper
            0                  7       18         6   |   31
            1                  9       38        23   |   70
          2 or 3              34       97        58   |  189
           >3                 47       31        30   |  108
                              97      184       117   |  398

Ho: Social Class is independent of Number of Children.
Ha: Social Class is not independent of Number of Children.

$e_{11} = \frac{(31)(97)}{398} = 7.56$     $e_{31} = \frac{(189)(97)}{398} = 46.06$

$e_{12} = \frac{(31)(184)}{398} = 14.33$     $e_{32} = \frac{(189)(184)}{398} = 87.38$

$e_{13} = \frac{(31)(117)}{398} = 9.11$     $e_{33} = \frac{(189)(117)}{398} = 55.56$

$e_{21} = \frac{(70)(97)}{398} = 17.06$     $e_{41} = \frac{(108)(97)}{398} = 26.32$

$e_{22} = \frac{(70)(184)}{398} = 32.36$     $e_{42} = \frac{(108)(184)}{398} = 49.93$

$e_{23} = \frac{(70)(117)}{398} = 20.58$     $e_{43} = \frac{(108)(117)}{398} = 31.75$

Observed frequencies with expected frequencies in parentheses:

                                 Social Class
     Number of Children     Lower          Middle         Upper
            0                7  (7.56)     18 (14.33)      6  (9.11)   |   31
            1                9 (17.06)     38 (32.36)     23 (20.58)   |   70
          2 or 3            34 (46.06)     97 (87.38)     58 (55.56)   |  189
           >3               47 (26.32)     31 (49.93)     30 (31.75)   |  108
                            97            184            117           |  398

$\chi^2 = \frac{(7 - 7.56)^2}{7.56} + \frac{(18 - 14.33)^2}{14.33} + \frac{(6 - 9.11)^2}{9.11} + \frac{(9 - 17.06)^2}{17.06}$
$+ \frac{(38 - 32.36)^2}{32.36} + \frac{(23 - 20.58)^2}{20.58} + \frac{(34 - 46.06)^2}{46.06} + \frac{(97 - 87.38)^2}{87.38}$
$+ \frac{(58 - 55.56)^2}{55.56} + \frac{(47 - 26.32)^2}{26.32} + \frac{(31 - 49.93)^2}{49.93} + \frac{(30 - 31.75)^2}{31.75}$

= .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 + 7.18 + .10 = 34.97

α = .05, df = (c-1)(r-1) = (3-1)(4-1) = 6

$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 34.97 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the null hypothesis.

Number of children is not independent of social class.
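The whole test of independence in Problem 12.13 can be sketched in a few lines of Python. The function name `chi_square_independence` is an assumption, and the critical value is the table value quoted in the solution, not something computed by the code.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            statistic += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: 0, 1, 2 or 3, >3 children. Columns: lower, middle, upper class.
children_by_class = [
    [7, 18, 6],
    [9, 38, 23],
    [34, 97, 58],
    [47, 31, 30],
]
critical_value = 12.5916  # table value for alpha = .05, df = 6

statistic, df = chi_square_independence(children_by_class)
reject_null = statistic > critical_value
print(round(statistic, 2), df, reject_null)
```

Working from unrounded expected frequencies gives a statistic within about 0.01 of the hand calculation, which rounds each term to two decimals before summing.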

12.14
                    Type of Music Preferred
     Region     Rock     R&B     Coun     Clssic
     NE          140      32        5       18   |  195
     S           134      41       52        8   |  235
     W           154      27        8       13   |  202
                 428     100       65       39   |  632

Ho: Type of music preferred is independent of region.
Ha: Type of music preferred is not independent of region.

$e_{11} = \frac{(195)(428)}{632} = 132.06$     $e_{23} = \frac{(235)(65)}{632} = 24.17$

$e_{12} = \frac{(195)(100)}{632} = 30.85$     $e_{24} = \frac{(235)(39)}{632} = 14.50$

$e_{13} = \frac{(195)(65)}{632} = 20.06$     $e_{31} = \frac{(202)(428)}{632} = 136.80$

$e_{14} = \frac{(195)(39)}{632} = 12.03$     $e_{32} = \frac{(202)(100)}{632} = 31.96$

$e_{21} = \frac{(235)(428)}{632} = 159.15$     $e_{33} = \frac{(202)(65)}{632} = 20.78$

$e_{22} = \frac{(235)(100)}{632} = 37.18$     $e_{34} = \frac{(202)(39)}{632} = 12.47$

Observed frequencies with expected frequencies in parentheses:

                    Type of Music Preferred
     Region     Rock             R&B            Coun           Clssic
     NE         140 (132.06)     32 (30.85)      5 (20.06)     18 (12.03)   |  195
     S          134 (159.15)     41 (37.18)     52 (24.17)      8 (14.50)   |  235
     W          154 (136.80)     27 (31.96)      8 (20.78)     13 (12.47)   |  202
                428             100             65             39           |  632

$\chi^2 = \frac{(140 - 132.06)^2}{132.06} + \frac{(32 - 30.85)^2}{30.85} + \frac{(5 - 20.06)^2}{20.06} + \frac{(18 - 12.03)^2}{12.03}$
$+ \frac{(134 - 159.15)^2}{159.15} + \frac{(41 - 37.18)^2}{37.18} + \frac{(52 - 24.17)^2}{24.17} + \frac{(8 - 14.50)^2}{14.50}$
$+ \frac{(154 - 136.80)^2}{136.80} + \frac{(27 - 31.96)^2}{31.96} + \frac{(8 - 20.78)^2}{20.78} + \frac{(13 - 12.47)^2}{12.47}$

= .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 + 7.86 + .02 = 64.91

α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6

$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 64.91 > \chi^2_{.01,6} = 16.8119$, the decision is to reject the null hypothesis.

Type of music preferred is not independent of region of the country.

12.15

                        Transportation Mode
     Industry          Air     Train     Truck
     Publishing         32       12        41   |   85
     Comp.Hard.          5        6        24   |   35
                        37       18        65   |  120

Ho: Transportation Mode is independent of Industry.
Ha: Transportation Mode is not independent of Industry.

$e_{11} = \frac{(85)(37)}{120} = 26.21$     $e_{21} = \frac{(35)(37)}{120} = 10.79$

$e_{12} = \frac{(85)(18)}{120} = 12.75$     $e_{22} = \frac{(35)(18)}{120} = 5.25$

$e_{13} = \frac{(85)(65)}{120} = 46.04$     $e_{23} = \frac{(35)(65)}{120} = 18.96$

Observed frequencies with expected frequencies in parentheses:

                        Transportation Mode
     Industry          Air            Train          Truck
     Publishing        32 (26.21)     12 (12.75)     41 (46.04)   |   85
     Comp.Hard.         5 (10.79)      6  (5.25)     24 (18.96)   |   35
                       37             18             65           |  120

$\chi^2 = \frac{(32 - 26.21)^2}{26.21} + \frac{(12 - 12.75)^2}{12.75} + \frac{(41 - 46.04)^2}{46.04} + \frac{(5 - 10.79)^2}{10.79} + \frac{(6 - 5.25)^2}{5.25} + \frac{(24 - 18.96)^2}{18.96}$

= 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43

α = .05, df = (c-1)(r-1) = (3-1)(2-1) = 2

$\chi^2_{.05,2} = 5.9915$

Since the observed $\chi^2 = 6.43 > \chi^2_{.05,2} = 5.9915$, the decision is to reject the null hypothesis.

Transportation mode is not independent of industry.

12.16
                             Number of Bedrooms
     Number of Stories      ≤ 2       3      ≥ 4
            1               116     101       57   |  274
            2                90     325      160   |  575
                            206     426      217   |  849

Ho: Number of Stories is independent of number of bedrooms.
Ha: Number of Stories is not independent of number of bedrooms.

$e_{11} = \frac{(274)(206)}{849} = 66.48$     $e_{21} = \frac{(575)(206)}{849} = 139.52$

$e_{12} = \frac{(274)(426)}{849} = 137.48$     $e_{22} = \frac{(575)(426)}{849} = 288.52$

$e_{13} = \frac{(274)(217)}{849} = 70.03$     $e_{23} = \frac{(575)(217)}{849} = 146.97$

$\chi^2 = \frac{(116 - 66.48)^2}{66.48} + \frac{(101 - 137.48)^2}{137.48} + \frac{(57 - 70.03)^2}{70.03} + \frac{(90 - 139.52)^2}{139.52} + \frac{(325 - 288.52)^2}{288.52} + \frac{(160 - 146.97)^2}{146.97}$

$\chi^2$ = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34

α = .10, df = (c-1)(r-1) = (3-1)(2-1) = 2

$\chi^2_{.10,2} = 4.6052$

Since the observed $\chi^2 = 72.34 > \chi^2_{.10,2} = 4.6052$, the decision is to reject the null hypothesis.

Number of stories is not independent of number of bedrooms.


12.17
                          Mexican Citizens
     Type of Store        Yes      No
     Dept.                 24      17   |   41
     Disc.                 20      15   |   35
     Hard.                 11      19   |   30
     Shoe                  32      28   |   60
                           87      79   |  166

Ho: Citizenship is independent of store type.
Ha: Citizenship is not independent of store type.

$e_{11} = \frac{(41)(87)}{166} = 21.49$     $e_{31} = \frac{(30)(87)}{166} = 15.72$

$e_{12} = \frac{(41)(79)}{166} = 19.51$     $e_{32} = \frac{(30)(79)}{166} = 14.28$

$e_{21} = \frac{(35)(87)}{166} = 18.34$     $e_{41} = \frac{(60)(87)}{166} = 31.45$

$e_{22} = \frac{(35)(79)}{166} = 16.66$     $e_{42} = \frac{(60)(79)}{166} = 28.55$

Observed frequencies with expected frequencies in parentheses:

                          Mexican Citizens
     Type of Store        Yes            No
     Dept.                24 (21.49)     17 (19.51)   |   41
     Disc.                20 (18.34)     15 (16.66)   |   35
     Hard.                11 (15.72)     19 (14.28)   |   30
     Shoe                 32 (31.45)     28 (28.55)   |   60
                          87             79           |  166

$\chi^2 = \frac{(24 - 21.49)^2}{21.49} + \frac{(17 - 19.51)^2}{19.51} + \frac{(20 - 18.34)^2}{18.34} + \frac{(15 - 16.66)^2}{16.66}$
$+ \frac{(11 - 15.72)^2}{15.72} + \frac{(19 - 14.28)^2}{14.28} + \frac{(32 - 31.45)^2}{31.45} + \frac{(28 - 28.55)^2}{28.55}$

= .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93

α = .05, df = (c-1)(r-1) = (2-1)(4-1) = 3

$\chi^2_{.05,3} = 7.8147$

Since the observed $\chi^2 = 3.93 < \chi^2_{.05,3} = 7.8147$, the decision is to fail to reject the null hypothesis.

Citizenship is independent of type of store.

12.18 α = .01, k = 7, df = 6

Ho: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Use: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

critical $\chi^2_{.01,6} = 16.8119$

     fo      fe      (fo - fe)^2     (fo - fe)^2 / fe
     214     206          64              0.311
     235     232           9              0.039
     279     268         121              0.451
     281     284           9              0.032
     264     268          16              0.060
     254     232         484              2.086
     211     206          25              0.121
                                          3.100

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 3.100$

Since the observed value of $\chi^2 = 3.1 < \chi^2_{.01,6} = 16.8119$, the decision is to fail to reject the null hypothesis. The observed distribution is not different from the expected distribution.

12.19
                       Variable 2
     Variable 1        12     23     21  |   56
                        8     17     20  |   45
                        7     11     18  |   36
                       27     51     59  |  137

e11 = 11.04     e12 = 20.85     e13 = 24.12

e21 = 8.87     e22 = 16.75     e23 = 19.38

e31 = 7.09     e32 = 13.40     e33 = 15.50

$\chi^2 = \frac{(12 - 11.04)^2}{11.04} + \frac{(23 - 20.85)^2}{20.85} + \frac{(21 - 24.12)^2}{24.12} + \frac{(8 - 8.87)^2}{8.87} + \frac{(17 - 16.75)^2}{16.75}$
$+ \frac{(20 - 19.38)^2}{19.38} + \frac{(7 - 7.09)^2}{7.09} + \frac{(11 - 13.40)^2}{13.40} + \frac{(18 - 15.50)^2}{15.50}$

= .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652

df = (c-1)(r-1) = (2)(2) = 4, α = .05

$\chi^2_{.05,4} = 9.4877$

Since the observed value of $\chi^2 = 1.652 < \chi^2_{.05,4} = 9.4877$, the decision is to fail to reject the null hypothesis.


12.20
                        Location
     Customer          NE       W       S
     Industrial       230     115      68   |  413
     Retail           185     143      89   |  417
                      415     258     157   |  830

$e_{11} = \frac{(413)(415)}{830} = 206.5$     $e_{21} = \frac{(417)(415)}{830} = 208.5$

$e_{12} = \frac{(413)(258)}{830} = 128.38$     $e_{22} = \frac{(417)(258)}{830} = 129.62$

$e_{13} = \frac{(413)(157)}{830} = 78.12$     $e_{23} = \frac{(417)(157)}{830} = 78.88$

Observed frequencies with expected frequencies in parentheses:

                        Location
     Customer          NE              W                S
     Industrial        230 (206.5)     115 (128.38)     68 (78.12)   |  413
     Retail            185 (208.5)     143 (129.62)     89 (78.88)   |  417
                       415             258             157           |  830

$\chi^2 = \frac{(230 - 206.5)^2}{206.5} + \frac{(115 - 128.38)^2}{128.38} + \frac{(68 - 78.12)^2}{78.12} + \frac{(185 - 208.5)^2}{208.5} + \frac{(143 - 129.62)^2}{129.62} + \frac{(89 - 78.88)^2}{78.88}$

= 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70

α = .10 and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2

$\chi^2_{.10,2} = 4.6052$

Since the observed $\chi^2 = 10.70 > \chi^2_{.10,2} = 4.6052$, the decision is to reject the null hypothesis.

Type of customer is not independent of geographic region.

12.21
     Cookie Type          fo
     Chocolate Chip       189
     Peanut Butter        168
     Cheese Cracker       155
     Lemon Flavored       161
     Chocolate Mint       216
     Vanilla Filled       165
                   Σ fo = 1,054

Ho: Cookie Sales is uniformly distributed across kind of cookie.
Ha: Cookie Sales is not uniformly distributed across kind of cookie.

If cookie sales are uniformly distributed, then $f_e = \frac{\sum f_o}{\text{no. kinds}} = \frac{1,054}{6} = 175.67$

     fo      fe         (fo - fe)^2 / fe
     189     175.67          1.01
     168     175.67          0.33
     155     175.67          2.43
     161     175.67          1.23
     216     175.67          9.26
     165     175.67          0.65
                            14.91

The observed $\chi^2 = 14.91$

α = .05, df = k - 1 = 6 - 1 = 5

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 14.91 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the null hypothesis.

Cookie Sales is not uniformly distributed by kind of cookie.


12.22
                       Gender
     Bought Car        M         F
     Y                 207        65   |    272
     N                 811       984   |  1,795
                     1,018     1,049   |  2,067

Ho: Purchasing a car or not is independent of gender.
Ha: Purchasing a car or not is not independent of gender.

$e_{11} = \frac{(272)(1,018)}{2,067} = 133.96$     $e_{12} = \frac{(272)(1,049)}{2,067} = 138.04$

$e_{21} = \frac{(1,795)(1,018)}{2,067} = 884.04$     $e_{22} = \frac{(1,795)(1,049)}{2,067} = 910.96$

Observed frequencies with expected frequencies in parentheses:

                       Gender
     Bought Car        M                F
     Y                 207 (133.96)      65 (138.04)   |    272
     N                 811 (884.04)     984 (910.96)   |  1,795
                     1,018            1,049            |  2,067

$\chi^2 = \frac{(207 - 133.96)^2}{133.96} + \frac{(65 - 138.04)^2}{138.04} + \frac{(811 - 884.04)^2}{884.04} + \frac{(984 - 910.96)^2}{910.96}$

= 39.82 + 38.65 + 6.03 + 5.86 = 90.36

α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1

$\chi^2_{.05,1} = 3.8415$

Since the observed $\chi^2 = 90.36 > \chi^2_{.05,1} = 3.8415$, the decision is to reject the null hypothesis.

Purchasing a car is not independent of gender.


12.23
     Arrivals     fo     (fo)(Arrivals)
        0         26           0
        1         40          40
        2         57         114
        3         32          96
        4         17          68
        5         12          60
        6          8          48
     Σ fo = 192          Σ (fo)(arrivals) = 426

$\lambda = \frac{\sum (f_o)(\text{arrivals})}{\sum f_o} = \frac{426}{192} = 2.2$

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

     Arrivals     Probability     fe
        0           .1108         (.1108)(192) = 21.27
        1           .2438         (.2438)(192) = 46.81
        2           .2681                        51.48
        3           .1966                        37.75
        4           .1082                        20.77
        5           .0476                         9.14
        6           .0249                         4.78

     fo      fe       (fo - fe)^2 / fe
     26     21.27          1.05
     40     46.81          0.99
     57     51.48          0.59
     32     37.75          0.88
     17     20.77          0.68
     12      9.14          0.89
      8      4.78          2.17
                           7.25

Observed $\chi^2 = 7.25$

α = .05, df = k - 2 = 7 - 2 = 5

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 7.25 < \chi^2_{.05,5} = 11.0705$, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequency of arrivals is Poisson distributed.


12.24 Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies.
      Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies.

     Soft Drink       fo      proportions     fe                         (fo - fe)^2 / fe
     Classic Coke     314        .179         (.179)(1726) = 308.95           0.08
     Pepsi            219        .115         (.115)(1726) = 198.49           2.12
     Diet Coke        212        .097                        167.42          11.87
     Mt. Dew          121        .063                        108.74           1.38
     Diet Pepsi        98        .061                        105.29           0.50
     Sprite            93        .057                         98.32           0.29
     Dr. Pepper        88        .056                         96.66           0.78
     Others           581        .372                        642.07           5.81
               Σ fo = 1,726                                                  22.83

Observed $\chi^2 = 22.83$

α = .05, df = k - 1 = 8 - 1 = 7

$\chi^2_{.05,7} = 14.0671$

Since the observed $\chi^2 = 22.83 > \chi^2_{.05,7} = 14.0671$, the decision is to reject the null hypothesis.

The observed frequencies are not distributed the same as the expected frequencies from the national poll.


12.25

Position

Manager Programmer Operator

Systems Analyst

Years 0-3 6 37 11 13 67 4-8 28 16 23 24 91 > 8 47 10 12 19 88

81 63 46 56 246

$e_{11} = \frac{(67)(81)}{246} = 22.06$
$e_{12} = \frac{(67)(63)}{246} = 17.16$
$e_{13} = \frac{(67)(46)}{246} = 12.53$
$e_{14} = \frac{(67)(56)}{246} = 15.25$
$e_{21} = \frac{(91)(81)}{246} = 29.96$
$e_{22} = \frac{(91)(63)}{246} = 23.30$
$e_{23} = \frac{(91)(46)}{246} = 17.02$
$e_{24} = \frac{(91)(56)}{246} = 20.72$
$e_{31} = \frac{(88)(81)}{246} = 28.98$
$e_{32} = \frac{(88)(63)}{246} = 22.54$
$e_{33} = \frac{(88)(46)}{246} = 16.46$
$e_{34} = \frac{(88)(56)}{246} = 20.03$

Observed frequencies, with expected frequencies in parentheses:

                                      Position
Years     Manager        Programmer     Operator       Systems Analyst   Total
0-3        6 (22.06)     37 (17.16)     11 (12.53)     13 (15.25)          67
4-8       28 (29.96)     16 (23.30)     23 (17.02)     24 (20.72)          91
> 8       47 (28.98)     10 (22.54)     12 (16.46)     19 (20.03)          88
Total     81             63             46             56                 246


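$$
\begin{aligned}
\chi^2 &= \frac{(6 - 22.06)^2}{22.06} + \frac{(37 - 17.16)^2}{17.16} + \frac{(11 - 12.53)^2}{12.53} + \frac{(13 - 15.25)^2}{15.25} \\
&\quad + \frac{(28 - 29.96)^2}{29.96} + \frac{(16 - 23.30)^2}{23.30} + \frac{(23 - 17.02)^2}{17.02} + \frac{(24 - 20.72)^2}{20.72} \\
&\quad + \frac{(47 - 28.98)^2}{28.98} + \frac{(10 - 22.54)^2}{22.54} + \frac{(12 - 16.46)^2}{16.46} + \frac{(19 - 20.03)^2}{20.03} \\
&= 11.69 + 22.94 + 0.19 + 0.33 + 0.13 + 2.29 + 2.10 + 0.52 + 11.20 + 6.98 + 1.21 + 0.05 \\
&= 59.63
\end{aligned}
$$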

$\alpha = .01$, $df = (c - 1)(r - 1) = (4 - 1)(3 - 1) = 6$

$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 59.63 > \chi^2_{.01,6} = 16.8119$, the decision is to reject the null hypothesis. Position is not independent of number of years of experience.
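The expected frequencies and the test statistic for this contingency table can be reproduced programmatically. The sketch below is plain Python and is offered only as a check on the hand calculation; the function names `expected_frequencies` and `chi_square_independence` are hypothetical, and the critical value is the table value quoted above. Unrounded expected frequencies are used, so the result may differ slightly from 59.63.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: years 0-3, 4-8, > 8
# Columns: Manager, Programmer, Operator, Systems Analyst
table = [
    [6, 37, 11, 13],
    [28, 16, 23, 24],
    [47, 10, 12, 19],
]

chi2, df, expected = chi_square_independence(table)
critical = 16.8119   # chi-square(.01, 6), taken from the text, not computed

print(f"e11 = {expected[0][0]:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, reject Ho: {chi2 > critical}")
```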

12.26 Ho: p = .43     n = 315     α = .05
      Ha: p ≠ .43     x = 120     α/2 = .025

Category                    f_o    f_e                    (f_o - f_e)^2 / f_e
More Work, More Business    120    (.43)(315) = 135.45    1.76
Others                      195    (.57)(315) = 179.55    1.33
Total                       315    315.00                 3.09

The observed value of $\chi^2$ is 3.09. With $\alpha = .05$ and $\alpha/2 = .025$, $df = k - 1 = 2 - 1 = 1$.

$\chi^2_{.025,1} = 5.0239$

Since $\chi^2 = 3.09 < \chi^2_{.025,1} = 5.0239$, the decision is to fail to reject the null hypothesis.
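With only two categories, the chi-square goodness-of-fit statistic equals the square of the one-sample z statistic for a proportion. The sketch below, in plain Python, illustrates that relationship with the numbers from this problem; it is not part of the original solution, and the variable names are chosen for illustration only.

```python
import math

p0, n, x = 0.43, 315, 120   # hypothesized proportion, sample size, observed count

# Chi-square goodness-of-fit with two categories
observed = [x, n - x]
expected = [p0 * n, (1 - p0) * n]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# One-sample z statistic for the same hypothesis
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(f"chi-square = {chi2:.4f}, z = {z:.4f}, z squared = {z ** 2:.4f}")
```

The two-tailed z critical values at α = .05, ±1.96, square to about 3.84, which corresponds to $\chi^2_{.05,1}$. The solution above compares against the larger value $\chi^2_{.025,1} = 5.0239$. For these data the decision, fail to reject, is the same under either critical value.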


12.27
                        Type of College or University
Number of    Community    Large         Small
Children     College      University    College      Total
0              25          178           31           234
1              49          141           12           202
2              31           54            8            93
>3             22           14            6            42
Total         127          387           57           571

Ho: Number of Children is independent of Type of College or University.
Ha: Number of Children is not independent of Type of College or University.

$e_{11} = \frac{(234)(127)}{571} = 52.05$
$e_{12} = \frac{(234)(387)}{571} = 158.60$
$e_{13} = \frac{(234)(57)}{571} = 23.36$
$e_{21} = \frac{(202)(127)}{571} = 44.93$
$e_{22} = \frac{(202)(387)}{571} = 136.91$
$e_{23} = \frac{(202)(57)}{571} = 20.16$
$e_{31} = \frac{(93)(127)}{571} = 20.68$
$e_{32} = \frac{(93)(387)}{571} = 63.03$
$e_{33} = \frac{(93)(57)}{571} = 9.28$
$e_{41} = \frac{(42)(127)}{571} = 9.34$
$e_{42} = \frac{(42)(387)}{571} = 28.47$
$e_{43} = \frac{(42)(57)}{571} = 4.19$

Observed frequencies, with expected frequencies in parentheses:

                        Type of College or University
Number of    Community       Large            Small
Children     College         University       College        Total
0             25 (52.05)     178 (158.60)     31 (23.36)      234
1             49 (44.93)     141 (136.91)     12 (20.16)      202
2             31 (20.68)      54 (63.03)       8 (9.28)        93
>3            22 (9.34)       14 (28.47)       6 (4.19)        42
Total        127             387              57              571


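$$
\begin{aligned}
\chi^2 &= \frac{(25 - 52.05)^2}{52.05} + \frac{(178 - 158.60)^2}{158.60} + \frac{(31 - 23.36)^2}{23.36} \\
&\quad + \frac{(49 - 44.93)^2}{44.93} + \frac{(141 - 136.91)^2}{136.91} + \frac{(12 - 20.16)^2}{20.16} \\
&\quad + \frac{(31 - 20.68)^2}{20.68} + \frac{(54 - 63.03)^2}{63.03} + \frac{(8 - 9.28)^2}{9.28} \\
&\quad + \frac{(22 - 9.34)^2}{9.34} + \frac{(14 - 28.47)^2}{28.47} + \frac{(6 - 4.19)^2}{4.19} \\
&= 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 + 17.16 + 7.35 + 0.78 \\
&= 54.63
\end{aligned}
$$

$\alpha = .05$, $df = (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6$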

$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 54.63 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the null hypothesis. Number of children is not independent of type of College or University.
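A common rule of thumb for chi-square tests is that every expected frequency should be at least 5. The sketch below, in plain Python, recomputes the statistic for this table and lists any cells that fall under that threshold. It is an illustration rather than part of the original solution: the threshold of 5 and the variable names are assumptions.

```python
# Rows: 0, 1, 2, >3 children
# Columns: Community College, Large University, Small College
observed = [
    [25, 178, 31],
    [49, 141, 12],
    [31, 54, 8],
    [22, 14, 6],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

MIN_EXPECTED = 5   # rule-of-thumb threshold, an assumption for this sketch
small_cells = []   # (row number, column number, expected frequency)
chi2 = 0.0

for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi2 += (fo - fe) ** 2 / fe
        if fe < MIN_EXPECTED:
            small_cells.append((i + 1, j + 1, round(fe, 2)))

print(f"chi-square = {chi2:.2f}")
print("cells with expected frequency below 5:", small_cells)
```

Only the cell in row 4, column 3 (expected frequency 4.19) is flagged. When several cells are flagged, one common remedy is to combine adjacent categories before running the test.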

12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square goodness-of-fit test indicates that there is a significant difference between the observed frequencies and the expected frequencies. The distribution of responses to the question is not the same for adults between 21 and 30 years of age as it is for others. Marketing and sales people might reorient their efforts aimed at 21- to 30-year-olds away from home improvement and pay more attention to leisure travel/vacation, clothing, and home entertainment.

12.29 The observed chi-square value for this test of independence is 5.366. The associated p-value of .252 indicates failure to reject the null hypothesis. There is not enough evidence here to say that color choice is dependent upon gender. Automobile marketing people do not have to worry about which colors especially appeal to men or to women, because the data give no evidence that car color depends on gender. In addition, design and production people can determine car color quotas based on other variables.
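The link between a chi-square value and its p-value can be illustrated with a closed-form expression that holds when the degrees of freedom are even. The sketch below is plain Python and is not part of the original solution. The function name `chi2_sf_even_df` is hypothetical, and both df = 4 and α = .05 are assumptions: neither is stated in the solution above, and df = 4 is chosen only because it reproduces the reported p-value.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square > x) for a positive, even df.

    Uses exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form covers only positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


chi2_observed = 5.366
df_assumed = 4    # assumption: not given in the text
alpha = 0.05      # assumption: not given in the text

p_value = chi2_sf_even_df(chi2_observed, df_assumed)
decision = "reject" if p_value < alpha else "fail to reject"
print(f"p-value = {p_value:.3f}, decision: {decision} Ho")
```

Under these assumptions the computed p-value agrees with the reported .252 to three decimal places, and because it exceeds α the decision is to fail to reject the null hypothesis.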