chapter 11: chi-square tests. 2 the chi-square distribution definition the chi-square distribution...
Post on 21-Dec-2015
241 views
TRANSCRIPT
CHAPTER 11:
CHI-SQUARE TESTS
2
THE CHI-SQUARE DISTRIBUTION
Definition The chi-square distribution has only one
parameter called the degrees of freedom. The shape of a chi-squared distribution curve is skewed to the right for small df and becomes symmetric for large df. The entire chi-square distribution curve lies to the right of the vertical axis. The chi-square distribution assumes nonnegative values only, and these are denoted by the symbol χ2 (read as “chi-square”).
3
Figure 11.1 Three chi-square distribution curves.
4
Example 11-1
Find the value of χ² for 7 degrees of freedom and an area of .10 in the right tail of the chi-square distribution curve.
5
Table 11.1 χ2 for df = 7 and .10 Area in the
Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df .995 … .100 … .005
12.7.
100
.000
.010…
.989…
67.328
………………
2.7064.605
…12.017
…118.498
………………
7.87910.597
…20.278
…140.169
Required value of χ²
6
Figure 11.2
df = 7
.10
12.017 χ² 0
7
Example 11-2
Find the value of χ² for 12 degrees of freedom and area of .05 in the left tail of the chi-square distribution curve.
8
Solution 11-2
Area in the right tail = 1 – Area in the left tail = 1 – .05 = .95
9
Table 11.2 χ2 for df = 12 and .95 Area in the Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df .995 … .950 … .005
12.
12.
100
.000
.010…
3.074…
67.328
………………
.004
.103…
5.226…
77.929
………………
7.87910.597
…28.300
…140.169
Required value of χ²
10
Figure 11.3
χ²
Shaded area = .95
df = 12
.05
5.226 0
11
A GOODNESS-OF-FIT TEST
Definition An experiment with the following
characteristics is called a multinomial experiment.
12
Multinomial Experiment cont.
1. It consists of n identical trials (repetitions).
2. Each trial results in one of k possible outcomes (or categories), where k > 2.
3. The trials are independent.4. The probabilities of the various
outcomes remain constant for each trial.
13
A GOODNESS-OF-FIT TEST cont.
Definition The frequencies obtained from the
performance of an experiment are called the observed frequencies and are denoted by O. The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as
E = np Where n is the sample size and p is the
probability that an element belongs to that category if the null hypothesis is true.
14
A GOODNESS-OF-FIT TEST cont.
Degrees of Freedom for a Goodness-of-Fit Test
In a goodness-of-fit test, the degrees of freedom are
df = k – 1 where k denotes the number of
possible outcomes (or categories) for the experiment.
15
Test Statistic for a Goodness-of-Fit Test
The test statistic for a goodness-of-fit test is χ2 and its value is calculated as
where O = observed frequency for a category E = expected frequency for a category = np
Remember that a chi-square goodness-of-fit test is always right-tailed.
E
EO 22 )(
16
Example 11-3 A bank has an ATM installed inside the bank, and
it is available to its customers only from 7 AM to 6 PM Monday through Friday. The manager of the bank wanted to investigate if the percentage of transactions made on this ATM is the same for each of the five days (Monday through Friday) of the week. She randomly selected one week and counted the number of transactions made on this ATM on each of the five days during this week. The information she obtained is given in the following table, where the number of users represents the number of transactions on this ATM on these days. For convenience, we will refer to these transactions as “people” or “users.”
17
Example 11-3
At the 1% level of significance, can we reject the null hypothesis that the proportion of people who use this ATM each of the five days of the week is the same? Assume that this week is typical of all weeks in regard to the use of this ATM.
Day Monday
Tuesday Wednesday
Thursday
Friday
Number of users
253 197 204 179 267
18
Solution 11-3
H0 : p1 = p2 = p3 = p4 = p5 = .20 H1 : At least two of the five
proportions are not equal to .20
19
Solution 11.3
There are five categories Five days on which the ATM is used Multinomial experiment
We use the chi-square distribution to make this test.
20
Solution 11-3
Area in the right tail = α = .01 k = number of categories = 5 df = k – 1 = 5 – 1 = 4 The critical value of χ2 = 13.277
21
Figure 11.4
Reject H0 Do not reject H0
α = .01
13.277
χ2
Critical value of χ2
22
Table 11.3
Category (Day)
Observed Frequenc
y
O p
Expected Frequency
E = np(O – E) (O – E)2
MondayTuesdayWednesdayThursdayFriday
253197204279267
.20
.20
.20
.20
.20
1200(.20) = 240
1200(.20) = 240
1200(.20) = 240
1200(.20) = 240
1200(.20) = 240
13-43-363927
169184912961521729
.7047.7045.4006.3383.038
n = 1200 Sum = 23.184
E
EO 2)(
23
Solution 11-3
All the required calculations to find the value of the test statistic χ2 are shown in Table 11.3.
184.23)( 2
2
E
EO
24
Solution 11.3
The value of the test statistic χ2 = 23.184 is larger than the critical value of χ2 = 13.277 It falls in the rejection region
Hence, we reject the null hypothesis
25
Example 11-4 In a National Public Transportation survey
conducted in 1995 on the modes of transportation used to commute to work, 79.6% of the respondents said that they drive alone, 11.1% car pool, 5.1% use public transit, and 4.2% depend on other modes of transportation (USA TODAY, April 14, 1999). Assume that these percentages hold true for the 1995 population of all commuting workers. Recently 1000 randomly selected workers were asked what mode of transportation they use to commute to work. The following table lists the results of this survey.
26
Example 11-4
Mode of transportation
Drive alone Carpool Public transit Other
Number of workers 812 102 57 29
Test at the 2.5% significance level whether the current pattern of use of transportation modes is different from that for 1995.
27
Solution 11-4
H0: The current percentage distribution of the use of transportation modes is the same as that for 1995.
H1: The current percentage distribution of the use of transportation modes is different from that for 1995.
28
Solution 11-4 There are four categories
Drive alone, carpool, public transit, and other
Multinomial experiment We use the chi-square distribution to
make the test.
29
Solution 11-4
Area in the right tail = α = .025 k = number of categories = 4 df = k – 1 = 4 – 1 = 3 The critical value of χ2 = 9.348
30
Figure 11.5
Reject H0 Do not reject H0
α = .025
9.348 χ2
Critical value of χ2
31
Table 11.4
Category
Observed Frequenc
y
O p
Expected Frequency
E = np(O – E) (O – E)2
Drive aloneCar poolPublic transitOther
8121025729
.796
.111
.051
.042
1000(.796) = 796
1000(.111) = 111
1000(.051) = 51
1000(.042) = 42
16-96
-13
2568136169
.322
.730
.7064.024
n = 1000 Sum = 5.782
E
EO 2)(
32
Solution 11-4
All the required calculations to find the value of the test statistic χ2 are shown in Table 11.4.
782.5)( 2
2
E
EO
33
Solution 11-4
The value of the test statistic χ2 = 5.782 is less than the critical value of χ2 = 9.348 It falls in the nonrejection region
Hence, we fail to reject the null hypothesis.
34
CONTINGENCY TABLES
Full-Time Part-Time
Male 6768 2615
Female 7658 3717
Table 11.5 Total 2002 Enrollment at a University
Students who are male and enrolled part-time
35
A TEST OF INDEPENDENCE OR HOMOGENEITY
A Test of Independence A Test of Homogeneity
36
A Test of Independence Definition A test of independence involves a test of
the null hypothesis that two attributes of a population are not related. The degrees of freedom for a test of independence are
df = (R – 1)(C – 1) Where R and C are the number of rows
and the number of columns, respectively, in the given contingency table.
37
Test Statistic for a Test of Independence The value of the test statistic χ2 for a
test of independence is calculated as
where O and E are the observed and expected frequencies, respectively, for a cell.
A Test of Independence cont.
E
EO 22 )(
38
Example 11-5 Violence and lack of discipline have
become major problems in schools in the United States. A random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses of these adults is represented in the following table.
39
Calculate the expected frequencies for this table assuming that the two attributes, gender and opinions on the issue, are independent.
Example 11-5
In Favor
(F)
Against(A)
No Opinions(N)
Men (M)Women (W)
9387
7032
126
40
Table 11.6
Solution 11-5
In Favor(F)
Against(A)
No Opinion(N)
Row Total
s
Men (M) 93 70 12 175
Women (W) 87 32 6 125
Column Totals
180 102 18 300
41
Expected Frequencies for a Test of Independence
The expected frequency E for a cell is calculated as
size sample
)lumn total total)(CoRow(E
42
Table 11.7
Solution 11-5
In Favor(F)
Against(A)
No Opinion(O)
Row Total
s
Men (M)93
(105.00)
70(59.50)
12(10.50)
175
Women (W)
87(75.00)
32(42.50)
6(7.50)
125
Column Totals
180 102 18 300
43
Example 11-6 Reconsider the two-way classification table
given in Example 11-5. In that example, a random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. Based on the results of the survey, a two-way classification table was prepared and presented in Example 11-5. Does the sample provide sufficient information to conclude that the two attributes, gender and opinions of adults, are dependent? Use a 1% significance level.
44
Solution 11-6
H0: Gender and opinions of adults are independent
H1: Gender and opinions of adults are dependent
45
Solution 11-6
α = .01 df = (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2 The critical value of χ2 = 9.210
46
Figure 11.6
Reject H0 Do not reject H0
α = .01
9.210 χ2
Critical value of χ2
47
Table 11.8
In Favor(F)
Against
(A)
No Opinion(N)
Row Totals
Men (M)
93(105.00)
70(59.50)
12(10.50)
175
Women (W)
87(75.00)
32(42.50)
6(7.50)
125
Column
Totals180 102 18 300
48
Solution 11-6
252.8300.594.2920.1214.853.1371.1 50.7
50.76
50.42
50.4232
00.75
00.7587
50.10
50.1012
50.59
50.5970
00.105
00.10593
)(
222
222
22
E
EO
49
Solution 11-6
The value of the test statistic χ2 = 8.252 It is less than the critical value of χ2
It falls in the nonrejection region Hence, we fail to reject the null
hypothesis
50
Example 11-7
A researcher wanted to study the relationship between gender and owning cell phones. She took a sample of 2000 adults and obtained the information given in the following table.
51
Example 11-7
At the 5% level of significance, can you conclude that gender and owning cell phones are related for all adults?
Own Cell Phones
Do Not Own Cell Phones
Men Women
640440
450470
52
Solution 11-7
H0: Gender and owning a cell phone are not related
H1: Gender and owning a cell phone are related
53
Solution 11-7
We are performing a test of independence
We use the chi-square distribution α = .05. df = (R – 1)(C – 1) = (2 – 1)(2 – 1) = 1 The critical value of χ2 = 3.841
54
Figure 11.7
Reject H0 Do not reject H0
α = .05
3.841 χ2
Critical value of χ2
55
Table 11.9
Own Cell Phones (Y)
Do Not Own Cell Phones
(N)
Row Total
s
Men (M)
640(588.60)
450(501.40)
1090
Women (W)
440(491.40)
470(418.60)
910
Column Totals
1080 920 2000
56
Solution 11-7
445.21311.6376.5269.5489.4 60.481
60.418470
40.491
40.491440
40.501
40.501450
60.588
60.588640
)(
22
22
22
E
EO
57
Solution 11-7
The value of the test statistic χ2 = 21.445 It is larger than the critical value of χ2 It falls in the rejection region
Hence, we reject the null hypothesis
58
A Test of Homogeneity
Definition A test of homogeneity involves
testing the null hypothesis that the proportions of elements with certain characteristics in two or more different populations are the same against the alternative hypothesis that these proportions are not the same.
59
Example 11-8
Consider the data on income distributions for households in California and Wisconsin given in following table:Californi
aWisconsi
nRow
Totals
High Income 70 34 104
Medium Income
80 40 120
Low Income 100 76 176
Column Totals 250 150 400
60
Example 11-8
Using the 2.5% significance level, test the null hypothesis that the distribution of households with regard to income levels is similar (homogeneous) for the two states.
61
Solution 11-8
H0: The proportions of households that belong to different income
groups are the same in both states
H1: The proportions of households that belong to different income
groups are not the same in both states
62
Solution 11-8
α = .025 df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2 The critical value of χ2 = 7.378
63
Figure 11.7
Reject H0 Do not reject H0
α = .025
7.378 χ2
Critical value of χ2
64
Table 11.11
California Wisconsin Row Totals
High income 70
(65)34
(39)104
Medium income80
(75)40
(45)120
Low income100
(110)76
(66)176
Column Totals 250 150 400
65
Solution 11-8
339.4515.1909.566.333.641.385. 66
6676
110
110100
45
4540
75
7580
39
3934
65
6570
)(
222
222
22
E
EO
66
Solution 11-8
The value of the test statistic χ2 = 4.339 It is less than the critical value of χ2
It falls in the nonrejection region Hence, we fail to reject the null
hypothesis
67
INFERENCES ABOUT THE POPULATION VARIANCE
Estimation of the Population Variance Hypothesis Tests About the
Population Variance
68
INFERENCES ABOUT THE POPULATION VARIANCE cont.
Sampling Distribution of (n – 1)s2 / σ2
If the population from which the sample is selected is (approximately) normally distributed, then
has a chi-square distribution with n – 1 degrees of freedom.
2
2)1(
sn
69
Estimation of the Population Variance
Assuming that the population from which the sample is selected is (approximately) normally distributed, the (1 – α)100% confidence interval for the population variance σ2 is
2
2
22/
2
2/1
)1( to
)1(
snsn
70
Example 11-9 One type of cookie manufactured by
Haddad Food Company is Cocoa Cookies. The machine that fills packages of these cookies is set up in such a way that the average net weight of these packages is 32 ounces with a variance of .015 square ounce.
71
Example 11-9 From time to time the quality control
inspector at the company selects a sample of a few such packages, calculates the variance of the net weights of these packages, and construct a 95% confidence interval for the population variance. If either both or one of the two limits of this confidence interval is not the interval .008 to .030, the machine is stopped and adjusted.
72
Example 11-9 A recently taken random sample of 25
packages from the production line gave a sample variance of .029 square ounce. Based on this sample information, do you think the machine needs an adjustment? Assume that the net weights of cookies in all packages are normally distributed.
73
Solution 11-9 n = 25 s2 = .029 α = 1 - .95 = .05 α / 2 = .05 / 2 = .025 1 – α / 2 = 1 – .025 = .975 df = n – 1 = 25 – 1 = 24 χ2 for 24 df and .025 area in the right tail =
39.364 χ2 for 24 df and .975 area in the right tail =
12.401
74
Figure 11.9
df = 24
2
= .025
39.364
Value of 2
2/χ2
75
Figure 11.9
df = 24
21
= .025
12.401
Value of 221
χ2
76
Solution 11-9
.0561 to.0177 12.401
)029)(.125( to
364.39
)029)(.125(
)1( to
)1(
2
2
22/
2
2/1
snsn
77
Solution 11-9
Thus, with 95% confidence, we can state that the variance for all packages of Cocoa Cookies lies between .0177 and .0561 square ounce.
78
Hypothesis Tests About the Population Variance
The value of the test statistic χ2 is calculated as
where s2 is the sample variance, σ2 is the hypothesized value of the population variance, and n – 1 represents the degrees of freedom. The population from which the sample is selected is assumed to be (approximately) normally distributed.
2
22 )1(
sn
79
Example 11-10 One type of cookie manufactured by Haddad
Food Company is Cocoa Cookies. The machine that fills packages of these cookies is set up in such a way that the average net weight of these packages is 32 ounces with a variance of .015 square ounce. From time to time the quality control inspector at the company selects a sample of a few such packages, calculates the variance of the net weights of these packages, and makes a test of hypothesis about the population variance.
80
Example 11-10
She always uses α = .01. The acceptable value of the population variance is .015 square ounce or less. If the conclusion from the test of hypothesis is that the population variance is not within the acceptable limit, the machine is stopped and adjusted.
81
Example 11-10 A recently taken random sample of
25 packages from the production line gave a sample variance of .029 square ounce. Based on this sample information, do you think the machine needs an adjustment? Assume that the net weights of cookies in all packages are normally distributed.
82
Solution 11-10
H0 :σ2 ≤ .015 The population variance is within the
acceptable limit H1: σ2 >.015
The population variance exceeds the acceptable limit
83
Solution 11-10
α = .01 df = n – 1 = 25 – 1 = 24 The critical value of χ2 = 42.980
84
Figure 11.10
Reject H0 Do not reject H0
α = .01
42.980
χ2
Critical value of χ2
85
Solution 11-10
400.46015.
)029)(.125()1(2
22
sn
From H0
86
Solution 11-10 The value of the test statistic χ2 = 46.400
It is greater than the critical value of χ 2
It falls in the rejection region Hence, we reject the null hypothesis H0
We conclude that the population variance is not within the acceptable limit The machine should be stopped and adjusted
87
Example 11-11 The variance of scores on a standardized
mathematics test for all high school seniors was 150 in 2002. A sample of scores for 20 high school seniors who took this test this year gave a variance of 170. Test at the 5% significance level if the variance of current scores of all high school seniors on this test is different from 150. Assume that the scores of all high school seniors on this test are (approximately) normally distributed.
88
Solution 11-11
H0: σ2 = 150 The population variance is not different
from 150 H1: σ2 ≠ 150
The population variance is different from 150
89
Solution 11-11
α = .05 Area in the each tail = .025 df = n – 1 = 20 – 1 = 19 The critical values of χ2 32.852 and
8.907
90
Figure 11.11
Reject H0Do not reject H0
Reje
ct H
0
α /2 = .025α /2 = .025
8.907 32.852
Two critical values of χ2
91
Solution 11-11
533.21150
)170)(120()1(2
22
sn
From H0
92
Solution 11-11
The value of the test statistic χ2 = 21.533 It is between the two critical values of χ2
It falls in the nonrejection region Consequently, we fail to reject H0.