Chi-squared Tests

Upload: sulwyn

Post on 05-Jan-2016

DESCRIPTION

Chi-squared Tests. We want to test the “goodness of fit” of a particular theoretical distribution to an observed distribution. The procedure is: 1. Set up the null and alternative hypotheses and select the significance level. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chi-squared Tests

Chi-squared Tests

Page 2: Chi-squared Tests

We want to test the “goodness of fit” of a particular theoretical distribution to an observed distribution. The procedure is:

1. Set up the null and alternative hypotheses and select the significance level.

2. Draw a random sample of observations from a population or process.

3. Derive expected frequencies under the assumption that the null hypothesis is true.

4. Compare the observed frequencies and the expected frequencies.

5. If the discrepancy between the observed and expected frequencies is too great to attribute to chance fluctuations at the selected significance level, reject the null hypothesis.
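The core of steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration with made-up frequencies for four hypothetical categories; none of these numbers come from the slides.

```python
def chi_squared_stat(observed, expected):
    """Core quantity of the test: sum of (fo - ft)^2 / ft over all categories."""
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))

# Hypothetical data: 100 observations over 4 categories; H0 says all equal.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

stat = chi_squared_stat(observed, expected)
print(stat)  # 2.0
```

The statistic is then compared with the chi-squared cut-off for the chosen significance level, as the following slides do for concrete examples.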

Page 3: Chi-squared Tests

Example 1: Five brands of coffee are taste-tested by 1000 people with the results below. Test at the 5% level the hypothesis that, in the general population, there is no difference in the proportions preferring each brand (i.e.: H0: pA= pB= pC= pD= pE versus H1: not all the proportions are the same).

Brand   Observed fo   Theoretical ft   fo-ft   (fo-ft)²   (fo-ft)²/ft
A           210
B           312
C           170
D            85
E           223
Total      1000

Page 4: Chi-squared Tests

If all the proportions were the same, we’d expect about 200 people in each group, if we have a total of 1000 people.

Brand   Observed fo   Theoretical ft   fo-ft   (fo-ft)²   (fo-ft)²/ft
A           210            200
B           312            200
C           170            200
D            85            200
E           223            200
Total      1000           1000

Page 5: Chi-squared Tests

We next compute the differences in the observed and theoretical frequencies.

Brand   Observed fo   Theoretical ft   fo-ft   (fo-ft)²   (fo-ft)²/ft
A           210            200           10
B           312            200          112
C           170            200          -30
D            85            200         -115
E           223            200           23
Total      1000           1000

Page 6: Chi-squared Tests

Then we square each of those differences.

Brand   Observed fo   Theoretical ft   fo-ft   (fo-ft)²   (fo-ft)²/ft
A           210            200           10        100
B           312            200          112      12544
C           170            200          -30        900
D            85            200         -115      13225
E           223            200           23        529
Total      1000           1000

Page 7: Chi-squared Tests

Then we divide each of the squares by the expected frequency and add the quotients. The resulting statistic has a chi-squared (χ²) distribution.

Brand   Observed fo   Theoretical ft   fo-ft   (fo-ft)²   (fo-ft)²/ft
A           210            200           10        100        0.500
B           312            200          112      12544       62.720
C           170            200          -30        900        4.500
D            85            200         -115      13225       66.125
E           223            200           23        529        2.645
Total      1000           1000                              136.49

Page 8: Chi-squared Tests

The chi-squared (χ²) distribution

[Figure: the density f(χ²).]

The chi-squared distribution is skewed to the right. (i.e.: It has the bump on the left and the tail on the right.)

Page 9: Chi-squared Tests

In these goodness of fit problems, the number of degrees of freedom is:

dof = (# of categories or classes) – (# of restrictions) – (# of parameters estimated)

In the current problem, we have 5 categories (the 5 brands).

We have 1 restriction. When we determined our expected frequencies, we restricted our numbers so that the total would be the same total as for the observed frequencies (1000).

We didn’t estimate any parameters in this particular problem.

So dof = 5 – 1 – 0 = 4 .

Page 10: Chi-squared Tests

Large values of the χ² statistic indicate big discrepancies between the observed and theoretical frequencies.

So when the χ² statistic is large, we reject the hypothesis that the theoretical distribution is a good fit.

That means the critical region consists of the large values, the right tail.

[Figure: the density f(χ²), with the acceptance region on the left and the critical region in the right tail.]

Page 11: Chi-squared Tests

From the χ² table, we see that for a 5% test with 4 degrees of freedom, the cut-off point is 9.488.

In the current problem, our χ² statistic had a value of 136.49.

So we reject the null hypothesis and conclude that the proportions preferring each brand were not the same.

[Figure: the χ² density with 4 degrees of freedom; acceptance region below the cut-off 9.488, critical region of area 0.05 above it, and the statistic 136.49 far inside the critical region.]
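Example 1's computation and decision can be checked in code. This sketch assumes SciPy is available for the χ² quantile (the 9.488 cut-off the slides read from a table).

```python
from scipy.stats import chi2

observed = [210, 312, 170, 85, 223]   # brands A-E
expected = [200] * 5                  # equal preference under H0

stat = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
cutoff = chi2.ppf(0.95, df=4)         # right-tail 5% test, 4 degrees of freedom

print(round(stat, 2))    # 136.49
print(round(cutoff, 3))  # 9.488
print(stat > cutoff)     # True -> reject H0
```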

Page 12: Chi-squared Tests

Example 2: A diagnostic test of mathematics is given to a group of 1000 students. The administrator analyzing the results wants to know if the scores of this group differ significantly from those of the past. Test at the 10% level.

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                                 50
80-89           0.20                                100
70-79           0.40                                500
60-69           0.20                                200
<60             0.10                                150
Total                                              1000

Page 13: Chi-squared Tests

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                                 50
80-89           0.20                                100
70-79           0.40                                500
60-69           0.20                                200
<60             0.10                                150
Total                                              1000

Hypotheses:
H0: the current frequency distribution is the same as the historical one.
H1: the current frequency distribution is NOT the same as the historical one.

Page 14: Chi-squared Tests

Based on the historical relative frequency, we determine the expected absolute frequency, restricting the total to the total for the current observed frequency.

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                 100             50
80-89           0.20                 200            100
70-79           0.40                 400            500
60-69           0.20                 200            200
<60             0.10                 100            150
Total                               1000           1000

Page 15: Chi-squared Tests

We subtract the theoretical frequency from the observed frequency.

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                 100             50       -50
80-89           0.20                 200            100      -100
70-79           0.40                 400            500       100
60-69           0.20                 200            200         0
<60             0.10                 100            150        50
Total                               1000           1000

Page 16: Chi-squared Tests

We square those differences.

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                 100             50       -50      2,500
80-89           0.20                 200            100      -100     10,000
70-79           0.40                 400            500       100     10,000
60-69           0.20                 200            200         0          0
<60             0.10                 100            150        50      2,500
Total                               1000           1000

Page 17: Chi-squared Tests

We divide each square by the theoretical frequency and sum the quotients.

Grade    Historical rel. freq.   Expected ft   Observed fo   fo-ft   (fo-ft)²   (fo-ft)²/ft
90-100          0.10                 100             50       -50      2,500        25
80-89           0.20                 200            100      -100     10,000        50
70-79           0.40                 400            500       100     10,000        25
60-69           0.20                 200            200         0          0         0
<60             0.10                 100            150        50      2,500        25
Total                               1000           1000                            125

Page 18: Chi-squared Tests

dof = (# of categories or classes) – (# of restrictions) – (# of parameters estimated)

We have 5 categories (the 5 grade groups).

We have 1 restriction. We restricted our expected frequencies so that the total would be the same total as for the observed frequencies (1000).

We didn’t estimate any parameters in this particular problem.

So dof = 5 – 1 – 0 = 4 .
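Example 2 can be worked end to end in plain Python: the expected frequencies are just the historical relative frequencies scaled up to the current total of 1000 students.

```python
# Categories in order: 90-100, 80-89, 70-79, 60-69, <60
historical_rel = [0.10, 0.20, 0.40, 0.20, 0.10]
observed       = [50, 100, 500, 200, 150]

total = sum(observed)                            # 1000 students
expected = [p * total for p in historical_rel]   # 100, 200, 400, 200, 100

stat = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
print(round(stat, 2))  # 125.0
```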

Page 19: Chi-squared Tests

From the χ² table, we see that for a 10% test with 4 degrees of freedom, the cut-off point is 7.779.

In the current problem, our χ² statistic had a value of 125.

So we reject the null hypothesis and conclude that the grade distribution is NOT the same as it was historically.

[Figure: the χ² density with 4 degrees of freedom; acceptance region below 7.779, critical region of area 0.10 above it, and the statistic 125 far inside the critical region.]

Page 20: Chi-squared Tests

Example 3: Test at the 5% level whether the demand for a particular product as listed below has a Poisson distribution.

x    fo (days)   x·fo   f(x)   ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0       11
1       28
2       43
3       47
4       32
5       28
6        7
7        0
8        2
9        1
10       1
Total  200

Page 21: Chi-squared Tests

Multiplying the number of days on which each amount was sold by the amount sold on that day, and then adding those products, we find that the total number of units sold on the 200 days is 600. So the mean number of units sold per day is 3.

x    fo (days)   x·fo   f(x)   ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0       11         0
1       28        28
2       43        86
3       47       141
4       32       128
5       28       140
6        7        42
7        0         0
8        2        16
9        1         9
10       1        10
Total  200       600
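The mean of 3 units per day can be reproduced from the frequency table with a few lines of Python (x = units demanded, fo = number of days observed):

```python
xs  = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fos = [11, 28, 43, 47, 32, 28, 7, 0, 2, 1, 1]

days  = sum(fos)                             # 200 days in the sample
units = sum(x * f for x, f in zip(xs, fos))  # 600 units sold in total
mean = units / days
print(days, units, mean)  # 200 600 3.0
```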

Page 22: Chi-squared Tests

We use 3 as the estimated mean for the Poisson distribution. Then, using the Poisson table, we determine the probabilities for each x value.

x    fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0       11         0    0.050
1       28        28    0.149
2       43        86    0.224
3       47       141    0.224
4       32       128    0.168
5       28       140    0.101
6        7        42    0.050
7        0         0    0.022
8        2        16    0.008
9        1         9    0.003
10       1        10    0.001
Total  200       600    1.000

Page 23: Chi-squared Tests

Then we multiply the probabilities by 200 to compute ft, the expected number of days on which each number of units would be sold. By multiplying by 200, we restrict the ft total to be the same as the fo total.

x    fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0       11         0    0.050     10.0
1       28        28    0.149     29.8
2       43        86    0.224     44.8
3       47       141    0.224     44.8
4       32       128    0.168     33.6
5       28       140    0.101     20.2
6        7        42    0.050     10.0
7        0         0    0.022      4.4
8        2        16    0.008      1.6
9        1         9    0.003      0.6
10       1        10    0.001      0.2
Total  200       600    1.000    200.0
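The table's probabilities can be recomputed from the Poisson formula with the standard library rather than a printed table. A sketch; the slides' values are these probabilities rounded to three decimals.

```python
from math import exp, factorial

lam, n_days = 3, 200   # estimated mean and number of days observed

def poisson_pmf(x, lam):
    """Poisson probability f(x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

for x in range(7):
    p = poisson_pmf(x, lam)
    print(x, round(p, 3), round(n_days * p, 1))
# e.g. x = 0 gives p = 0.050 and an expected frequency of about 10 days
```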

Page 24: Chi-squared Tests

When the ft’s are small (less than 5), the test is not reliable. So we group small ft values. In this example, we group the last 4 categories.

x      fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0         11         0    0.050     10.0
1         28        28    0.149     29.8
2         43        86    0.224     44.8
3         47       141    0.224     44.8
4         32       128    0.168     33.6
5         28       140    0.101     20.2
6          7        42    0.050     10.0
7-10       4        35    0.034      6.8
Total    200       600    1.000    200.0

Page 25: Chi-squared Tests

Next we subtract the theoretical frequencies ft from the observed frequencies fo.

x      fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0         11         0    0.050     10.0       1.0
1         28        28    0.149     29.8      -1.8
2         43        86    0.224     44.8      -1.8
3         47       141    0.224     44.8       2.2
4         32       128    0.168     33.6      -1.6
5         28       140    0.101     20.2       7.8
6          7        42    0.050     10.0      -3.0
7-10       4        35    0.034      6.8      -2.8
Total    200       600    1.000    200.0

Page 26: Chi-squared Tests

Then we square the differences …

x      fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0         11         0    0.050     10.0       1.0      1.00
1         28        28    0.149     29.8      -1.8      3.24
2         43        86    0.224     44.8      -1.8      3.24
3         47       141    0.224     44.8       2.2      4.84
4         32       128    0.168     33.6      -1.6      2.56
5         28       140    0.101     20.2       7.8     60.84
6          7        42    0.050     10.0      -3.0      9.00
7-10       4        35    0.034      6.8      -2.8      7.84
Total    200       600    1.000    200.0

Page 27: Chi-squared Tests

… divide by the theoretical frequencies, and sum up.

x      fo (days)   x·fo   f(x)    ft (days)   fo-ft   (fo-ft)²   (fo-ft)²/ft
0         11         0    0.050     10.0       1.0      1.00        0.10
1         28        28    0.149     29.8      -1.8      3.24        0.11
2         43        86    0.224     44.8      -1.8      3.24        0.07
3         47       141    0.224     44.8       2.2      4.84        0.11
4         32       128    0.168     33.6      -1.6      2.56        0.08
5         28       140    0.101     20.2       7.8     60.84        3.01
6          7        42    0.050     10.0      -3.0      9.00        0.90
7-10       4        35    0.034      6.8      -2.8      7.84        1.15
Total    200       600    1.000    200.0                            5.53

Page 28: Chi-squared Tests

dof = (# of categories or classes) – (# of restrictions) – (# of parameters estimated)

We have 8 categories (after grouping the small ones).

We have 1 restriction. We restricted our expected frequencies so that the total would be the same total as for the observed frequencies (200).

We estimated 1 parameter, the mean for the Poisson distribution.

So dof = 8 – 1 – 1 = 6 .
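Example 3 can be reproduced end to end in Python, using the table's rounded Poisson probabilities and the slides' grouping of the small expected-frequency cells (x ≥ 7):

```python
probs_0_to_6 = [0.050, 0.149, 0.224, 0.224, 0.168, 0.101, 0.050]
fo_0_to_6    = [11, 28, 43, 47, 32, 28, 7]

n = 200
observed = list(fo_0_to_6)
expected = [p * n for p in probs_0_to_6]

# Grouped cell for x = 7, 8, 9, 10, so that no ft is below 5.
observed.append(0 + 2 + 1 + 1)                        # fo = 4
expected.append(n * (0.022 + 0.008 + 0.003 + 0.001))  # ft = 6.8

stat = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
print(round(stat, 2))  # 5.53
```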

Page 29: Chi-squared Tests

From the χ² table, we see that for a 5% test with 6 degrees of freedom, the cut-off point is 12.592.

In the current problem, our χ² statistic had a value of 5.53.

So we accept the null hypothesis that the Poisson distribution is a reasonable fit for the product demand.

[Figure: the χ² density with 6 degrees of freedom; the statistic 5.53 falls in the acceptance region below the cut-off 12.592, with the critical region of area 0.05 above it.]

Page 30: Chi-squared Tests

Example 4: Test at the 10% level whether the following exam grades are from a normal distribution.

Note: This is a very long problem.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)                       14
[60,70)                       18
[70,80)                       36
[80,90)                       18
[90,100]                      14
Total                        100

Page 31: Chi-squared Tests

If the distribution is normal, we need to estimate its mean and standard deviation.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)                       14
[60,70)                       18
[70,80)                       36
[80,90)                       18
[90,100]                      14
Total                        100

Page 32: Chi-squared Tests

To estimate the mean, we first determine the midpoints of the grade intervals.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)              55       14
[60,70)              65       18
[70,80)              75       36
[80,90)              85       18
[90,100]             95       14
Total                        100

Page 33: Chi-squared Tests

We then multiply these midpoints by the observed frequencies of the intervals, add the products, and divide the sum by the number of observations. The resulting mean is 7500/100 = 75.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)              55       14    770
[60,70)              65       18   1170
[70,80)              75       36   2700
[80,90)              85       18   1530
[90,100]             95       14   1330
Total                        100   7500

Page 34: Chi-squared Tests

Next we need to calculate the standard deviation.

We begin by subtracting the mean of 75 from each midpoint, and squaring the differences.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)              55       14    770   -20     400
[60,70)              65       18   1170   -10     100
[70,80)              75       36   2700     0       0
[80,90)              85       18   1530    10     100
[90,100]             95       14   1330    20     400
Total                        100   7500

s = sqrt[ Σ (Xi - X̄)² fo / (n - 1) ],  summed over the c classes.

Page 35: Chi-squared Tests

We multiply by the observed frequencies and sum up. Dividing by n-1, or 99, the sample variance s² = 149.49495. The square root is the sample standard deviation s = 12.2268.

Grade interval   Midpoint X   fo   X·fo   X-X̄   (X-X̄)²   (X-X̄)²·fo
[50,60)              55       14    770   -20     400       5,600
[60,70)              65       18   1170   -10     100       1,800
[70,80)              75       36   2700     0       0           0
[80,90)              85       18   1530    10     100       1,800
[90,100]             95       14   1330    20     400       5,600
Total                        100   7500                    14,800
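The grouped-data mean and standard deviation can be verified in a few lines:

```python
from math import sqrt

midpoints = [55, 65, 75, 85, 95]
fo        = [14, 18, 36, 18, 14]

n = sum(fo)                                            # 100 students
mean = sum(x * f for x, f in zip(midpoints, fo)) / n   # 7500/100 = 75.0
var = sum((x - mean) ** 2 * f for x, f in zip(midpoints, fo)) / (n - 1)
s = sqrt(var)
print(mean, round(var, 5), round(s, 4))  # 75.0 149.49495 12.2268
```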

Page 36: Chi-squared Tests

We will use the 75 and 12.2268 as the mean and the standard deviation of our proposed normal distribution.

We now need to determine what the expected frequencies would be if the grades were from that normal distribution.

Page 37: Chi-squared Tests

Start with our lowest grade category, under 60.

Pr(X < 60) = Pr(Z < (60 - 75)/12.2268) = Pr(Z < -1.23) = 0.5 - 0.3907 = 0.1093

We then expect that 10.93% of our 100 observations, or about 11 grades, would be in the lowest grade category.

So 11 will be one of our ft values.

We need to do similar calculations for our other grade categories.


Page 38: Chi-squared Tests

The next grade category is [60,70).

Pr(60 ≤ X < 70) = Pr((60 - 75)/12.2268 ≤ Z < (70 - 75)/12.2268) = Pr(-1.23 ≤ Z < -0.41) = 0.3907 - 0.1591 = 0.2316

So 23.16% of our 100 observations, or about 23 grades, are expected to be in that grade category.


Page 39: Chi-squared Tests

The next grade category is [70,80).

Pr(70 ≤ X < 80) = Pr((70 - 75)/12.2268 ≤ Z < (80 - 75)/12.2268) = Pr(-0.41 ≤ Z < 0.41) = 2(0.1591) = 0.3182

So 31.82% of our 100 observations, or about 32 grades, are expected to be in that grade category.


Page 40: Chi-squared Tests

The next grade category is [80,90).

Pr(80 ≤ X < 90) = Pr((80 - 75)/12.2268 ≤ Z < (90 - 75)/12.2268) = Pr(0.41 ≤ Z < 1.23) = 0.3907 - 0.1591 = 0.2316

So 23.16% of our 100 observations, or about 23 grades, are expected to be in that grade category.


Page 41: Chi-squared Tests

The highest grade category is 90 and over.

Pr(X ≥ 90) = Pr(Z ≥ (90 - 75)/12.2268) = Pr(Z ≥ 1.23) = 0.5 - 0.3907 = 0.1093

So 10.93% of our 100 observations, or about 11 grades, are expected to be in that grade category.

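These cell probabilities can be reproduced with the standard normal CDF from the math module. Following the slides, z is rounded to two decimals before the table lookup; the last two cells mirror the first two by symmetry.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 75, 12.2268

def z(x):
    """z-score rounded to two decimals, as in the slides' table lookups."""
    return round((x - mu) / sigma, 2)

p_under_60 = phi(z(60))               # Pr(Z < -1.23) ~ 0.1093
p_60_70    = phi(z(70)) - phi(z(60))  # ~ 0.2316
p_70_80    = phi(z(80)) - phi(z(70))  # ~ 0.3182
print(round(p_under_60, 4), round(p_60_70, 4), round(p_70_80, 4))
```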

Page 42: Chi-squared Tests

Now we can finally compute our χ² statistic.

We put in the observed frequencies that we were given and the theoretical frequencies that we just calculated.

Grade category   fo   ft   (fo-ft)²/ft
under 60         14   11
[60,70)          18   23
[70,80)          36   32
[80,90)          18   23
90 and up        14   11

Page 43: Chi-squared Tests

We subtract the theoretical frequencies from the observed frequencies, square the differences, divide by the theoretical frequencies, and sum up. The resulting χ² statistic is 4.3104.

Grade category   fo   ft   (fo-ft)²/ft
under 60         14   11     0.8182
[60,70)          18   23     1.0870
[70,80)          36   32     0.5000
[80,90)          18   23     1.0870
90 and up        14   11     0.8182
Total                        4.3104

Page 44: Chi-squared Tests

dof = (# of categories or classes) – (# of restrictions) – (# of parameters estimated)

We have 5 categories (the 5 grade groups).

We have 1 restriction. We restricted our expected frequencies so that the total would be the same total as for the observed frequencies (100).

We estimated two parameters, the mean and the standard deviation.

So dof = 5 – 1 – 2 = 2 .
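The χ² statistic for Example 4 can be checked from the rounded expected counts. Note that the slides' 4.3104 comes from summing the individually rounded quotients; summing the exact quotients gives 4.3103.

```python
observed = [14, 18, 36, 18, 14]   # under 60, [60,70), [70,80), [80,90), 90 and up
expected = [11, 23, 32, 23, 11]   # from the normal-distribution probabilities

stat = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
print(round(stat, 4))  # 4.3103
```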

Page 45: Chi-squared Tests

From the χ² table, we see that for a 10% test with 2 degrees of freedom, the cut-off point is 4.605.

In the current problem, our χ² statistic had a value of 4.31.

So we accept the null hypothesis that the normal distribution is a reasonable fit for the grades.

[Figure: the χ² density with 2 degrees of freedom; the statistic 4.31 falls in the acceptance region just below the cut-off 4.605, with the critical region of area 0.10 above it.]

Page 46: Chi-squared Tests

We can also use the χ² statistic to test whether two variables are independent of each other.

Page 47: Chi-squared Tests

Example 5: Given the following frequencies for a sample of 10,000 households, test at the 1% level whether the number of phones and the number of cars for a household are independent of each other.

                             # of cars
                          0       1       2
# of phones  0          1,000     900     100
             1          1,500   2,600     500
             2 or more    500   2,500     400

Total: 10,000 households

Page 48: Chi-squared Tests

We first compute the row and column totals,

                             # of cars
                          0       1       2   Row total
# of phones  0          1,000     900     100     2,000
             1          1,500   2,600     500     4,600
             2 or more    500   2,500     400     3,400
Column total            3,000   6,000   1,000    10,000

Page 49: Chi-squared Tests

and the row and column percentages (marginal probabilities).

                             # of cars
                          0       1       2   Row total     %
# of phones  0          1,000     900     100     2,000   0.20
             1          1,500   2,600     500     4,600   0.46
             2 or more    500   2,500     400     3,400   0.34
Column total            3,000   6,000   1,000    10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 50: Chi-squared Tests

Recall that if two variables X and Y are independent of each other, then Pr(X=x and Y=y) = Pr(X=x)·Pr(Y=y).

Page 51: Chi-squared Tests

We can use our row and column percentages as marginal probabilities, and multiply to determine the probabilities and numbers of households we would expect to see in the center of the table if the numbers of phones and cars were independent of each other.

                             # of cars
                          0       1       2   Row total     %
# of phones  0                                            0.20
             1                                            0.46
             2 or more                                    0.34
Column total                                              1.00
%                        0.30    0.60    0.10     1.00

Page 52: Chi-squared Tests

First calculate the expected probability. For example, Pr(0 phones & 0 cars) = Pr(0 phones)·Pr(0 cars) = (0.20)(0.30) = 0.06. So we expect 6% of our 10,000 households, or 600 households, to have 0 phones and 0 cars.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600                             0.20
             1                                            0.46
             2 or more                                    0.34
Column total                                     10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 53: Chi-squared Tests

Pr(0 phones & 1 car) = Pr(0 phones)·Pr(1 car) = (0.20)(0.60) = 0.12. So we expect 12% of our 10,000 households, or 1,200 households, to have 0 phones and 1 car.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600   1,200                     0.20
             1                                            0.46
             2 or more                                    0.34
Column total                                     10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 54: Chi-squared Tests

Pr(0 phones & 2 cars) = Pr(0 phones)·Pr(2 cars) = (0.20)(0.10) = 0.02. So we expect 2% of our 10,000 households, or 200 households, to have 0 phones and 2 cars.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600   1,200     200             0.20
             1                                            0.46
             2 or more                                    0.34
Column total                                     10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 55: Chi-squared Tests

Notice that when we add the 3 numbers that we just calculated, we get the same total for the row (2,000) that we had observed. The row and column totals should be the same for the observed and expected tables.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600   1,200     200     2,000   0.20
             1                                            0.46
             2 or more                                    0.34
Column total                                     10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 56: Chi-squared Tests

Continuing, we get the following numbers for the 2nd and 3rd rows.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600   1,200     200     2,000   0.20
             1          1,380   2,760     460     4,600   0.46
             2 or more  1,020   2,040     340     3,400   0.34
Column total                                     10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 57: Chi-squared Tests

The column totals are the same as for the observed table.

                             # of cars
                          0       1       2   Row total     %
# of phones  0            600   1,200     200     2,000   0.20
             1          1,380   2,760     460     4,600   0.46
             2 or more  1,020   2,040     340     3,400   0.34
Column total            3,000   6,000   1,000    10,000   1.00
%                        0.30    0.60    0.10     1.00

Page 58: Chi-squared Tests

Now we set up the same type of table that we did for our earlier χ² goodness-of-fit tests. We put the observed frequencies in the fo column and the expected frequencies that we calculated in the ft column.

# of cars   # of phones    fo       ft     (fo-ft)²/ft
0           0             1,000     600
0           1             1,500   1,380
0           2 or more       500   1,020
1           0               900   1,200
1           1             2,600   2,760
1           2 or more     2,500   2,040
2           0               100     200
2           1               500     460
2           2 or more       400     340

Page 59: Chi-squared Tests

Then we subtract the theoretical frequencies from the observed frequencies, square the differences, divide by the theoretical frequencies, and sum to get our χ² statistic.

# of cars   # of phones    fo       ft     (fo-ft)²/ft
0           0             1,000     600      266.67
0           1             1,500   1,380       10.43
0           2 or more       500   1,020      265.10
1           0               900   1,200       75.00
1           1             2,600   2,760        9.28
1           2 or more     2,500   2,040      103.73
2           0               100     200       50.00
2           1               500     460        3.48
2           2 or more       400     340       10.59
Total                                        794.28
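SciPy can run the whole independence test from the observed table alone; a sketch assuming SciPy is installed. It builds the same expected table from the margins. (SciPy reports the exact statistic, 794.27; the slides' 794.28 comes from summing individually rounded quotients.)

```python
from scipy.stats import chi2_contingency

observed = [
    [1000,  900, 100],   # 0 phones
    [1500, 2600, 500],   # 1 phone
    [ 500, 2500, 400],   # 2 or more phones
]
stat, p_value, dof, expected = chi2_contingency(observed)
print(round(stat, 2), dof)  # 794.27 4
```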

Page 60: Chi-squared Tests

In these tests of independence, the number of degrees of freedom is

dof = (r-1)(c-1) = (# of rows - 1)(# of columns - 1).

In our example, we have 3 rows and 3 columns.

So dof = (3 – 1)(3 – 1) = (2)(2) = 4 .

Page 61: Chi-squared Tests

From the χ² table, we see that for a 1% test with 4 degrees of freedom, the cut-off point is 13.277.

In the current problem, our χ² statistic had a value of 794.28.

So we reject the null hypothesis and conclude that the number of phones and the number of cars in a household are not independent.

[Figure: the χ² density with 4 degrees of freedom; acceptance region below 13.277, critical region of area 0.01 above it, and the statistic 794.28 far inside the critical region.]

Page 62: Chi-squared Tests

Yates Correction

In testing for independence in 2×2 tables, the chi-squared statistic has only (r-1)(c-1) = 1 degree of freedom. In these cases, it is often recommended that the value of the statistic be “corrected” so that its discrete distribution will be better approximated by the continuous chi-squared distribution.

Yates-corrected chi-squared statistic = Σ (|fo - ft| - 0.5)² / ft.
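The corrected statistic is a one-line change from the ordinary one. A sketch applied to a hypothetical 2×2 table flattened into lists; none of these counts come from the slides.

```python
def yates_chi_squared(observed, expected):
    """Yates-corrected statistic: sum of (|fo - ft| - 0.5)^2 / ft."""
    return sum((abs(fo - ft) - 0.5) ** 2 / ft
               for fo, ft in zip(observed, expected))

# Hypothetical 2x2 table (row by row) and the expected counts its margins imply.
fo = [20, 30, 30, 20]
ft = [25, 25, 25, 25]
print(round(yates_chi_squared(fo, ft), 2))  # 3.24, i.e. 4 * (4.5^2 / 25)
```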

Page 63: Chi-squared Tests

The Hypothesis Test for the Variance or Standard Deviation

This test is another one that uses the chi-squared distribution.

Page 64: Chi-squared Tests

Sometimes it is important to know the variance or standard deviation of a variable.

For example, medication often needs to be extremely close to the specified dosage.

If the dosage is too low, the medication may be ineffective and a patient may die from inadequate treatment.

If the dosage is too high, the patient may die from an overdose.

So you may want to make sure that the variance is a very small amount.

Page 65: Chi-squared Tests

If the data are normally distributed, the chi-squared test for the variance or standard deviation is appropriate.

The statistic is

χ² = (n - 1)s² / σ²,

where n is the sample size, σ² is the hypothesized population variance, and s² = Σ (Xi - X̄)² / (n - 1), summed over the n observations, is the sample variance.

The number of degrees of freedom is n-1.

Page 66: Chi-squared Tests

Example: Suppose you want to test at the 5% level whether the population standard deviation for a particular medication is 0.5 mg. Based on a sample of 25 capsules, you determine the sample standard deviation to be 0.6 mg. Perform the test.

χ² = (n - 1)s² / σ² = 24(0.36) / 0.25 = 34.56

Now we need to determine the critical region for the test.

Page 67: Chi-squared Tests

Because the chi-squared distribution is not symmetric, you need to look up the two critical values for a two-tailed test separately.

[Figure: the χ² density with 24 degrees of freedom; critical regions of area 0.025 below 12.401 and above 39.364, with the acceptance region between them.]

You can find the two numbers either by looking under “Cumulative Probabilities” at 0.025 and 1 - 0.025 = 0.975, or under “Upper-Tail Areas” at 0.975 and 0.025.

Recall that the value of the test statistic was 34.56, which is in the acceptance region.

So we cannot rule out the null hypothesis, and we therefore conclude that the population standard deviation is 0.5 mg.
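The whole variance test can be sketched in code, assuming SciPy is available for the two quantiles:

```python
from scipy.stats import chi2

n, s, sigma0 = 25, 0.6, 0.5        # sample size, sample sd, hypothesized sd
stat = (n - 1) * s**2 / sigma0**2
print(round(stat, 2))  # 34.56

lo = chi2.ppf(0.025, df=n - 1)     # lower 2.5% cut-off, ~12.401
hi = chi2.ppf(0.975, df=n - 1)     # upper 2.5% cut-off, ~39.364
print(lo < stat < hi)  # True: the statistic falls in the acceptance region
```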