ib math hl - charlotte county public schools · categorical data chi square tests are used for when...

Chapter 25 Comparing Counts


Page 1: IB Math HL - Charlotte County Public Schools · Categorical Data Chi Square tests are used for when we have counts for the categories of a categorical variable: Goodness of Fit Test

Chapter 25

Comparing Counts


Objectives

Chi-Square Model

Chi-Square Statistic

Knowing when and how to use the Chi-Square Tests:

Goodness of Fit

Test of Independence

Test of Homogeneity

Standardized Residual


Categorical Data

Chi-square tests are used when we have counts for the categories of a categorical variable:

Goodness of Fit Test

Allows us to test whether a hypothesized population distribution seems valid. This is a one-variable, one-sample test.

Test of Independence

Cross-categorizes one group on two variables to see if there is an association between the variables. This is a two-variable, one-sample test.

Test for Homogeneity

Compares the observed distributions for several groups to each other to see if there is a difference among the populations. This is a one-variable, many-samples test.


Chi Square Model

Just like the Student's t-models, the chi-square models form a family that depends on degrees of freedom.

Unlike the Student's t-models, a chi-square distribution is not symmetric. It’s skewed right.

A chi-square test is always a one-sided, right-tailed test.


The Chi-Square (χ²) Distribution - Properties

It is a continuous distribution.

It is not symmetric.

It is skewed to the right.

The distribution depends on the degrees of freedom.

The value of a χ² random variable is always nonnegative.

There are infinitely many χ² distributions, since each is uniquely defined by its degrees of freedom.


The Chi-Square (χ²) Distribution - Properties

For small degrees of freedom, the χ² distribution is very skewed to the right.

As the degrees of freedom increase, the χ² distribution becomes more and more symmetrical.
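One way to see the shrinking skew numerically is a small sketch like the following. It relies on a standard result that the slides do not state: the skewness of a chi-square distribution with df degrees of freedom is sqrt(8 / df). The function name `chi_square_skewness` is made up for this illustration.

```python
import math


def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with df degrees of freedom.

    Uses the standard closed form sqrt(8 / df) (an assumption brought in
    for illustration; it is not taken from the slides).
    """
    if df <= 0:
        raise ValueError("df must be a positive integer")
    return math.sqrt(8 / df)


# The skewness stays positive (right skew) but shrinks toward 0 as df grows.
for df in (1, 4, 10, 50, 200):
    print(f"df = {df:>3}: skewness = {chi_square_skewness(df):.3f}")
```

The values never reach zero, which matches the statement that the distribution becomes more symmetrical without ever being exactly symmetric.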


The Chi-Square (χ²) Distribution - Properties

Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.


Critical Value


• Explanation of the term – critical or rejection region: A critical or rejection region is a range of test statistic values for which the null hypothesis will be rejected.

• This range of values will indicate that there is a significant or large enough difference between the postulated parameter value and the corresponding point estimate for the parameter.


Critical Value

• Explanation of the term – non-critical or non-rejection region: A non-critical or non-rejection region is a range of test statistic values for which the null hypothesis will not be rejected.

• This range of values will indicate that there is not a significant or large enough difference between the postulated parameter value and the corresponding point estimate for the parameter.


Critical Value

[Figure: χ² curve marking the non-critical region (non-rejection region) and the rejection region.]


The Chi-Square (χ²) Distribution - Properties

Notation: $\chi^2_{\alpha, df}$

Explanation of the notation: $\chi^2_{\alpha, df}$ is a χ² value with df degrees of freedom such that an area of α (the significance level) lies to the right of that χ² value.


The Chi-Square (χ²) Distribution - Properties

[Diagram explaining the notation $\chi^2_{\alpha, df}$]


The Chi-Square (χ²) Distribution - Table

Values for the random variable with the appropriate degrees of freedom can be obtained from the tables in the formula booklet.

Example: What is the value of $\chi^2_{0.05, 10}$?


The Chi-Square (χ²) Distribution - Table

[Table excerpt highlighting α = 0.05, df = 10, and the corresponding χ² critical value.]


The Chi-Square (χ²) Distribution - Table

Solution: From the table in the formula booklet, $\chi^2_{0.05, 10}$ = 18.307.
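When no table is at hand, the same critical value can be approximated numerically. The sketch below is only an illustration under stated assumptions: it uses the closed-form right-tail area of a chi-square distribution with integer degrees of freedom and a plain bisection search. The helper names `chi2_sf` and `chi2_critical` are hypothetical and are not part of the slides or the formula booklet.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def chi2_critical(alpha: float, df: int) -> float:
    """Value with area alpha to its right, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # grow the bracket until it contains the answer
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"alpha = 0.05, df = 10: {chi2_critical(0.05, 10):.3f}")
```

The printed value should land close to the table entry quoted above; small differences in the last digit would come from rounding in the printed table.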


The Chi-Square (χ²) Distribution - Table

Your Turn: What is the value of $\chi^2_{0.10, 20}$?


The Chi-Square (χ²) Distribution - Table

$\chi^2_{0.10, 20}$ = 28.41


CHI-SQUARE (χ²) TEST FOR GOODNESS OF FIT


Goodness-of-Fit

A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test.

As usual, there are assumptions and conditions to consider…


Assumptions and Conditions

Counted Data Condition: Check that the data are counts for the categories of a categorical variable.

Independence Assumption: The counts in the cells should be independent of each other.

Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population.


Assumptions and Conditions

Sample Size Assumption: We must have enough data for the methods to work.

Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell.

This is similar to the condition that np and nq be at least 10 when we tested proportions.


Calculations

Since we want to examine how well the observed data reflect what would be expected, it is natural to look at the differences between the observed and expected counts (Obs – Exp).


Calculations (cont.)

The test statistic, called the chi-square (or chi-squared) statistic, is found by summing, over all cells, the squared deviations between the observed and expected counts, each divided by its expected count:

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$


One-Sided or Two-Sided?

The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals.

If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.”

So the chi-square test is always one-sided.

If the calculated statistic value is large enough, we’ll reject the null hypothesis.


One-Sided or Two-Sided?

The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided.

There are many ways the null hypothesis could be wrong.

There’s no direction to the rejection of the null model—all we know is that it doesn’t fit.

Procedure

Procedure (cont.)

Expected Frequencies

If the expected frequencies are not all equal: each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category.

E = n • p


Expected Frequencies

The chi-square goodness of fit test is always a right-tailed test.

For the chi-square goodness-of-fit test, the expected frequencies should be at least 5.

When the expected frequency of a class or category is less than 5, this class or category can be combined with another class or category so that the expected frequency is at least 5.


Goodness-of-fit Test

Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Critical Values

1. Found in the table using k – 1 degrees of freedom, where k = number of categories

2. Goodness-of-fit hypothesis tests are always right-tailed.


EXAMPLE

There are 4 TV sets that are located in the student center of a large university. At a particular time each day, four different soap operas (1, 2, 3, and 4) are viewed on these TV sets. The percentages of the audience captured by these shows during one semester were 25 percent, 30 percent, 25 percent, and 20 percent, respectively. During the first week of the following semester, 300 students are surveyed.


EXAMPLE (Continued)

(a) If the viewing pattern has not changed, what number of students is expected to watch each soap opera?

Solution: Based on the information, the expected values will be: 0.25 × 300 = 75, 0.30 × 300 = 90, 0.25 × 300 = 75, and 0.20 × 300 = 60.
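The arithmetic in part (a) is the rule E = n • p applied to each category. A minimal sketch, with `expected_counts` as a hypothetical helper name chosen for this illustration:

```python
def expected_counts(n, proportions):
    """Expected count for each category: sample size times model proportion."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("model proportions should sum to 1")
    return [n * p for p in proportions]


# Audience shares for soap operas 1-4 from the previous semester.
shares = [0.25, 0.30, 0.25, 0.20]
soap_expected = expected_counts(300, shares)

for show, count in enumerate(soap_expected, start=1):
    print(f"Soap opera {show}: expected viewers = {count:.0f}")
```

Every expected count here is well above 5, so the Expected Cell Frequency Condition holds for this example.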


EXAMPLE (Continued)

(b) Suppose that the actual observed numbers of students viewing the soap operas are given in the following table. Test whether these numbers indicate a change at the 1 percent level of significance.


EXAMPLE (Continued)

Solution: Given α = 0.01, k = 4 categories, df = 4 – 1 = 3, and $\chi^2_{0.01, 3}$ = 11.345. The observed and expected frequencies are given below.


EXAMPLE (Continued)

Solution (continued): The χ² test statistic is computed below.


EXAMPLE (Continued)

Solution (continued):

P-value = .6828, P > 𝛼


EXAMPLE (Continued)

Solution (continued):

[Diagram showing the rejection region.]


The Chi-Square test for Goodness of Fit


Your Turn

The Advanced Placement (AP) Statistics examination was first administered in May 1997. Students’ papers are graded on a scale of 1–5, with 5 being the highest score. Over 7,600 students took the exam in the first year, and the distribution of scores was as follows (not including exams that were scored late).

Score      5      4      3      2      1
Percent    15.3   22.0   24.8   19.8   18.1

A distance learning class that took AP Statistics via satellite television had the following distribution of grades:

Score      5      4      3      2      1
Frequency  7      13     7      6      2


Score    Observed Counts   Expected % (pi)   Expected Counts (npi)   (O – E)²/E
5        7                 15.3              5.355                   0.50533
4        13                22                7.7                     3.6481
3        7                 24.8              8.68                    0.32516
2        6                 19.8              6.93                    0.12481
1        2                 18.1              6.335                   2.9664
Totals   35                100%              35                      7.56976


Carry out an appropriate test to determine if the distribution of scores for students enrolled in the distance learning program is significantly different from the distribution of scores for all students who took the inaugural exam.


We must be willing to treat this class of students as an SRS from the population of all distance learning classes. We will proceed with caution. All expected counts are 5 or more.

Ho: The distribution of AP Statistics exam scores for distance learning students is the same as the distribution of scores for all students who took the May 1997 exam.

Ha: The distribution of AP Statistics exam scores for distance learning students is different from the distribution of scores for all students who took the May 1997 exam.


We will use a significance level of 0.05. There are 5 categories, meaning there are 5 – 1 = 4 degrees of freedom.

$\chi^2_4$ = 7.56976

P-value = P($\chi^2_4$ > 7.56976) = 0.1087

We do not have enough evidence to reject Ho since p > alpha. We do not have enough evidence to suggest that the distribution of scores of traditional students is different from the distribution of scores of the distance learning students.
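The numbers in this example can be reproduced with a short script. This is a sketch rather than the method used on the slides (which presumably relied on a calculator or table). It assumes the closed-form right-tail area for an even number of degrees of freedom, and the helper name `chi2_sf_even_df` is made up for the illustration.

```python
import math

# Observed class counts and national percentages for scores 5, 4, 3, 2, 1.
observed = [7, 13, 7, 6, 2]
percents = [15.3, 22.0, 24.8, 19.8, 18.1]

n = sum(observed)
expected = [n * p / 100 for p in percents]

# One (O - E)^2 / E component per score category, then their sum.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_stat = sum(components)
df = len(observed) - 1


def chi2_sf_even_df(x: float, df: int) -> float:
    """Right-tail area for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(chi2_stat, df)
alpha = 0.05

print(f"chi-square = {chi2_stat:.5f}, df = {df}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because the P-value is compared with α = 0.05 directly, no table lookup is needed for the decision.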


χ² TEST OF INDEPENDENCE


Independence

Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other.

A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table.


Definition

Test of Independence

This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)


Assumptions and Conditions

The assumptions and conditions are the same as for the chi-square goodness-of-fit test:

Counted Data Condition: The data must be counts.

Randomization Condition and 10% Condition: As long as we don’t want to generalize, we don’t have to check these conditions.

Expected Cell Frequency Condition: The expected count in each cell must be at least 5.


Test of Independence

Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Critical Values

1. Found in the table using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns

2. Tests of independence are always right-tailed.


Tests of Independence

H0: The row variable is independent of the column variable

H1: The row variable is dependent on (related to) the column variable

This procedure cannot be used to establish a direct cause-and-effect link between the variables in question.

Dependence means only that there is a relationship between the two variables.


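Expected Frequency for Contingency Tables

$$E = \text{table total} \cdot \frac{\text{row total}}{\text{table total}} \cdot \frac{\text{column total}}{\text{table total}}$$

Here the table total plays the role of n and the product of the two fractions is p (the probability of a cell), so E = n • p. Simplifying:

$$E = \frac{(\text{row total})(\text{column total})}{(\text{table total})}$$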


$$E = \frac{(\text{row total})(\text{column total})}{(\text{table total})}$$

where the table total is the total number of all observed frequencies in the table.


Observed and Expected Frequencies

            Men     Women   Boys   Girls   Total
Survived    332     318     29     27      706
Died        1360    104     35     18      1517
Total       1692    422     64     45      2223

We will use the mortality table from the Titanic to find expected frequencies. For the upper left hand cell, we find:

$$E = \frac{(706)(1692)}{2223} = 537.360$$


Observed and Expected Frequencies

            Men              Women   Boys   Girls   Total
Survived    332 (537.360)    318     29     27      706
Died        1360             104     35     18      1517
Total       1692             422     64     45      2223

Find the expected frequency for the lower left hand cell, assuming independence between the row variable and the column variable.

$$E = \frac{(1517)(1692)}{2223} = 1154.640$$


Observed and Expected Frequencies

            Men               Women            Boys          Girls         Total
Survived    332 (537.360)     318 (134.022)    29 (20.326)   27 (14.291)   706
Died        1360 (1154.64)    104 (287.978)    35 (43.674)   18 (30.709)   1517
Total       1692              422              64            45            2223

To interpret this result for the lower left hand cell, we can say that although 1360 men actually died, we would have expected 1154.64 men to die if survival were independent of whether the person is a man, woman, boy, or girl.
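The full grid of expected frequencies follows from applying E = (row total)(column total) / (table total) to every cell. A minimal sketch using the Titanic counts from the slides; the variable names are chosen for this illustration only:

```python
# Observed Titanic counts, in column order men, women, boys, girls.
columns = ["Men", "Women", "Boys", "Girls"]
observed = {
    "Survived": [332, 318, 29, 27],
    "Died": [1360, 104, 35, 18],
}

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
table_total = sum(row_totals.values())

# Expected count under independence for every cell.
expected = {
    row: [row_totals[row] * col_total / table_total for col_total in col_totals]
    for row in observed
}

for row, values in expected.items():
    cells = ", ".join(f"{name}: {value:.3f}" for name, value in zip(columns, values))
    print(f"{row:<9} {cells}")
```

Each row of expected counts sums back to its row total, which is a quick check that the marginal totals were used correctly.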


Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl.

H0: Whether a person survived is independent of whether the person is a man, woman, boy, or girl.

H1: Surviving the Titanic and being a man, woman, boy, or girl are dependent.


$$\chi^2 = \frac{(332 - 537.36)^2}{537.36} + \frac{(318 - 134.022)^2}{134.022} + \frac{(29 - 20.326)^2}{20.326} + \frac{(27 - 14.291)^2}{14.291} + \frac{(1360 - 1154.64)^2}{1154.64} + \frac{(104 - 287.978)^2}{287.978} + \frac{(35 - 43.674)^2}{43.674} + \frac{(18 - 30.709)^2}{30.709}$$

$$\chi^2 = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260 = 507.084$$


The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3.

Critical value: $\chi^2_{0.05, 3}$ = 7.815. Since 507.084 > 7.815, we reject the null hypothesis.

P-value: P = P(χ² > 507.084) ≈ 0. Since P < α, we reject the null hypothesis.

Survival and passenger category (man, woman, boy, or girl) are dependent.
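The whole test can be checked end to end with a few lines. The sketch below recomputes the statistic from unrounded expected counts, so its total can differ from the slide's in the last decimal. The critical value 7.815 is taken from the slide, and `chi2_sf_df3` is a hypothetical helper that uses the closed-form right-tail area for 3 degrees of freedom.

```python
import math

observed = [
    [332, 318, 29, 27],    # Survived: men, women, boys, girls
    [1360, 104, 35, 18],   # Died
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all eight cells, with E = row total * column total / n.
chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2_stat += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # table value for alpha = 0.05, df = 3 (from the slide)


def chi2_sf_df3(x: float) -> float:
    """Right-tail area for df = 3: erfc(sqrt(x/2)) + 2*sqrt(x/(2*pi))*exp(-x/2)."""
    half = x / 2
    return math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)


p_value = chi2_sf_df3(chi2_stat)

print(f"chi-square = {chi2_stat:.3f}, df = {df}")
print("reject H0" if chi2_stat > critical_value else "fail to reject H0")
print(f"P-value = {p_value:.3e}")
```

The P-value is not exactly zero, only far too small to show at the precision used on the slide.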


Test Statistic: χ² = 507.084

with α = 0.05 and (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3 degrees of freedom

Critical Value: χ² = 7.815 (from the table)

Procedure

Procedure (cont.)

EXAMPLE

A survey was done by a car manufacturer concerning a particular make and model. A group of 500 potential customers were asked whether they purchased their current car because of its appearance, its performance rating, or its fixed price (no negotiating). The results, broken down by gender responses, are given on the next slide.


EXAMPLE (Continued)

Question: Do females feel differently than males about the three different criteria used in choosing a car, or do they feel basically the same?


Solution

χ2 Test for independence.

Thus the null hypothesis will be that the criterion used is independent of gender, while the alternative hypothesis will be that the criterion used is dependent on gender.


Solution (continued)

The number of degrees of freedom is given by (number of rows – 1)(number of columns – 1).

df = (2 – 1)(3 – 1) = 2.


Solution (continued)

Calculate the row and column totals. These row and column totals are called marginal totals.


Solution (continued)

Computation of the expected values

The expected value for a cell is the row total times the column total divided by the table total.


Solution (continued)

Let us use α = 0.01. So df = (2 – 1)(3 – 1) = 2 and $\chi^2_{0.01, 2}$ = 9.210.


Solution (continued)

The χ² test statistic is computed in the same manner as was done for the goodness-of-fit test.


Solution (continued)

Diagram showing the rejection region.


Test of Homogeneity


Comparing Observed Distributions

A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square test of homogeneity.

A test of homogeneity is actually the generalization of the two-proportion z-test.


Comparing Observed Distributions (cont.)

The statistic that we calculate for this test is identical to the chi-square statistic for independence.

In this test, however, we ask whether choices are the same among different groups (i.e., there is no model).

The expected counts are found directly from the data, and the degrees of freedom are computed differently from the goodness-of-fit test: (r – 1)(c – 1), as in the test of independence.


Assumptions and Conditions

The assumptions and conditions are the same as for the chi-square goodness-of-fit test:

Counted Data Condition: The data must be counts.

Randomization Condition and 10% Condition: As long as we don’t want to generalize, we don’t have to check these conditions.

Expected Cell Frequency Condition: The expected count in each cell must be at least 5.


Test for Homogeneity

In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic.

The procedures for performing a test of homogeneity are identical to those for a test of independence.


Example:

The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:

Test the claim that the proportion of individuals that feel being a teacher is an occupation of very great prestige is the same for each year at the α = 0.01 level of significance.

        1992   2002   2008
Yes      418    479    525
No       602    541    485

Solution

Step 1: The null hypothesis is a statement of “no difference” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows:

H0: p_1992 = p_2002 = p_2008

H1: At least one of the proportions is different from the others.

Step 2: The level of significance is α = 0.01.

Solution

Step 3:

(a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies.

        1992            2002            2008
Yes     418 (475.554)   479 (475.554)   525 (470.892)
No      602 (544.446)   541 (544.446)   485 (539.108)
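As a quick check on the table above, the expected counts can be recomputed from the marginal totals. This is a minimal Python sketch; the function name `expected_counts` and the list-of-rows layout are illustrative choices and not part of the original slides.

```python
# Observed counts from the survey table.
# Rows are Yes / No; columns are 1992, 2002, 2008.
observed = [
    [418, 479, 525],  # Yes
    [602, 541, 485],  # No
]


def expected_counts(table):
    """Return expected counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Yes", "No"], expected):
    print(label, [round(value, 3) for value in row])
```

The printed values should agree with the parenthesized entries in the table to three decimal places.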

Solution

Step 3:

(b) Since none of the expected frequencies are less than 5, the requirements are satisfied.

(c) The test statistic is

$$\chi^2_0 = \frac{(418 - 475.554)^2}{475.554} + \frac{(479 - 475.554)^2}{475.554} + \cdots + \frac{(485 - 539.108)^2}{539.108} \approx 24.74$$
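The sum in part (c) runs over all six cells of the table. The sketch below, written in Python with a hypothetical helper name `chi_square_statistic`, adds up every (observed − expected)² / expected term so the total can be compared with the value quoted above.

```python
# Observed counts: rows are Yes / No, columns are 1992, 2002, 2008.
observed = [
    [418, 479, 525],
    [602, 541, 485],
]


def chi_square_statistic(table):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


chi_sq = chi_square_statistic(observed)
print(round(chi_sq, 2))
```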

Solution: Classical Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2 − 1)(3 − 1) = 2 degrees of freedom.

The critical value is $\chi^2_{0.01} = 9.210$.
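The critical value is normally read from a table or a calculator. For the special case of 2 degrees of freedom, the right-tail area of the chi-square distribution has the closed form exp(−x/2), so the table value can be checked with the standard library alone. The sketch below relies on that special case and does not generalize to other degrees of freedom.

```python
import math

alpha = 0.01

# For 2 degrees of freedom, P(X > x) = exp(-x / 2).
# Setting this equal to alpha and solving for x gives the critical value.
critical_value = -2 * math.log(alpha)
print(round(critical_value, 3))
```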

Solution: Classical Approach

Step 5: Since the test statistic, $\chi^2_0 = 24.74$, is greater than the critical value, $\chi^2_{0.01} = 9.210$, we reject the null hypothesis.

Solution: P-Value Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2 − 1)(3 − 1) = 2 degrees of freedom.

The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 = 24.74$, which is approximately 0.
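To see how small "approximately 0" is, the right-tail area can be evaluated directly. This Python sketch again uses the closed form exp(−x/2), which holds only for 2 degrees of freedom; the variable names are illustrative.

```python
import math

chi_sq = 24.74   # test statistic from Step 3
alpha = 0.01     # level of significance from Step 2

# Right-tail area of the chi-square distribution with 2 degrees of freedom.
p_value = math.exp(-chi_sq / 2)

print(f"P-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The result is on the order of a few millionths, far below α = 0.01.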

Solution: P-Value Approach

Step 5: Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.

Solution

Step 6: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.

Example: Should Dentists Advertise?

It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights.

Should Dentists Advertise?

The paper “Should Dentists Advertise?” (J. of Advertising Research (June 1982): 33–38) compared the attitudes of consumers and dentists toward the advertising of dental services. Separate samples of 101 consumers and 124 dentists were asked to respond to the following statement: “I favor the use of advertising by dentists to attract new patients.”

Example: Should Dentists Advertise?

Possible responses were: strongly agree, agree, neutral, disagree, strongly disagree.

The authors were interested in determining whether the two groups (dentists and consumers) differed in their attitudes toward advertising.

Example: Should Dentists Advertise?

This is done with a chi-squared test of homogeneity; that is, we are testing the claim that different populations have the same proportions across the categories of a second variable.

So how should we state the null and alternative hypotheses for this test?

Example: Should Dentists Advertise?

H0: The true category proportions for all responses are the same for both populations of consumers and dentists.

Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.

Observed Data

Response by group:

Group        Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree   Total
Consumers          34           49        9         4              5            101
Dentists            9           18       23        28             46            124
Total              43           67       32        32             51            225

• How do we determine the expected cell count under the assumption of homogeneity?

• That’s right, the expected cell counts are estimated from the sample data (assuming that H0 is true) by using …

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{total sample size}}$$

Expected Values

• So the calculation for the first cell (Consumers, Strongly Agree) is …

$$\text{1st expected cell count} = \frac{(101)(43)}{225} = 19.302 \approx 19.30$$

Expected Values

Observed counts, with expected counts in parentheses:

Group        Strongly Agree   Agree        Neutral      Disagree     Strongly Disagree   Total
Consumers    34 (19.30)       49 (30.08)    9 (14.36)    4 (14.36)    5 (22.89)          101
Dentists      9 (23.70)       18 (36.92)   23 (17.64)   28 (17.64)   46 (28.11)          124
Total        43               67           32           32           51                  225
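Another way to read the expected counts: under homogeneity, every group is assumed to share the pooled response proportions, so each expected count is the group's sample size times the pooled proportion for that response. The Python sketch below takes that route, which is algebraically the same as row total × column total ÷ grand total. The dictionary layout and variable names are illustrative.

```python
# Observed responses, ordered: Strongly Agree, Agree, Neutral,
# Disagree, Strongly Disagree.
observed = {
    "Consumers": [34, 49, 9, 4, 5],
    "Dentists": [9, 18, 23, 28, 46],
}

n_total = sum(sum(counts) for counts in observed.values())
col_totals = [sum(col) for col in zip(*observed.values())]

# Pooled proportion of each response across both groups combined.
pooled = [c / n_total for c in col_totals]

# Expected count = group sample size * pooled proportion for that response.
expected = {
    group: [sum(counts) * p for p in pooled]
    for group, counts in observed.items()
}

for group, row in expected.items():
    print(group, [round(value, 2) for value in row])

# Expected Cell Frequency Condition: every expected count is at least 5.
print(all(value >= 5 for row in expected.values() for value in row))
```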

Test Statistic

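Now we can calculate the χ² test statistic:

$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}$$

$$= \frac{(34 - 19.30)^2}{19.30} + \frac{(49 - 30.08)^2}{30.08} + \cdots + \frac{(46 - 28.11)^2}{28.11}$$

$$= 11.20 + 11.90 + 2.00 + \cdots + 11.39 = 84.47$$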
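The hand calculation above uses expected counts rounded to two decimals. The following Python sketch (helper names are hypothetical) computes the statistic both ways, with rounded and with unrounded expected counts, to show how much the intermediate rounding matters.

```python
observed = [
    [34, 49, 9, 4, 5],     # Consumers
    [9, 18, 23, 28, 46],   # Dentists
]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(obs_table, exp_table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs_table, exp_table)
        for o, e in zip(obs_row, exp_row)
    )


exact = expected_counts(observed)
rounded = [[round(e, 2) for e in row] for row in exact]

chi_sq_rounded = chi_square(observed, rounded)  # slide-style arithmetic
chi_sq_exact = chi_square(observed, exact)      # no intermediate rounding
print(round(chi_sq_rounded, 2), round(chi_sq_exact, 2))
```

With two-decimal expected counts the total matches the 84.47 shown above; without intermediate rounding it comes out near 84.50. The difference has no effect on the conclusion.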

Sampling Distribution

The two-way table for this situation has 2 rows and 5 columns, so the appropriate number of degrees of freedom is (2 – 1)(5 – 1) = 4.

Chi-squared critical value (α = 0.05, 4 degrees of freedom): χ²* = 9.49. Since χ² = 84.47 > χ²* = 9.49, reject the null hypothesis.
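The slide does not state the significance level; the value 9.49 corresponds to α = 0.05 with 4 degrees of freedom, and that is the assumption made in the sketch below. For 4 degrees of freedom the right-tail area has the closed form exp(−x/2)(1 + x/2), so the critical value can be found by bisection using only the standard library. The function names are illustrative.

```python
import math


def right_tail_df4(x):
    """P(X > x) for a chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


def critical_value_df4(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x with right-tail area alpha by bisection.

    The right-tail area decreases as x grows, so if the area at the
    midpoint is still above alpha the answer lies to the right.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if right_tail_df4(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = critical_value_df4(0.05)   # assumed alpha = 0.05
p_value = right_tail_df4(84.47)       # tail area beyond the test statistic

print(round(critical, 2))
print(p_value)
```

The tail area beyond 84.47 is many orders of magnitude below any conventional significance level.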

P-value

P-value: P = P(χ² > 84.47) ≈ 0. Reject the null hypothesis.

Conclusion: With a P-value ≈ 0, reject the null hypothesis. The true category proportions for all responses are not the same for both populations of consumers and dentists.

Homogeneity of Proportions

An advertising firm has decided to ask 92 customers at each of three local shopping malls if they are willing to take part in a market research survey. According to previous studies, 38% of Americans refuse to take part in such surveys. At α = 0.01, test the claim that the proportions are equal.

Homogeneity of Proportions

Step 1

H0: p1 = p2 = p3

Ha: At least one proportion is different

Step 2

α = 0.01

Step 3

χ² test with (2 − 1)(3 − 1) = 2 degrees of freedom

                       Mall A   Mall B   Mall C   Total
Will participate         52       45       36      133
Will not participate     40       47       56      143
Total                    92       92       92      276

Homogeneity of Proportions

Step 4

Put into your calculator

Observed in matrix A

Expected in matrix B

Test statistic = 5.602

P-value = 0.06
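The slide leaves the arithmetic to the calculator's matrix test. As a cross-check, the same two numbers can be reproduced in a few lines of Python. This is a sketch under the assumption that the table has 2 degrees of freedom, which allows the closed-form tail area exp(−x/2); the variable names are illustrative.

```python
import math

# Observed counts: rows are Will participate / Will not participate,
# columns are Mall A, Mall B, Mall C.
observed = [
    [52, 45, 36],
    [40, 47, 56],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts (the "matrix B" of the calculator workflow).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Right-tail area for 2 degrees of freedom only.
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 3), round(p_value, 2))
```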

Homogeneity of Proportions

Step 5

Do not reject H0, since the P-value (0.06) is greater than α = 0.01.

Step 6

There is not sufficient evidence to suggest that at least one of the proportions is different.

Chi-Square and Causation

Chi-square tests are common, and tests for independence are especially widespread.

We need to remember that a small P-value is not proof of causation.

Since the chi-square test for independence treats the two variables symmetrically, we cannot differentiate the direction of any possible causation even if it existed.

And, there’s never any way to eliminate the possibility that a lurking variable is responsible for the lack of independence.

Chi-Square and Causation (cont.)

In some ways, a failure of independence between two categorical variables is less impressive than a strong, consistent, linear association between quantitative variables.

Two categorical variables can fail the test of independence in many ways.

Examining the standardized residuals can help you think about the underlying patterns.
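To illustrate, the Python sketch below computes standardized residuals for the dentist advertising table from earlier in the chapter, assuming the usual definition (Observed − Expected) / √Expected. The variable names are illustrative. Large positive or negative values flag the cells that contribute most to the chi-square statistic.

```python
import math

responses = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
groups = ["Consumers", "Dentists"]
observed = [
    [34, 49, 9, 4, 5],     # Consumers
    [9, 18, 23, 28, 46],   # Dentists
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Standardized residual for each cell: (observed - expected) / sqrt(expected).
residuals = []
for i, row in enumerate(observed):
    residual_row = []
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total
        residual_row.append((obs - exp) / math.sqrt(exp))
    residuals.append(residual_row)

for group, residual_row in zip(groups, residuals):
    for response, value in zip(responses, residual_row):
        print(f"{group:10s} {response:18s} {value:6.2f}")
```

The squared residuals add up to the chi-square statistic, so the table of residuals shows where the statistic comes from: consumers are over-represented among the "agree" responses and dentists among the "disagree" responses.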

CHI-SQUARE INFERENCE

TEST FOR GOODNESS OF FIT

• Used to determine if a particular population distribution fits a specified form

HYPOTHESES:

H0: Actual population percentages are equal to the hypothesized percentages

Ha: Actual population percentages are different from the hypothesized percentages

CHI-SQUARE INFERENCE

TEST FOR INDEPENDENCE

• Used to determine if two variables within a single population are independent

HYPOTHESES:

H0: There is no relationship between the two variables in the population

Ha: There is a dependent relationship between the two variables in the population

CHI-SQUARE INFERENCE

TEST FOR HOMOGENEITY

• Used to determine if two or more separate populations are similar with respect to a single variable

HYPOTHESES:

H0: There are no differences among proportions of success in the populations

Ha: There are differences among proportions of success in the populations