Chapter 25
Comparing Counts
Objectives
Chi-Square Model
Chi-Square Statistic
Knowing when and how to use the Chi-Square Tests
Goodness of Fit
Test of Independence
Test of Homogeneity
Standardized Residual
Categorical Data
Chi-square tests are used when we have counts for the categories of a categorical variable:
Goodness of Fit Test
Allows us to test whether a certain population distribution seems valid. This is a one-variable, one-sample test.
Test of Independence
Cross-categorizing one group on two variables to see if there is an association between the variables. This is a two-variable, one-sample test.
Test for Homogeneity
Compares the observed distributions for several groups to each other to see if there is a difference among the populations. This is a one-variable, many-samples test.
Chi Square Model
Just like the Student t-models, chi-square has a family of models depending on degrees of freedom.
Unlike the Student t-models, a chi-square distribution is not symmetric. It's skewed right.
A chi-square test is always a one-sided, right-tailed test.
The Chi-Square (χ²) Distribution - Properties
It is a continuous distribution.
It is not symmetric.
It is skewed to the right.
The distribution depends on the degrees of freedom.
The value of a χ² random variable is always nonnegative.
There are infinitely many χ² distributions, since each is uniquely defined by its degrees of freedom.
The Chi-Square (χ²) Distribution - Properties
For small degrees of freedom, the χ² distribution is very skewed to the right.
As the degrees of freedom increase, the χ² distribution becomes more and more symmetric.
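The slides do not show this numerically, but the effect is easy to check: the skewness of a χ² distribution is √(8/df), so it shrinks as the degrees of freedom grow. A small sketch, assuming Python with scipy is available (the slides themselves assume a formula booklet or calculator):

```python
# Skewness of the chi-square distribution at low vs. high degrees of
# freedom; theory says skewness = sqrt(8/df).
from scipy.stats import chi2

skew_low = float(chi2.stats(2, moments="s"))    # df = 2: strongly right-skewed
skew_high = float(chi2.stats(50, moments="s"))  # df = 50: nearly symmetric

print(skew_low)   # 2.0
print(skew_high)  # ≈ 0.4
```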
The Chi-Square (χ²) Distribution - Properties
Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.
Critical Value
• Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.
• Explanation of the term – critical or rejection region: A
critical or rejection region is a range of test statistic
values for which the null hypothesis will be rejected.
• This range of values will indicate that there is a
significant or large enough difference between the
postulated parameter value and the corresponding
point estimate for the parameter.
Critical Value
• Explanation of the term – non-critical or non-rejection region: A non-critical or non-rejection region is a range of test statistic values for which the null hypothesis will not be rejected.
• This range of values will indicate that there is not a significant or large enough difference between the postulated parameter value and the corresponding point estimate for the parameter.
Critical Value
Diagram showing the non-critical (non-rejection) region and the critical (rejection) region.
The Chi-Square (χ²) Distribution - Properties
Notation: χ²α,df
Explanation of the notation χ²α,df: χ²α,df is a χ² value with df degrees of freedom such that α (the significance level) is the area to the right of the corresponding χ² value.
The Chi-Square (χ²) Distribution - Properties
Diagram explaining the notation χ²α,df.
The Chi-Square (χ²) Distribution - Table
Values for the random variable with the appropriate degrees of freedom can be obtained from the tables in the formula booklet.
Example: What is the value of χ²0.05,10?
The Chi-Square (χ²) Distribution - Table
Diagram showing α = 0.05, df = 10, and the χ² critical value.
The Chi-Square (χ²) Distribution - Table
Solution: From the table in the formula booklet, χ²0.05,10 = 18.307.
The Chi-Square (χ²) Distribution - Table
Your Turn: What is the value of χ²0.10,20?
χ²0.10,20 = 28.41
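Both table lookups can be checked in software; a sketch, assuming Python with scipy is available rather than the formula booklet. The critical value with right-tail area α is the (1 − α) quantile of the χ² distribution:

```python
# chi-square critical values: right-tail area alpha means the
# (1 - alpha) quantile of the distribution.
from scipy.stats import chi2

cv_1 = chi2.ppf(1 - 0.05, df=10)  # chi-square 0.05, 10
cv_2 = chi2.ppf(1 - 0.10, df=20)  # chi-square 0.10, 20

print(round(cv_1, 3))  # 18.307
print(round(cv_2, 2))  # 28.41
```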
CHI-SQUARE (χ²) TEST
FOR GOODNESS OF FIT
Goodness-of-Fit
A test of whether the distribution of counts in
one categorical variable matches the
distribution predicted by a model is called a
goodness-of-fit test.
As usual, there are assumptions and
conditions to consider…
Assumptions and Conditions
Counted Data Condition: Check that the data
are counts for the categories of a categorical
variable.
Independence Assumption: The counts in
the cells should be independent of each
other.
Randomization Condition: The individuals who
have been counted and whose counts are
available for analysis should be a random
sample from some population.
Assumptions and Conditions
Sample Size Assumption: We must have
enough data for the methods to work.
Expected Cell Frequency Condition: We
should expect to see at least 5 individuals in
each cell.
This is similar to the condition that np
and nq be at least 10 when we tested
proportions.
Calculations
Since we want to examine how well the
observed data reflect what would be
expected, it is natural to look at the
differences between the observed and
expected counts (Obs – Exp).
Calculations (cont.)
The test statistic, called the chi-square (or
chi-squared) statistic, is found by adding up
the sum of the squares of the deviations
between the observed and expected counts
divided by the expected counts:
χ² = Σ (all cells) (Obs – Exp)² / Exp
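The calculation above can be sketched directly. A minimal illustration with made-up counts for a die rolled 60 times (a hypothetical example, not from the chapter), assuming Python with scipy:

```python
# Compute the chi-square statistic by summing (Obs - Exp)^2 / Exp over
# all cells, then take the right-tail area for the P-value (the test is
# always one-sided).
from scipy.stats import chi2

observed = [16, 8, 12, 10, 6, 8]
expected = [10] * 6  # fair-die model: 60 rolls / 6 faces

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2.sf(chi_sq, df)  # right-tail area

print(round(chi_sq, 1))  # 6.4
```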
One-Sided or Two-Sided?
The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals.
If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.”
So the chi-square test is always one-sided.
If the calculated statistic value is large enough, we’ll reject the null hypothesis.
One-Sided or Two-Sided?
The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided.
There are many ways the null hypothesis could be wrong.
There’s no direction to the rejection of the null model—all we know is that it doesn’t fit.
Procedure
Procedure (cont.)
Expected Frequencies
If the expected frequencies are not all equal, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category:
E = n · p
Expected Frequencies
The chi-square goodness of fit test is always a right-tailed test.
For the chi-square goodness-of-fit test, the expected frequencies should be at least 5.
When the expected frequency of a class or category is less than 5, this class or category can be combined with another class or category so that the expected frequency is at least 5.
Goodness-of-fit Test
Test Statistic
Critical Values
1. Found in Table using k – 1 degrees of
freedom where k = number of categories
2. Goodness-of-fit hypothesis tests are
always right-tailed.
χ² = Σ (O – E)² / E
EXAMPLE
There are 4 TV sets that are located in the student center of a large university. At a particular time each day, four different soap operas (1, 2, 3, and 4) are viewed on these TV sets. The percentages of the audience captured by these shows during one semester were 25 percent, 30 percent, 25 percent, and 20 percent, respectively. During the first week of the following semester, 300 students are surveyed.
EXAMPLE (Continued)
(a) If the viewing pattern has not changed, what number of students is expected to watch each soap opera?
Solution: Based on the information, the expected values will be: 0.25 × 300 = 75, 0.30 × 300 = 90, 0.25 × 300 = 75, and 0.20 × 300 = 60.
EXAMPLE (Continued)
(b) Suppose that the actual observed numbers of students viewing the soap operas are given in the following table, test whether these numbers indicate a change at the 1 percent level of significance.
EXAMPLE (Continued)
Solution: Given α = 0.01, k = 4, df = 4 – 1 = 3, and χ²0.01,3 = 11.345. The observed and expected frequencies are given below.
EXAMPLE (Continued)
Solution (continued): The χ² test statistic is computed below.
EXAMPLE (Continued)
Solution (continued):
P-value = 0.6828. Since P > α, we fail to reject the null hypothesis: there is no evidence that the viewing pattern has changed.
EXAMPLE (Continued)
Solution (continued):
Diagram showing
the rejection
region.
The Chi-Square test for Goodness of Fit
Your Turn
The Advanced Placement (AP) Statistics examination was first administered in May 1997. Students’ papers are graded on a scale of 1–5, with 5 being the highest score. Over 7,600 students took the exam in the first year, and the distribution of scores was as follows (not including exams that were scored late).
Score:   5     4     3     2     1
Percent: 15.3  22.0  24.8  19.8  18.1
A distance learning class that took AP Statistics via satellite television had the following distribution of grades:
Score:     5   4   3   2   1
Frequency: 7  13   7   6   2
Score    Observed Counts   Expected % (pi)   Expected Counts (npi)   (O – E)²/E
5        7                 15.3              5.355                   0.50533
4        13                22.0              7.7                     3.6481
3        7                 24.8              8.68                    0.32516
2        6                 19.8              6.93                    0.12481
1        2                 18.1              6.335                   2.9664
Totals   35                100%              35                      7.56976
Carry out an appropriate test to determine if
the distribution of scores for students enrolled
in the distance learning program is
significantly different from the distribution of
scores for all students who took the inaugural
exam.
We must be willing to treat this class of students as an SRS from the population of all
distance learning classes. We will proceed with caution. All expected counts are 5 or more.
Ho: The distribution of AP Statistics exams scores for distance learning students is the same as the distribution of scores for all students who took the May 1997 exam.
Ha:The distribution of AP Statistics exams scores for distance learning students is different than the distribution of scores for all students who took the May 1997 exam.
We will use a significance level of 0.05. There are 5 categories, meaning there are 5 – 1 = 4 degrees of freedom.
P-value = .1087
We do not have enough evidence to reject Ho since P > α. We do not have enough evidence to suggest that the distribution of scores of distance learning students is different from the distribution of scores of traditional students.
χ² = 7.56976 with df = 4; P-value = P(χ² > 7.56976) = 0.1087
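The "Your Turn" computation can be reproduced in one call; a sketch, assuming Python with scipy is available:

```python
# Goodness-of-fit test for the distance learning AP Statistics scores
# against the May 1997 national distribution.
from scipy.stats import chisquare

observed = [7, 13, 7, 6, 2]                     # scores 5, 4, 3, 2, 1
percents = [0.153, 0.220, 0.248, 0.198, 0.181]  # May 1997 distribution
expected = [35 * p for p in percents]           # n = 35 students

stat, p_value = chisquare(observed, f_exp=expected)

print(round(stat, 4))     # 7.5698
print(round(p_value, 4))  # 0.1087
```

Since the P-value (0.1087) exceeds α = 0.05, the conclusion matches the slides: fail to reject Ho.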
χ² TEST OF INDEPENDENCE
Independence
Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other.
A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table.
Definition
Test of Independence
This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)
Assumptions and Conditions
The assumptions and conditions are the same as for the chi-square goodness-of-fit test:
Counted Data Condition: The data must be counts.
Randomization Condition and 10% Condition: As long as we don't want to generalize, we don't have to check these conditions.
Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
Test of Independence
Test Statistic
Critical Values
1. Found in Table using
degrees of freedom = (r – 1)(c – 1)
r is the number of rows and c is the number of
columns
2. Tests of Independence are always right-
tailed.
χ² = Σ (O – E)² / E
Tests of Independence
H0: The row variable is independent of the
column variable
H1: The row variable is dependent on (related to) the column variable
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question.
Dependence means only that there is a relationship between the two variables.
Expected Frequency for Contingency Tables
E = (row total / table total) · (column total / table total) · (table total)
  = n · p (where p is the probability of a cell)
E = (row total)(column total) / (table total)
where n is the total number of all observed frequencies in the table.
Observed and Expected Frequencies

           Men    Women   Boys   Girls   Total
Survived   332    318     29     27      706
Died       1360   104     35     18      1517
Total      1692   422     64     45      2223
We will use the mortality table from the Titanic to find expected frequencies. For the upper left-hand cell, we find:
E = (706)(1692)/2223 = 537.360
           Men             Women   Boys   Girls   Total
Survived   332 (537.360)   318     29     27      706
Died       1360            104     35     18      1517
Total      1692            422     64     45      2223

Find the expected frequency for the lower left-hand cell, assuming independence between the row variable and the column variable.
E = (1517)(1692)/2223 = 1154.640
Observed and Expected Frequencies

           Men               Women           Boys          Girls         Total
Survived   332 (537.360)     318 (134.022)   29 (20.326)   27 (14.291)   706
Died       1360 (1154.640)   104 (287.978)   35 (43.674)   18 (30.709)   1517
Total      1692              422             64            45            2223

To interpret this result for the lower left-hand cell: although 1360 men actually died, we would have expected 1154.64 men to die if survivability is independent of whether the person is a man, woman, boy, or girl.
Observed and Expected Frequencies
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
H0: Whether a person survived is independent of whether
the person is a man, woman, boy, or girl.
H1: Surviving the Titanic and being a man, woman, boy,
or girl are dependent.
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
χ² = (332 – 537.360)²/537.360 + (318 – 134.022)²/134.022 + (29 – 20.326)²/20.326 + (27 – 14.291)²/14.291
   + (1360 – 1154.640)²/1154.640 + (104 – 287.978)²/287.978 + (35 – 43.674)²/43.674 + (18 – 30.709)²/30.709
χ² = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260
   = 507.084
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3.
Critical value: χ²0.05,3 = 7.815. Since 507.084 > 7.815, we reject the null hypothesis.
P-value: P = P(χ² > 507.084) ≈ 0. Since P < α, we reject the null hypothesis.
Survival and gender are dependent.
Test statistic: χ² = 507.084, with α = 0.05 and (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3 degrees of freedom.
Critical value: χ² = 7.815 (from the table).
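The entire Titanic test can be run in one call; a sketch, assuming Python with scipy is available. The function computes the expected counts from the marginal totals, the statistic, the degrees of freedom, and the P-value:

```python
# Chi-square test of independence on the Titanic mortality table.
from scipy.stats import chi2_contingency

#             Men  Women  Boys  Girls
observed = [[332,  318,   29,   27],   # Survived
            [1360, 104,   35,   18]]   # Died

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 2))            # ≈ 507.08
print(df)                        # 3
print(round(expected[0][0], 3))  # ≈ 537.360 (men who survived, expected)
```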
Procedure
Procedure (cont.)
EXAMPLE
A survey was done by a car manufacturer concerning a particular make and model. A group of 500 potential customers were asked whether they purchased their current car because of its appearance, its performance rating, or its fixed price (no negotiating). The results, broken down by gender responses, are given on the next slide.
EXAMPLE (Continued)
Question: Do females feel differently than males about the three different criteria used in choosing a car, or do they feel basically the same?
Solution
χ² test for independence.
Thus the null hypothesis will be that the criterion used is independent of gender, while the alternative hypothesis will be that the criterion used is dependent on gender.
Solution (continued)
The degrees of freedom is given by (number of rows – 1)(number of columns –1).
df = (2 – 1)(3 – 1) = 2.
Solution (continued)
Calculate the row and column totals. These row and column totals are called marginal totals.
Solution (continued)
Computation of the expected values
The expected value for a cell is the row total times the column total divided by the table total.
Solution (continued)
Let us use α = 0.01. So df = (2 – 1)(3 – 1) = 2 and χ²0.01,2 = 9.210.
Solution (continued)
The χ² test statistic is computed in the same manner as was done for the goodness-of-fit test.
Solution (continued)
Solution (continued)
Diagram showing the rejection region.
Test of Homogeneity
Comparing Observed Distributions
A test comparing the distribution of counts for
two or more groups on the same categorical
variable is called a chi-square test of
homogeneity.
A test of homogeneity is actually the
generalization of the two-proportion z-test.
Comparing Observed Distributions (cont.)
The statistic that we calculate for this test is identical to the chi-square statistic for independence.
In this test, however, we ask whether choices are the same among different groups (i.e., there is no model).
The expected counts are found directly from the data and we have different degrees of freedom.
Assumptions and Conditions
The assumptions and conditions are the same as for the chi-square goodness-of-fit test:
Counted Data Condition: The data must be counts.
Randomization Condition and 10% Condition: As long as we don't want to generalize, we don't have to check these conditions.
Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
Test for Homogeneity
In a chi-square test for homogeneity of
proportions, we test whether different
populations have the same proportion of
individuals with some characteristic.
The procedures for performing a test of
homogeneity are identical to those for a test
of independence.
Example:
The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:
Test the claim that the proportion of individuals that feel being a teacher is an occupation of very great prestige is the same for each year at the α = 0.01 level of significance.
1992 2002 2008
Yes 418 479 525
No 602 541 485
Solution
Step 1: The null hypothesis is a statement of “no difference” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows:
H0: p1992= p2002= p2008
H1: At least one of the proportions is different from the others.
Step 2: The level of significance is α = 0.01.
Solution
Step 3:
(a) The expected frequencies are found by
multiplying the appropriate row and column
totals and then dividing by the total sample
size. They are given in parentheses in the
table below, along with the observed
frequencies.
       1992            2002            2008
Yes    418 (475.554)   479 (475.554)   525 (470.892)
No     602 (544.446)   541 (544.446)   485 (539.108)
Solution
Step 3:
(b) Since none of the expected frequencies are
less than 5, the requirements are satisfied.
(c) The test statistic is
χ²₀ = (418 – 475.554)²/475.554 + (479 – 475.554)²/475.554 + … + (485 – 539.108)²/539.108 ≈ 24.74
Solution: Classical Approach
Step 4: There are r = 2 rows and c =3
columns, so we find the critical
value using (2-1)(3-1) = 2 degrees
of freedom.
The critical value is χ²0.01 = 9.210.
Solution: Classical Approach
Step 5: Since the test statistic, χ²₀ = 24.74, is greater than the critical value, χ²0.01 = 9.210, we reject the null hypothesis.
Solution: P-Value Approach
Step 4: There are r = 2 rows and c =3
columns so we find the P-value using
(2-1)(3-1) = 2 degrees of freedom.
The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of χ²₀ = 24.74, which is approximately 0.
Solution: P-Value Approach
Step 5: Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.
Solution
Step 6: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
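Both the classical and P-value approaches above can be checked in one step; a sketch, assuming Python with scipy is available. Mechanically, the homogeneity test is the same computation as the test of independence:

```python
# Chi-square test of homogeneity on the teacher-prestige survey.
from scipy.stats import chi2_contingency

#            1992  2002  2008
observed = [[418,  479,  525],   # Yes
            [602,  541,  485]]   # No

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 2))  # ≈ 24.74
print(df)              # 2
print(p_value < 0.01)  # True: reject H0 at alpha = 0.01
```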
Example: Should Dentist Advertise?
It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights.
Should Dentist Advertise?
The paper “Should Dentist Advertise?” (J. of
Advertising Research (June 1982): 33 – 38)
compared the attitudes of consumers and
dentists toward the advertising of dental
services. Separate samples of 101
consumers and 124 dentists were asked to
respond to the following statement: “I favor
the use of advertising by dentists to attract
new patients.”
Example: Should Dentist Advertise?
Possible responses were: strongly agree,
agree, neutral, disagree, strongly disagree.
The authors were interested in determining
whether the two groups—dentists and
consumers—differed in their attitudes toward
advertising.
Example: Should Dentist Advertise?
This is done with a chi-square test of homogeneity; that is, we are testing the claim that different populations have the same distribution of responses across some second categorical variable. So how should we state the null and alternative hypotheses for this test?
Example: Should Dentist Advertise?
H0: The true category proportions for all responses are the same for both populations of consumers and dentists.
Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.
Observed Data

Group       Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
Consumers   34               49      9         4          5
Dentists    9                18      23        28         46
Total       43               67      32        32         51
• How do we determine the expected cell count under the assumption of homogeneity?
• That's right, the expected cell counts are estimated from the sample data (assuming that H0 is true) by using
expected cell count = (row marginal total)(column marginal total) / (total sample size)
The row marginal totals are 101 (consumers) and 124 (dentists), and the total sample size is 225.
Expected Values

Group       Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree   Total
Consumers   34               49      9         4          5                   101
Dentists    9                18      23        28         46                  124
Total       43               67      32        32         51                  225

• So the calculation for the first cell is:
1st expected cell count = (101)(43)/225 = 19.30
Expected Values

Group       Strongly Agree   Agree        Neutral      Disagree     Strongly Disagree   Total
Consumers   34 (19.30)       49 (30.08)   9 (14.36)    4 (14.36)    5 (22.89)           101
Dentists    9 (23.70)        18 (36.92)   23 (17.64)   28 (17.64)   46 (28.11)          124
Total       43               67           32           32           51                  225
Test Statistic
Now we can calculate the χ² test statistic:
χ² = Σ (Observed Count – Expected Count)² / Expected Count
   = (34 – 19.30)²/19.30 + (49 – 30.08)²/30.08 + … + (46 – 28.11)²/28.11
   = 11.20 + 11.90 + 2.00 + … + 11.39
   = 84.47
Sampling Distribution
The two-way table for this situation has 2
rows and 5 columns, so the appropriate
degrees of freedom is (2 – 1)(5 – 1) = 4.
Chi-square critical value: χ²* = 9.49. Since χ² = 84.47 > χ²* = 9.49, we reject the null hypothesis.
P-value
P-value: P = P(𝜒2 > 84.47) ≈ 0. Reject the null
hypothesis.
Conclusion: With a P-value ≈ 0, reject the
null hypothesis. The true category proportions
for all responses are not the same for both
populations of consumers and dentists.
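As a check on the hand calculation above (whose terms were rounded to two decimals before summing), the same test run at full precision; a sketch, assuming Python with scipy is available:

```python
# Chi-square test of homogeneity on the dentist/consumer survey.
from scipy.stats import chi2_contingency

#            SA  A   N   D   SD
observed = [[34, 49,  9,  4,  5],    # Consumers (n = 101)
            [ 9, 18, 23, 28, 46]]    # Dentists  (n = 124)

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 1))  # ≈ 84.5 (the slides' rounded terms give 84.47)
print(df)              # 4
```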
Homogeneity of Proportions
An advertising firm has decided to ask 92
customers at each of three local shopping
malls if they are willing to take part in a
market research survey. According to
previous studies, 38% of Americans refuse to
take part in such surveys. At α = 0.01, test the
claim that the proportions are equal.
Homogeneity of Proportions
Step 1
Ho: p1 = p2 = p3
Ha: At least one
is different
Step 2
α = 0.01
Step 3
χ² = Σ (O – E)² / E

                       Mall A   Mall B   Mall C   Total
Will participate       52       45       36       133
Will not participate   40       47       56       143
Total                  92       92       92       276
Homogeneity of Proportions
Step 4
Put into your calculator
Observed in matrix A
Expected in matrix B
Test statistic = 5.602
P-value = 0.06
Homogeneity of Proportions
Step 5
Do Not Reject Ho
Step 6
There is not sufficient evidence to suggest that at least one proportion is different.
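The calculator steps above (observed in matrix A, expected in matrix B) can also be sketched in Python, assuming scipy is available:

```python
# Chi-square test of homogeneity on the mall survey data.
from scipy.stats import chi2_contingency

#            Mall A  Mall B  Mall C
observed = [[52,     45,     36],   # Will participate
            [40,     47,     56]]   # Will not participate

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 3))     # ≈ 5.602
print(round(p_value, 2))  # ≈ 0.06, so do not reject Ho at alpha = 0.01
```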
Chi-Square and Causation
Chi-square tests are common, and tests for independence are especially widespread.
We need to remember that a small P-value is not proof of causation.
Since the chi-square test for independence treats the two variables symmetrically, we cannot differentiate the direction of any possible causation even if it existed.
And, there’s never any way to eliminate the possibility that a lurking variable is responsible for the lack of independence.
Chi-Square and Causation (cont.)
In some ways, a failure of independence
between two categorical variables is less
impressive than a strong, consistent, linear
association between quantitative variables.
Two categorical variables can fail the test of
independence in many ways.
Examining the standardized residuals can help
you think about the underlying patterns.
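Standardized residuals, (Obs – Exp)/√Exp, can be computed directly from a table; a sketch using the Titanic data from earlier in the chapter, assuming Python with numpy and scipy is available:

```python
# Standardized residuals flag which cells drive a rejected chi-square
# test: large |residual| means that cell departs most from independence.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[332, 318, 29, 27],     # Survived
                     [1360, 104, 35, 18]])   # Died

_, _, _, expected = chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)

print(np.round(residuals, 2))
# The "men survived" cell is about -8.86: far fewer men survived than
# independence would predict.
```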
CHI-SQUARE INFERENCE
TEST FOR GOODNESS OF FIT
• Used to determine if a particular population distribution fits a specified form
HYPOTHESES:
H0: Actual population percents are equal to
hypothesized percentages
Ha: Actual population percents are different from
hypothesized percentages
CHI-SQUARE INFERENCE
TEST FOR INDEPENDENCE
• Used to determine if two variables within a single population are independent
HYPOTHESES:
H0: There is no relationship between the two variables
in the population
Ha: There is a dependent relationship between the two
variables in the population
CHI-SQUARE INFERENCE
TEST FOR HOMOGENEITY
• Used to determine if two or more separate populations are similar with respect to a single categorical variable
HYPOTHESES:
H0: There are no differences among proportions of
success in the populations
Ha: There are differences among proportions of
success in the populations