Chapter 25
Comparing Counts
Objectives
Chi-Square Model
Chi-Square Statistic
Knowing when and how to use the Chi-Square Tests
Goodness of Fit
Test of Independence
Test of Homogeneity
Standardized Residual
Categorical Data
Chi-square tests are used when we have counts for the categories of a categorical variable:
Goodness of Fit Test
Allows us to test whether a certain population distribution seems valid. This is a one-variable, one-sample test.
Test of Independence
Cross-categorizing one group on two variables to see if there is an association between the variables. This is a two-variable, one-sample test.
Test for Homogeneity
Compares the observed distributions for several groups to each other to see if there is a difference among the populations. This is a one-variable, many-samples test.
Chi Square Model
Just like the Student t-models, chi-square has a family of models depending on degrees of freedom.
Unlike the Student t-models, a chi-square distribution is not symmetric. It's skewed right.
A chi-square test is always a one-sided, right-tailed test.
The Chi-Square (χ²) Distribution - Properties
It is a continuous distribution.
It is not symmetric.
It is skewed to the right.
The distribution depends on the degrees of freedom.
The value of a χ² random variable is always nonnegative.
There are infinitely many χ² distributions, since each is uniquely defined by its degrees of freedom.
The Chi-Square (χ²) Distribution - Properties
For small degrees of freedom, the χ² distribution is very skewed to the right.
As the degrees of freedom increase, the χ² distribution becomes more and more symmetric.
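The slides do not show this numerically, but the effect is easy to check: the skewness of a χ² distribution is √(8/df), so it shrinks as the degrees of freedom grow. A small sketch, assuming Python with scipy is available (the slides themselves assume a formula booklet or calculator):

```python
# Skewness of the chi-square distribution at low vs. high degrees of
# freedom; theory says skewness = sqrt(8/df).
from scipy.stats import chi2

skew_low = float(chi2.stats(2, moments="s"))    # df = 2: strongly right-skewed
skew_high = float(chi2.stats(50, moments="s"))  # df = 50: nearly symmetric

print(skew_low)   # 2.0
print(skew_high)  # ≈ 0.4
```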
The Chi-Square (χ²) Distribution - Properties
Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.
Critical Value
• Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.
• Explanation of the term – critical or rejection region: A
critical or rejection region is a range of test statistic
values for which the null hypothesis will be rejected.
• This range of values will indicate that there is a
significant or large enough difference between the
postulated parameter value and the corresponding
point estimate for the parameter.
Critical Value
• Explanation of the term – non-critical or non-rejection region: A non-critical or non-rejection region is a range of test statistic values for which the null hypothesis will not be rejected.
• This range of values will indicate that there is not a significant or large enough difference between the postulated parameter value and the corresponding point estimate for the parameter.
Critical Value
Diagram showing the non-critical (non-rejection) region and the critical (rejection) region.
The Chi-Square (χ²) Distribution - Properties
Notation: χ²α,df
Explanation of the notation χ²α,df: χ²α,df is a χ² value with df degrees of freedom such that α (the significance level) is the area to the right of the corresponding χ² value.
The Chi-Square (χ²) Distribution - Properties
Diagram explaining the notation χ²α,df.
The Chi-Square (χ²) Distribution - Table
Values for the random variable with the appropriate degrees of freedom can be obtained from the tables in the formula booklet.
Example: What is the value of χ²0.05,10?
The Chi-Square (χ²) Distribution - Table
Diagram showing α = 0.05, df = 10, and the χ² critical value.
The Chi-Square (χ²) Distribution - Table
Solution: From the table in the formula booklet, χ²0.05,10 = 18.307.
The Chi-Square (χ²) Distribution - Table
Your Turn: What is the value of χ²0.10,20?
χ²0.10,20 = 28.41
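Both table lookups can be checked in software; a sketch, assuming Python with scipy is available rather than the formula booklet. The critical value with right-tail area α is the (1 − α) quantile of the χ² distribution:

```python
# chi-square critical values: right-tail area alpha means the
# (1 - alpha) quantile of the distribution.
from scipy.stats import chi2

cv_1 = chi2.ppf(1 - 0.05, df=10)  # chi-square 0.05, 10
cv_2 = chi2.ppf(1 - 0.10, df=20)  # chi-square 0.10, 20

print(round(cv_1, 3))  # 18.307
print(round(cv_2, 2))  # 28.41
```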
CHI-SQUARE (χ²) TEST
FOR GOODNESS OF FIT
Goodness-of-Fit
A test of whether the distribution of counts in
one categorical variable matches the
distribution predicted by a model is called a
goodness-of-fit test.
As usual, there are assumptions and
conditions to consider…
Assumptions and Conditions
Counted Data Condition: Check that the data
are counts for the categories of a categorical
variable.
Independence Assumption: The counts in
the cells should be independent of each
other.
Randomization Condition: The individuals who
have been counted and whose counts are
available for analysis should be a random
sample from some population.
Assumptions and Conditions
Sample Size Assumption: We must have
enough data for the methods to work.
Expected Cell Frequency Condition: We
should expect to see at least 5 individuals in
each cell.
This is similar to the condition that np
and nq be at least 10 when we tested
proportions.
Calculations
Since we want to examine how well the
observed data reflect what would be
expected, it is natural to look at the
differences between the observed and
expected counts (Obs – Exp).
Calculations (cont.)
The test statistic, called the chi-square (or
chi-squared) statistic, is found by adding up
the sum of the squares of the deviations
between the observed and expected counts
divided by the expected counts:
χ² = Σ (all cells) (Obs – Exp)² / Exp
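The calculation above can be sketched directly. A minimal illustration with made-up counts for a die rolled 60 times (a hypothetical example, not from the chapter), assuming Python with scipy:

```python
# Compute the chi-square statistic by summing (Obs - Exp)^2 / Exp over
# all cells, then take the right-tail area for the P-value (the test is
# always one-sided).
from scipy.stats import chi2

observed = [16, 8, 12, 10, 6, 8]
expected = [10] * 6  # fair-die model: 60 rolls / 6 faces

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2.sf(chi_sq, df)  # right-tail area

print(round(chi_sq, 1))  # 6.4
```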
One-Sided or Two-Sided?
The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals.
If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.”
So the chi-square test is always one-sided.
If the calculated statistic value is large enough, we’ll reject the null hypothesis.
One-Sided or Two-Sided?
The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided.
There are many ways the null hypothesis could be wrong.
There’s no direction to the rejection of the null model—all we know is that it doesn’t fit.
Procedure
Procedure (cont.)
Expected Frequencies
If the expected frequencies are not all equal, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category:
E = n · p
Expected Frequencies
The chi-square goodness of fit test is always a right-tailed test.
For the chi-square goodness-of-fit test, the expected frequencies should be at least 5.
When the expected frequency of a class or category is less than 5, this class or category can be combined with another class or category so that the expected frequency is at least 5.
Goodness-of-fit Test
Test Statistic
Critical Values
1. Found in Table using k – 1 degrees of
freedom where k = number of categories
2. Goodness-of-fit hypothesis tests are
always right-tailed.
χ² = Σ (O – E)² / E
EXAMPLE
There are 4 TV sets that are located in the student center of a large university. At a particular time each day, four different soap operas (1, 2, 3, and 4) are viewed on these TV sets. The percentages of the audience captured by these shows during one semester were 25 percent, 30 percent, 25 percent, and 20 percent, respectively. During the first week of the following semester, 300 students are surveyed.
EXAMPLE (Continued)
(a) If the viewing pattern has not changed, what number of students is expected to watch each soap opera?
Solution: Based on the information, the expected values will be: 0.25 × 300 = 75, 0.30 × 300 = 90, 0.25 × 300 = 75, and 0.20 × 300 = 60.
EXAMPLE (Continued)
(b) Suppose that the actual observed numbers of students viewing the soap operas are given in the following table, test whether these numbers indicate a change at the 1 percent level of significance.
EXAMPLE (Continued)
Solution: Given α = 0.01, k = 4, df = 4 – 1 = 3, and χ²0.01,3 = 11.345. The observed and expected frequencies are given below.
EXAMPLE (Continued)
Solution (continued): The χ² test statistic is computed below.
EXAMPLE (Continued)
Solution (continued):
P-value = 0.6828. Since P > α, we fail to reject the null hypothesis: there is no evidence that the viewing pattern has changed.
EXAMPLE (Continued)
Solution (continued):
Diagram showing
the rejection
region.
The Chi-Square test for Goodness of Fit
Your Turn
The Advanced Placement (AP) Statistics examination was first administered in May 1997. Students’ papers are graded on a scale of 1–5, with 5 being the highest score. Over 7,600 students took the exam in the first year, and the distribution of scores was as follows (not including exams that were scored late).
Score:   5     4     3     2     1
Percent: 15.3  22.0  24.8  19.8  18.1
A distance learning class that took AP Statistics via satellite television had the following distribution of grades:
Score:     5   4   3   2   1
Frequency: 7  13   7   6   2
Score    Observed Counts   Expected % (pi)   Expected Counts (npi)   (O – E)²/E
5        7                 15.3              5.355                   0.50533
4        13                22.0              7.7                     3.6481
3        7                 24.8              8.68                    0.32516
2        6                 19.8              6.93                    0.12481
1        2                 18.1              6.335                   2.9664
Totals   35                100%              35                      7.56976
Carry out an appropriate test to determine if
the distribution of scores for students enrolled
in the distance learning program is
significantly different from the distribution of
scores for all students who took the inaugural
exam.
We must be willing to treat this class of students as an SRS from the population of all
distance learning classes. We will proceed with caution. All expected counts are 5 or more.
Ho: The distribution of AP Statistics exams scores for distance learning students is the same as the distribution of scores for all students who took the May 1997 exam.
Ha:The distribution of AP Statistics exams scores for distance learning students is different than the distribution of scores for all students who took the May 1997 exam.
We will use a significance level of 0.05. There are 5 categories, meaning there are 5 – 1 = 4 degrees of freedom.
P-value = .1087
We do not have enough evidence to reject Ho since P > α. We do not have enough evidence to suggest that the distribution of scores of distance learning students is different from the distribution of scores of traditional students.
χ² = 7.56976 with df = 4; P-value = P(χ² > 7.56976) = 0.1087
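The "Your Turn" computation can be reproduced in one call; a sketch, assuming Python with scipy is available:

```python
# Goodness-of-fit test for the distance learning AP Statistics scores
# against the May 1997 national distribution.
from scipy.stats import chisquare

observed = [7, 13, 7, 6, 2]                     # scores 5, 4, 3, 2, 1
percents = [0.153, 0.220, 0.248, 0.198, 0.181]  # May 1997 distribution
expected = [35 * p for p in percents]           # n = 35 students

stat, p_value = chisquare(observed, f_exp=expected)

print(round(stat, 4))     # 7.5698
print(round(p_value, 4))  # 0.1087
```

Since the P-value (0.1087) exceeds α = 0.05, the conclusion matches the slides: fail to reject Ho.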
χ² TEST OF INDEPENDENCE
Independence
Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other.
A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table.
Definition
Test of Independence
This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)
Assumptions and Conditions
The assumptions and conditions are the same as for the chi-square goodness-of-fit test:
Counted Data Condition: The data must be counts.
Randomization Condition and 10% Condition: As long as we don't want to generalize, we don't have to check these conditions.
Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
Test of Independence
Test Statistic
Critical Values
1. Found in Table using
degrees of freedom = (r – 1)(c – 1)
r is the number of rows and c is the number of
columns
2. Tests of Independence are always right-
tailed.
χ² = Σ (O – E)² / E
Tests of Independence
H0: The row variable is independent of the
column variable
H1: The row variable is dependent on (related to) the column variable
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question.
Dependence means only that there is a relationship between the two variables.
Expected Frequency for Contingency Tables
E = (row total / table total) · (column total / table total) · (table total)
  = n · p (where p is the probability of a cell)
E = (row total)(column total) / (table total)
where n is the total number of all observed frequencies in the table.
Observed and Expected Frequencies

           Men    Women   Boys   Girls   Total
Survived   332    318     29     27      706
Died       1360   104     35     18      1517
Total      1692   422     64     45      2223
We will use the mortality table from the Titanic to find expected frequencies. For the upper left-hand cell, we find:
E = (706)(1692)/2223 = 537.360
           Men             Women   Boys   Girls   Total
Survived   332 (537.360)   318     29     27      706
Died       1360            104     35     18      1517
Total      1692            422     64     45      2223

Find the expected frequency for the lower left-hand cell, assuming independence between the row variable and the column variable.
E = (1517)(1692)/2223 = 1154.640
Observed and Expected Frequencies

           Men               Women           Boys          Girls         Total
Survived   332 (537.360)     318 (134.022)   29 (20.326)   27 (14.291)   706
Died       1360 (1154.640)   104 (287.978)   35 (43.674)   18 (30.709)   1517
Total      1692              422             64            45            2223

To interpret this result for the lower left-hand cell: although 1360 men actually died, we would have expected 1154.64 men to die if survivability is independent of whether the person is a man, woman, boy, or girl.
Observed and Expected Frequencies
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
H0: Whether a person survived is independent of whether
the person is a man, woman, boy, or girl.
H1: Surviving the Titanic and being a man, woman, boy,
or girl are dependent.
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
χ² = (332 – 537.360)²/537.360 + (318 – 134.022)²/134.022 + (29 – 20.326)²/20.326 + (27 – 14.291)²/14.291
   + (1360 – 1154.640)²/1154.640 + (104 – 287.978)²/287.978 + (35 – 43.674)²/43.674 + (18 – 30.709)²/30.709
χ² = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260
   = 507.084
Example: Using a 0.05 significance level, test the claim
that when the Titanic sank, whether someone survived or
died is independent of whether that person is a man,
woman, boy, or girl.
The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3.
Critical value: χ²0.05,3 = 7.815. Since 507.084 > 7.815, we reject the null hypothesis.
P-value: P = P(χ² > 507.084) ≈ 0. Since P < α, we reject the null hypothesis.
Survival and gender are dependent.
Test statistic: χ² = 507.084, with α = 0.05 and (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3 degrees of freedom.
Critical value: χ² = 7.815 (from the table).
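The entire Titanic test can be run in one call; a sketch, assuming Python with scipy is available. The function computes the expected counts from the marginal totals, the statistic, the degrees of freedom, and the P-value:

```python
# Chi-square test of independence on the Titanic mortality table.
from scipy.stats import chi2_contingency

#             Men  Women  Boys  Girls
observed = [[332,  318,   29,   27],   # Survived
            [1360, 104,   35,   18]]   # Died

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 2))            # ≈ 507.08
print(df)                        # 3
print(round(expected[0][0], 3))  # ≈ 537.360 (men who survived, expected)
```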
Procedure
Procedure (cont.)
EXAMPLE
A survey was done by a car manufacturer concerning a particular make and model. A group of 500 potential customers were asked whether they purchased their current car because of its appearance, its performance rating, or its fixed price (no negotiating). The results, broken down by gender responses, are given on the next slide.
EXAMPLE (Continued)
Question: Do females feel differently than males about the three different criteria used in choosing a car, or do they feel basically the same?
Solution
χ² test for independence.
Thus the null hypothesis will be that the criterion used is independent of gender, while the alternative hypothesis will be that the criterion used is dependent on gender.
Solution (continued)
The degrees of freedom is given by (number of rows – 1)(number of columns –1).
df = (2 – 1)(3 – 1) = 2.
Solution (continued)
Calculate the row and column totals. These row and column totals are called marginal totals.
Solution (continued)
Computation of the expected values
The expected value for a cell is the row total times the column total divided by the table total.
Solution (continued)
Let us use α = 0.01. So df = (2 – 1)(3 – 1) = 2 and χ²0.01,2 = 9.210.
Solution (continued)
The χ² test statistic is computed in the same manner as was done for the goodness-of-fit test.
Solution (continued)
Solution (continued)
Diagram showing the rejection region.
Test of Homogeneity
Comparing Observed Distributions
A test comparing the distribution of counts for
two or more groups on the same categorical
variable is called a chi-square test of
homogeneity.
A test of homogeneity is actually the
generalization of the two-proportion z-test.
Comparing Observed Distributions (cont.)
The statistic that we calculate for this test is identical to the chi-square statistic for independence.
In this test, however, we ask whether choices are the same among different groups (i.e., there is no model).
The expected counts are found directly from the data and we have different degrees of freedom.
Assumptions and Conditions
The assumptions and conditions are the same as for the chi-square goodness-of-fit test:
Counted Data Condition: The data must be counts.
Randomization Condition and 10% Condition: As long as we don't want to generalize, we don't have to check these conditions.
Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
Test for Homogeneity
In a chi-square test for homogeneity of
proportions, we test whether different
populations have the same proportion of
individuals with some characteristic.
The procedures for performing a test of
homogeneity are identical to those for a test
of independence.
Example:
The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:
Test the claim that the proportion of individuals that feel being a teacher is an occupation of very great prestige is the same for each year at the α = 0.01 level of significance.
1992 2002 2008
Yes 418 479 525
No 602 541 485
Solution
Step 1: The null hypothesis is a statement of “no difference” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows:
H0: p1992= p2002= p2008
H1: At least one of the proportions is different from the others.
Step 2: The level of significance is α = 0.01.
Solution
Step 3:
(a) The expected frequencies are found by
multiplying the appropriate row and column
totals and then dividing by the total sample
size. They are given in parentheses in the
table below, along with the observed
frequencies.
       1992            2002            2008
Yes    418 (475.554)   479 (475.554)   525 (470.892)
No     602 (544.446)   541 (544.446)   485 (539.108)
Solution
Step 3:
(b) Since none of the expected frequencies are
less than 5, the requirements are satisfied.
(c) The test statistic is
χ²₀ = (418 – 475.554)²/475.554 + (479 – 475.554)²/475.554 + … + (485 – 539.108)²/539.108 ≈ 24.74
Solution: Classical Approach
Step 4: There are r = 2 rows and c =3
columns, so we find the critical
value using (2-1)(3-1) = 2 degrees
of freedom.
The critical value is χ²0.01 = 9.210.
Solution: Classical Approach
Step 5: Since the test statistic, χ²₀ = 24.74, is greater than the critical value, χ²0.01 = 9.210, we reject the null hypothesis.
Solution: P-Value Approach
Step 4: There are r = 2 rows and c =3
columns so we find the P-value using
(2-1)(3-1) = 2 degrees of freedom.
The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of χ²₀ = 24.74, which is approximately 0.
Solution: P-Value Approach
Step 5: Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.
Solution
Step 6: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
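Both the classical and P-value approaches above can be checked in one step; a sketch, assuming Python with scipy is available. Mechanically, the homogeneity test is the same computation as the test of independence:

```python
# Chi-square test of homogeneity on the teacher-prestige survey.
from scipy.stats import chi2_contingency

#            1992  2002  2008
observed = [[418,  479,  525],   # Yes
            [602,  541,  485]]   # No

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 2))  # ≈ 24.74
print(df)              # 2
print(p_value < 0.01)  # True: reject H0 at alpha = 0.01
```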
Example: Should Dentist Advertise?
It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights.
Should Dentist Advertise?
The paper “Should Dentist Advertise?” (J. of
Advertising Research (June 1982): 33 – 38)
compared the attitudes of consumers and
dentists toward the advertising of dental
services. Separate samples of 101
consumers and 124 dentists were asked to
respond to the following statement: “I favor
the use of advertising by dentists to attract
new patients.”
Example: Should Dentist Advertise?
Possible responses were: strongly agree,
agree, neutral, disagree, strongly disagree.
The authors were interested in determining
whether the two groups—dentists and
consumers—differed in their attitudes toward
advertising.
Example: Should Dentist Advertise?
This is done with a chi-square test of homogeneity; that is, we are testing the claim that different populations have the same distribution of responses across some second categorical variable. So how should we state the null and alternative hypotheses for this test?
Example: Should Dentist Advertise?
H0: The true category proportions for all responses are the same for both populations of consumers and dentists.
Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.
Observed Data

Group       Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
Consumers   34               49      9         4          5
Dentists    9                18      23        28         46
Total       43               67      32        32         51
• How do we determine the expected cell count under the assumption of homogeneity?
• That's right, the expected cell counts are estimated from the sample data (assuming that H0 is true) by using
expected cell count = (row marginal total)(column marginal total) / (total sample size)
The row marginal totals are 101 (consumers) and 124 (dentists), and the total sample size is 225.
Expected Values

Group       Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree   Total
Consumers   34               49      9         4          5                   101
Dentists    9                18      23        28         46                  124
Total       43               67      32        32         51                  225

• So the calculation for the first cell is:
1st expected cell count = (101)(43)/225 = 19.30
Expected Values

Group       Strongly Agree   Agree        Neutral      Disagree     Strongly Disagree   Total
Consumers   34 (19.30)       49 (30.08)   9 (14.36)    4 (14.36)    5 (22.89)           101
Dentists    9 (23.70)        18 (36.92)   23 (17.64)   28 (17.64)   46 (28.11)          124
Total       43               67           32           32           51                  225
Test Statistic
Now we can calculate the χ² test statistic:
χ² = Σ (Observed Count – Expected Count)² / Expected Count
   = (34 – 19.30)²/19.30 + (49 – 30.08)²/30.08 + … + (46 – 28.11)²/28.11
   = 11.20 + 11.90 + 2.00 + … + 11.39
   = 84.47
Sampling Distribution
The two-way table for this situation has 2
rows and 5 columns, so the appropriate
degrees of freedom is (2 – 1)(5 – 1) = 4.
Chi-square critical value: χ²* = 9.49. Since χ² = 84.47 > χ²* = 9.49, we reject the null hypothesis.
P-value
P-value: P = P(𝜒2 > 84.47) ≈ 0. Reject the null
hypothesis.
Conclusion: With a P-value ≈ 0, reject the
null hypothesis. The true category proportions
for all responses are not the same for both
populations of consumers and dentists.
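As a check on the hand calculation above (whose terms were rounded to two decimals before summing), the same test run at full precision; a sketch, assuming Python with scipy is available:

```python
# Chi-square test of homogeneity on the dentist/consumer survey.
from scipy.stats import chi2_contingency

#            SA  A   N   D   SD
observed = [[34, 49,  9,  4,  5],    # Consumers (n = 101)
            [ 9, 18, 23, 28, 46]]    # Dentists  (n = 124)

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 1))  # ≈ 84.5 (the slides' rounded terms give 84.47)
print(df)              # 4
```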
Homogeneity of Proportions
An advertising firm has decided to ask 92
customers at each of three local shopping
malls if they are willing to take part in a
market research survey. According to
previous studies, 38% of Americans refuse to
take part in such surveys. At α = 0.01, test the
claim that the proportions are equal.
Homogeneity of Proportions
Step 1
Ho: p1 = p2 = p3
Ha: At least one
is different
Step 2
α = 0.01
Step 3
χ² = Σ (O – E)² / E

                       Mall A   Mall B   Mall C   Total
Will participate       52       45       36       133
Will not participate   40       47       56       143
Total                  92       92       92       276
Homogeneity of Proportions
Step 4
Put into your calculator
Observed in matrix A
Expected in matrix B
Test statistic = 5.602
P-value = 0.06
Homogeneity of Proportions
Step 5
Do Not Reject Ho
Step 6
There is not sufficient evidence to suggest that at least one proportion is different.
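The calculator steps above (observed in matrix A, expected in matrix B) can also be sketched in Python, assuming scipy is available:

```python
# Chi-square test of homogeneity on the mall survey data.
from scipy.stats import chi2_contingency

#            Mall A  Mall B  Mall C
observed = [[52,     45,     36],   # Will participate
            [40,     47,     56]]   # Will not participate

stat, p_value, df, expected = chi2_contingency(observed)

print(round(stat, 3))     # ≈ 5.602
print(round(p_value, 2))  # ≈ 0.06, so do not reject Ho at alpha = 0.01
```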
Chi-Square and Causation
Chi-square tests are common, and tests for independence are especially widespread.
We need to remember that a small P-value is not proof of causation.
Since the chi-square test for independence treats the two variables symmetrically, we cannot differentiate the direction of any possible causation even if it existed.
And, there’s never any way to eliminate the possibility that a lurking variable is responsible for the lack of independence.
Chi-Square and Causation (cont.)
In some ways, a failure of independence
between two categorical variables is less
impressive than a strong, consistent, linear
association between quantitative variables.
Two categorical variables can fail the test of
independence in many ways.
Examining the standardized residuals can help
you think about the underlying patterns.
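Standardized residuals, (Obs – Exp)/√Exp, can be computed directly from a table; a sketch using the Titanic data from earlier in the chapter, assuming Python with numpy and scipy is available:

```python
# Standardized residuals flag which cells drive a rejected chi-square
# test: large |residual| means that cell departs most from independence.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[332, 318, 29, 27],     # Survived
                     [1360, 104, 35, 18]])   # Died

_, _, _, expected = chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)

print(np.round(residuals, 2))
# The "men survived" cell is about -8.86: far fewer men survived than
# independence would predict.
```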
CHI-SQUARE INFERENCE
TEST FOR GOODNESS OF FIT
• Used to determine if a particular population distribution fits a specified form
HYPOTHESES:
H0: Actual population percents are equal to
hypothesized percentages
Ha: Actual population percents are different from
hypothesized percentages
CHI-SQUARE INFERENCE
TEST FOR INDEPENDENCE
• Used to determine if two variables within a single population are independent
HYPOTHESES:
H0: There is no relationship between the two variables
in the population
Ha: There is a dependent relationship between the two
variables in the population
CHI-SQUARE INFERENCE
TEST FOR HOMOGENEITY
• Used to determine if two or more separate populations are similar with respect to a single categorical variable
HYPOTHESES:
H0: There are no differences among proportions of
success in the populations
Ha: There are differences among proportions of
success in the populations