11.1 name: chi-square tests for goodness of fit



Post on 25-Nov-2021


Page 1: 11.1 Name: Chi-Square Tests for Goodness of Fit

Statistics – 11.1 Name: _______________ Chi-Square Tests for Goodness of Fit

11.1 Objectives:
• State appropriate hypotheses and compute expected counts for a chi-square test for goodness of fit.
• Calculate the chi-square statistic, degrees of freedom, and P-value for a chi-square test for goodness of fit.
• Perform a chi-square test for goodness of fit.
• Conduct a follow-up analysis when the results of a chi-square test are statistically significant.

Example Problem: Computing Expected Counts and Calculating the Chi-Square Statistic

On average, M&M’S Milk Chocolate Candies contain 13 percent each of browns and reds, 14 percent yellows, 16 percent greens, 20 percent oranges, and 24 percent blues. Jerome’s class collected data from a random sample of 60 M&M’S Milk Chocolate Candies.

1. State appropriate hypotheses for testing the company’s claim about the color distribution of M&M’S Milk Chocolate Candies.

2. Calculate the expected counts for each color. Show your work.

3. Calculate the chi-square statistic.

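Key

H0: The company's stated color distribution for all M&M’S Milk Chocolate Candies is correct.
Ha: The company's stated color distribution for all M&M’S Milk Chocolate Candies is not correct.

Color     Observed   Expected
Blue       9         60(0.24) = 14.4
Orange     8         60(0.20) = 12.0
Green     12         60(0.16) =  9.6
Yellow    15         60(0.14) =  8.4
Red       10         60(0.13) =  7.8
Brown      6         60(0.13) =  7.8

χ² = (9 − 14.4)²/14.4 + (8 − 12)²/12 + (12 − 9.6)²/9.6 + (15 − 8.4)²/8.4 + (10 − 7.8)²/7.8 + (6 − 7.8)²/7.8

χ² = 10.18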
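The expected counts and the chi-square statistic can be checked with a short Python sketch. The claimed proportions come from the problem statement. The observed counts are an assumption: they are reconstructed from the partly illegible answer key and chosen to be consistent with its result of 10.18.

```python
# Claimed proportions come from the problem statement.
claimed = {"blue": 0.24, "orange": 0.20, "green": 0.16,
           "yellow": 0.14, "red": 0.13, "brown": 0.13}

# ASSUMPTION: observed counts reconstructed from the garbled answer key.
observed = {"blue": 9, "orange": 8, "green": 12,
            "yellow": 15, "red": 10, "brown": 6}

n = sum(observed.values())  # sample size (60 candies)

# Expected count = sample size x claimed proportion
expected = {color: n * p for color, p in claimed.items()}

# Chi-square statistic = sum of (observed - expected)^2 / expected
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)

for color in claimed:
    print(f"{color:<7} observed={observed[color]:>2}  expected={expected[color]:.1f}")
print(f"chi-square = {chi_square:.2f}")
```

Note that the calculation uses counts, not proportions, in every term.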


Chi-Square Distributions

The chi-square distributions are a family of distributions that take only nonnegative values and are skewed to the right. A particular chi-square distribution is specified by giving its degrees of freedom. The chi-square goodness-of-fit test uses the chi-square distribution with degrees of freedom = the number of categories − 1.

• When the expected counts are all at least 5, the sampling distribution of the χ² statistic is close to a chi-square distribution with degrees of freedom (df) equal to the number of categories minus 1.
• The mean of a particular chi-square distribution is equal to its degrees of freedom.
• For df > 2, the mode (peak) of the chi-square density curve is at df − 2.
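The mean and mode facts above can be checked numerically. The sketch below is only an illustration, not part of the worksheet: it evaluates the standard chi-square density on a fine grid for df = 5 (an arbitrary choice), approximates the mean with a Riemann sum, and locates the peak. The grid spacing and upper limit are assumptions chosen for convenience.

```python
import math

def chi2_pdf(x: float, df: int) -> float:
    """Density of the chi-square distribution with df degrees of freedom (x > 0)."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

df = 5
step = 0.001
grid = [i * step for i in range(1, 100_000)]  # x from 0.001 to 99.999

# Riemann-sum approximation of the mean: the integral of x * f(x)
approx_mean = sum(x * chi2_pdf(x, df) for x in grid) * step

# Grid point where the density curve is highest
approx_mode = max(grid, key=lambda x: chi2_pdf(x, df))

print(f"mean ~ {approx_mean:.3f} (df = {df})")
print(f"mode ~ {approx_mode:.3f} (df - 2 = {df - 2})")
```

Changing `df` to another value above 2 should move the mean to df and the peak to df − 2.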

Example Problem: Calculating a P-value

Using the information from the previous example problem (Jerome’s class), we computed the chi-square test statistic for the 60 M&M’S to be χ² = 10.180.

1. Confirm that the expected counts are large enough to use a chi-square distribution to calculate the P-value. What degrees of freedom should you use?

2. Sketch a graph that shows the P-value.

3. Use your calculator’s χ²cdf command to calculate a P-value.

4. What conclusion would you draw about the company’s claimed color distribution for M&M’s Milk Chocolate Candies?

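All expected counts are greater than 5. df = 6 − 1 = 5.

[Sketch: chi-square density curve with df = 5; the area to the right of χ² = 10.18 is shaded and labeled as the P-value.]

χ²cdf(10.18, ∞, 5) = 0.0703

Because our P-value is greater than α = 0.05, we fail to reject H0. We do not have convincing evidence that the company's claimed distribution is incorrect.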
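For readers without a graphing calculator, the upper-tail area can also be computed in plain Python. This is a minimal sketch, not the calculator's own algorithm: the helper name `chi2_sf` is hypothetical, and it relies on a standard recurrence for the upper regularized gamma function that holds for whole-number degrees of freedom.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x), for x >= 0 and integer df >= 1.

    Uses Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / gamma(a + 1) with t = x / 2,
    starting from Q(1, t) = exp(-t) or Q(1/2, t) = erfc(sqrt(t)).
    """
    t = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)             # even df: start at a = 1
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))  # odd df: start at a = 1/2
    while a < df / 2:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1
    return q

# Area to the right of the test statistic from Jerome's class, df = 6 - 1 = 5
p_value = chi2_sf(10.18, df=5)
print(f"P-value = {p_value:.4f}")
```

The result should agree with the calculator's χ²cdf value to about four decimal places.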


Independent Practice: Expected Counts, Chi-Square Statistic, and P-values

Mars, Inc., reports that their M&M’S® Peanut Chocolate Candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a randomly selected bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.

1. State appropriate hypotheses for testing the company’s claim about the color distribution of M&M’S Peanut Chocolate Candies.
2. Calculate the expected count for each color, assuming that the company’s claim is true. Show your work.
3. Calculate the chi-square statistic for Joey’s sample. Show your work.
4. Confirm that the expected counts are large enough to use a chi-square distribution. Which distribution (specify degrees of freedom) should we use?
5. Use your calculator’s χ²cdf command to calculate a P-value.
6. What conclusion would you draw about the company’s claimed color distribution for M&M’S Peanut Chocolate Candies? Justify your answer.

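H0: p_blue = 0.23, p_orange = 0.23, p_green = 0.15, p_yellow = 0.15, p_red = 0.12, p_brown = 0.12
Ha: At least 2 of the p_i's are incorrect.

Color     Observed   Expected
Blue      12         46(0.23) = 10.58
Orange     7         46(0.23) = 10.58
Green     13         46(0.15) =  6.90
Yellow     4         46(0.15) =  6.90
Red        8         46(0.12) =  5.52
Brown      2         46(0.12) =  5.52

χ² = (12 − 10.58)²/10.58 + (7 − 10.58)²/10.58 + (13 − 6.9)²/6.9 + (4 − 6.9)²/6.9 + (8 − 5.52)²/5.52 + (2 − 5.52)²/5.52
χ² = 0.1906 + 1.2114 + 5.3928 + 1.2188 + 1.1142 + 2.2446

χ² = 11.3724

All expected counts in the table are ≥ 5. df = 6 − 1 = 5.

χ²cdf(11.3724, ∞, 5) = 0.0445

Because the P-value 0.0445 is less than α = 0.05, we reject H0. We have convincing evidence that the color distribution of M&M’S Peanut Chocolate Candies differs from the company's claim.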
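Since this result is statistically significant, a follow-up analysis is appropriate. One common approach, sketched here under the assumption that it matches what the objectives mean by "follow-up analysis", is to compare each color's contribution to the chi-square statistic and note the direction of the largest differences. The data are Joey's counts and the company's claimed percentages from the problem statement.

```python
claimed = {"blue": 0.23, "orange": 0.23, "green": 0.15,
           "yellow": 0.15, "red": 0.12, "brown": 0.12}
observed = {"blue": 12, "orange": 7, "green": 13,
            "yellow": 4, "red": 8, "brown": 2}

n = sum(observed.values())  # 46 candies in Joey's bag

# Contribution of each color: (observed - expected)^2 / expected
components = {}
for color, p in claimed.items():
    expected = n * p
    components[color] = (observed[color] - expected) ** 2 / expected

# List colors from largest to smallest contribution
for color, value in sorted(components.items(), key=lambda kv: kv[1], reverse=True):
    direction = "more" if observed[color] > n * claimed[color] else "fewer"
    print(f"{color:<7} contribution={value:.4f}  ({direction} than expected)")

largest = max(components, key=components.get)
print(f"largest contributor: {largest}")
```

In this sample the biggest contribution comes from green, where Joey counted 13 candies against an expected 6.9.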


Conditions for Performing a Chi-Square Test for Goodness of Fit
• Random – The data come from a well-designed random sample or randomized experiment.
    o 10% – When sampling without replacement, check that the sample is less than 10% of the population.
• Large Counts – All expected counts are at least 5.

Cautions to Consider:
1. The chi-square test statistic compares observed and expected counts. Don’t try to perform calculations with the observed and expected proportions in each category.
2. When checking the Large Counts condition, be sure to examine the expected counts, not the observed counts.

Carrying Out a Test
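Before carrying out a test, the 10% and Large Counts checks and both cautions can be illustrated with a small sketch. The helper names `ten_percent_ok` and `large_counts_ok` are hypothetical. The numbers are Joey's Peanut sample and the NHL sample size and population stated in this worksheet. The Random condition depends on how the data were collected and cannot be checked in code.

```python
def ten_percent_ok(sample_size: int, population_size: int) -> bool:
    """10% condition: the sample is less than 10% of the population."""
    return sample_size < 0.10 * population_size

def large_counts_ok(expected_counts, minimum: float = 5) -> bool:
    """Large Counts condition: every EXPECTED count is at least `minimum`."""
    return all(count >= minimum for count in expected_counts)

# Joey's Peanut M&M'S sample and the company's claimed proportions
observed = [12, 7, 13, 4, 8, 2]
claimed = [0.23, 0.23, 0.15, 0.15, 0.12, 0.12]
n = sum(observed)
expected = [n * p for p in claimed]

# Caution 2: two observed counts are below 5, but the condition uses expected counts.
print("observed counts all >= 5:", all(o >= 5 for o in observed))
print("expected counts all >= 5:", large_counts_ok(expected))

# Caution 1: proportions in place of counts shrink the statistic by a factor of n.
stat_from_counts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
stat_from_proportions = sum((o / n - p) ** 2 / p for o, p in zip(observed, claimed))
print(f"using counts:      {stat_from_counts:.4f}")
print(f"using proportions: {stat_from_proportions:.4f}  (incorrect)")

# 10% condition for the NHL sample: 80 players out of 879
print("10% condition met:", ten_percent_ok(80, 879))
```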


Example Problem: Carrying Out a Test

In his book Outliers, Malcolm Gladwell suggests that a hockey player’s birth month has a big influence on his chance to make it to the highest levels of the game. Specifically, since January 1 is the cut-off date for youth leagues in Canada (where many National Hockey League (NHL) players come from), players born in January will be competing against players up to 12 months younger. The older players tend to be bigger, stronger, and more coordinated and hence get more playing time, more coaching, and have a better chance of being successful. To see if birth date is related to success (judged by whether a player makes it into the NHL), a random sample of 80 National Hockey League players from a recent season was selected and their birthdays were recorded. There were 879 NHL players this season.

Do these data provide convincing evidence that the birthdays of NHL players are not uniformly distributed throughout the year? (Use the Four Step Process)

State:
H0: The birthdays of all NHL players are evenly distributed across the four quarters of the year.
Ha: The birthdays of all NHL players are not evenly distributed across the four quarters of the year.
α = 0.05

Plan: Chi-square test for goodness of fit.
• Random: The data came from a random sample.
    o 10%: 80 is less than 10% of all NHL players for that season (10% of 879 is 87.9).
• Large Counts: If birthdays are evenly distributed across the four quarters, then the expected counts are all 80(0.25) = 20. All are ≥ 5.

Do:
df = 3
Test statistic: χ² = (32 − 20)²/20 + (20 − 20)²/20 + (16 − 20)²/20 + (12 − 20)²/20 = 7.2 + 0 + 0.8 + 3.2 = 11.2
P-value: χ²cdf(11.2, ∞, 3) = 0.011

Conclude: Because our P-value of 0.011 is less than α = 0.05, we reject H0. We have convincing evidence that the birthdays of NHL players are not evenly distributed across the four quarters of the year.
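The meaning of this P-value can be illustrated with a simulation: if birthdays really were spread evenly over the four quarters, how often would a random sample of 80 players give a chi-square statistic of 11.2 or more? The sketch below is an illustration rather than the worksheet's method. The quarterly counts 32, 20, 16, and 12 are an assumption read from the handwritten solution, and the seed and number of trials are arbitrary choices.

```python
import random

# ASSUMPTION: counts per quarter as read from the worked solution.
observed = [32, 20, 16, 12]
n = sum(observed)             # 80 players
expected = n / len(observed)  # 20 per quarter if H0 is true

def chi_square(counts):
    """Chi-square statistic against equal expected counts."""
    return sum((c - expected) ** 2 / expected for c in counts)

observed_stat = chi_square(observed)

rng = random.Random(2021)  # fixed seed so the run is reproducible
trials = 10_000
at_least_as_extreme = 0
for _ in range(trials):
    # Assign each of the 80 simulated players to a quarter uniformly at random
    quarters = rng.choices(range(4), k=n)
    counts = [quarters.count(q) for q in range(4)]
    # Small tolerance guards against floating-point ties with the observed value
    if chi_square(counts) >= observed_stat - 1e-9:
        at_least_as_extreme += 1

simulated_p = at_least_as_extreme / trials
print(f"observed chi-square = {observed_stat:.1f}")
print(f"simulated P-value   = {simulated_p:.4f}")
```

The simulated proportion should land near the calculator's value of 0.011, with some run-to-run variation that depends on the seed and the number of trials.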