Copyright © 2014, 2011 Pearson Education, Inc.
Chapter 18: Inference for Counts
18.1 Chi-Squared Tests
Retailers can customize the online shopping experience by learning more about their customers. For example, Amazon wants to know whether income level affects what shoppers look for (camera or phone) when they visit electronics.
Use a chi-squared test for independence to answer this question.
18.1 Chi-Squared Tests
Contingency Table: Purchase Category vs. Household Income (555 visitors to Amazon)
18.1 Chi-Squared Tests
Observations from Contingency Table
An association is evident, suggesting that income and choice of product are dependent.
Households with lower incomes seem more likely to purchase a phone; those with higher incomes a camera.
Are these differences in purchase rates the result of sampling variation?
18.2 Test of Independence
Chi-Squared test of independence
Tests the independence of two categorical variables using counts in a contingency table.
18.2 Test of Independence
Hypotheses for the chi-squared test
H0: Household Income and Purchase Category are independent.
Ha: Household Income and Purchase Category are not independent.
Or
H0: p25 = p50 = p75 = p100 = p100+
Ha: p25, p50, p75, p100, p100+ are not all equal
18.2 Test of Independence
Hypotheses for the chi-squared test
Null hypothesis describes five segments of the population defined by household income.
Null assumes conditional probabilities of purchase type given income level are equal across the five segments.
Alternative hypothesis is vague; does not indicate why the null is false.
18.2 Test of Independence
Calculating χ2
Measures the distance between the observed contingency table and a hypothetical contingency table.
The hypothetical contingency table obeys H0 while being consistent with observed marginal counts.
18.2 Test of Independence
Calculating χ2
The null hypothesis determines the expected cell counts in the hypothetical table.
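As a rough illustration of how the hypothetical table can be built, the sketch below computes each expected count as (row total × column total) / n, which keeps the observed marginal counts while forcing independence. The 2 × 3 table is made up for illustration and is not the Amazon data; the name expected_counts is also an assumption.

```python
# Hypothetical 2 x 3 table of observed counts (NOT the Amazon data).
observed = [
    [30, 50, 20],
    [20, 50, 30],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(x, 2) for x in row])
```

Every row of the expected table has the same conditional distribution across columns, which is what H0 asserts.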
18.2 Test of Independence
Calculating χ2
Accumulates the deviations between the observed and expected counts (in the hypothetical table) across all cells.
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
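The accumulation over cells might be sketched as follows: each cell contributes (observed − expected)² / expected, and the contributions are summed. The counts are hypothetical, chosen only to make the arithmetic easy to follow, and chi_squared is an assumed helper name.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical cell counts, listed cell by cell (NOT the Amazon data).
observed_cells = [30, 50, 20, 20, 50, 30]
expected_cells = [25.0, 50.0, 25.0, 25.0, 50.0, 25.0]

stat = chi_squared(observed_cells, expected_cells)
print(stat)
```

Cells whose observed count equals the expected count contribute nothing; the statistic grows only where the table departs from independence.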
18.2 Test of Independence
Calculating χ2
For retail data on purchase category and household income, the chi-squared statistic is 33.925.
$\chi^2 = \frac{(26 - 43.61)^2}{43.61} + \frac{(38 - 50.74)^2}{50.74} + \cdots + \frac{(53 - 57.16)^2}{57.16} = 33.925$
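To see how individual cells feed the total, the sketch below evaluates only the three observed and expected pairs that appear in the calculation above (26 vs 43.61, 38 vs 50.74, 53 vs 57.16). The remaining cells are elided in the text, so the partial sum is expected to fall short of 33.925.

```python
# (observed, expected) pairs for the three cells shown in the text.
cells = [(26, 43.61), (38, 50.74), (53, 57.16)]

contributions = [(o - e) ** 2 / e for o, e in cells]
for (o, e), c in zip(cells, contributions):
    print(f"observed={o}, expected={e}: contribution={c:.3f}")

# Partial sum over the visible cells only; the elided cells are not included.
partial = sum(contributions)
print(f"partial sum = {partial:.3f}")
```

The first cell alone contributes about 7.1, which shows that a few cells far from their expected counts can dominate the statistic.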
18.2 Test of Independence
Plots of the chi-squared test
Mosaic Plot for Retail Data
18.2 Test of Independence
Plots of the chi-squared test
Mosaic Plot for Independent Variables
18.2 Test of Independence
Conditions
No lurking explanation for association.
Data are random samples from indicated segments of the population.
Categories defining the table are mutually exclusive.
Expected cell counts are not too small.
18.2 Test of Independence
The chi-squared distribution
Sampling distribution of the chi-squared statistic if the null hypothesis is true.
Right-skewed.
Assigns probabilities to positive values only.
Identified by degrees of freedom (df).
Approaches normal distribution as df increase.
18.2 Test of Independence
The chi-squared distribution
18.2 Test of Independence
Getting the p-value
df for χ2 test of independence = (r - 1)(c - 1)
df based on size of contingency table
r = number of rows
c = number of columns
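A minimal sketch of the rule, assuming the retail table has two purchase categories (camera, phone) and five income segments as described earlier; independence_df is an illustrative name.

```python
def independence_df(n_rows, n_cols):
    """Degrees of freedom for a chi-squared test of independence."""
    return (n_rows - 1) * (n_cols - 1)


# Two purchase categories by five income segments, as described in the text.
df_retail = independence_df(2, 5)
print(df_retail)
```

The result depends only on the dimensions of the table, not on the number of observations.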
18.2 Test of Independence
Getting the p-value – Retail Example
Observed χ2 = 33.925 with 4 df
From χ2 table P(χ2 > 9.4877) = 0.05; since 33.925 > 9.4877, we can reject H0
The p-value is therefore < 0.05; the exact p-value is 0.0000008.
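The tail probability can be checked without a table. The sketch below uses a closed form that holds only when df is even: P(χ2 > x) = exp(−x/2) · Σ (x/2)^k / k! for k = 0, ..., df/2 − 1. The function name is an assumption, and a statistics library would normally be used instead of this hand-rolled version.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Retail example from the text: chi-squared = 33.925 with 4 df.
p_value = chi2_sf_even_df(33.925, 4)
print(f"p-value = {p_value:.2g}")

# Tail probability at the tabled critical value 9.4877 (should be close to 0.05).
print(f"P(chi2 > 9.4877) = {chi2_sf_even_df(9.4877, 4):.4f}")
```

The computed p-value is on the order of 10^-7, in line with the 0.0000008 quoted above.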
18.2 Test of Independence
Getting the p-value – Retail Example
18.2 Test of Independence
Summary: chi-squared test of independence
18.2 Test of Independence
Chi-squared test of independence – Checklist
No obvious lurking variable.
SRS condition.
Contingency table condition.
Sample size condition: expected cell frequencies are at least 10; expected cell frequencies of 5 are permitted with at least 4 df.
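The sample size condition lends itself to a mechanical check. The sketch below encodes the rule as stated in the checklist (minimum expected count of 10, relaxed to 5 when there are at least 4 df); the function name and the example counts are illustrative assumptions.

```python
def sample_size_condition(expected_counts, df):
    """True if every expected count meets the checklist threshold."""
    threshold = 5 if df >= 4 else 10
    return min(expected_counts) >= threshold


# Hypothetical expected counts, listed cell by cell.
example = [6.2, 12.0, 30.0]
print(sample_size_condition(example, df=8))  # relaxed threshold of 5 applies
print(sample_size_condition(example, df=2))  # stricter threshold of 10 applies
```

The other three conditions concern how the data were collected and cannot be verified from the counts alone.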
18.2 Test of Independence
Connection to two-sample tests
For a 2 × 2 table, the chi-squared test reduces to the two-sided version of the two-sample test of the difference between proportions.
If the 95% confidence interval for p1 – p2 does not include zero, then the chi-squared test has a p-value less than 0.05 and H0 is rejected.
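The connection can be checked numerically: for a 2 × 2 table, the chi-squared statistic equals the square of the two-sample z statistic computed with the pooled standard error. The counts below are hypothetical. Note that a confidence interval is usually built with the unpooled standard error, so in borderline cases the agreement between the interval and the test is approximate rather than exact.

```python
import math

# Hypothetical counts: successes and sample sizes in two groups.
x1, n1 = 40, 100
x2, n2 = 25, 100

# Two-sample z statistic using the pooled proportion.
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled

# Chi-squared statistic for the same data arranged as a 2 x 2 table.
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

print(f"z^2 = {z ** 2:.4f}, chi-squared = {chi2:.4f}")
```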
4M Example 18.1: RETAIL CREDIT
Motivation
Managers of a chain worry that some methods of recruiting customers for store credit, called channels, produce more problems than other channels. Is the channel used related to the status of the customer’s account a year later?
4M Example 18.1: RETAIL CREDIT
Method
Data collected for 630 accounts on variables Channel and Status after 12 months.
4M Example 18.1: RETAIL CREDIT
Method – Check Conditions
No obvious lurking variable. Difficult to check without knowing more about channels.
SRS condition reasonably met.
Contingency table condition satisfied.
Sample size condition must be checked after computing expected cell frequencies.
4M Example 18.1: RETAIL CREDIT
Mechanics – Mosaic Plot
4M Example 18.1: RETAIL CREDIT
Mechanics – Expected Counts
Sample size condition satisfied.
χ2 = 9.158 with 4 df; p-value = 0.057.
Cannot reject H0 at α = 0.05.
4M Example 18.1: RETAIL CREDIT
Message
Observed rates of late payments and early closure are not statistically significantly different among credit accounts opened a year ago through in-store, mailing and Web channels. Since the p-value is close to 0.05, it may be worthwhile to monitor accounts developed through mailings.
18.3 General Versus Specific Hypotheses
Chi-squared test cannot match the power of a more specific test.
A 95% confidence interval for the difference in proportions of late payments from accounts developed via the mailing channel versus the other two channels (combined into one) does not contain zero.
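A more specific comparison of this kind might be sketched as a 95% confidence interval for the difference between two proportions. The account table is not reproduced in this transcript, so the counts below are hypothetical (only the total of 630 accounts comes from the text), and diff_proportion_ci is an assumed helper name.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Approximate 95% CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical counts of late payments: mailing channel vs. the other two combined.
low, high = diff_proportion_ci(30, 200, 35, 430)
print(f"95% CI for difference: ({low:.3f}, {high:.3f})")
print("excludes zero" if low > 0 or high < 0 else "includes zero")
```

Collapsing five status categories and three channels into one focused contrast concentrates the evidence on a single question, which is why the specific test can detect a difference that the general test misses.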
18.4 Tests of Goodness of Fit
Chi-Squared test of goodness of fit
A test of the distribution of a single categorical variable.
18.4 Tests of Goodness of Fit
Testing for randomness
Do shoppers purchase big-ticket items more often on some days of the week than on others?
Are cars made on some days more likely to have defects than the cars made on other days?
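A day-of-week question like these can be framed as a goodness-of-fit test against equal probabilities. The sketch below uses hypothetical daily counts; with k categories and fully specified probabilities, the test has k − 1 degrees of freedom.

```python
# Hypothetical counts of big-ticket purchases, Monday through Sunday.
observed = [18, 22, 20, 25, 30, 35, 25]

n = sum(observed)
k = len(observed)

# Under H0 every day is equally likely, so each expected count is n / k.
expected = [n / k] * k

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

print(f"chi-squared = {chi2:.2f} with {df} df")
```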
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Motivation
Managers would like to have a systematic method to audit purchase amounts on invoices to uncover fraud.
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method
Managers collected a sample of n = 135 invoices. Amounts ranged from $100 to $100,000, with an average of $42,000. Leading digits for the amounts should follow a distribution known as Benford’s law.
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method
Probabilities based on Benford’s law
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method
Counts of leading digits in sample of invoices
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Method – Check Conditions
All conditions are satisfied. The smallest expected count is 6.2. Because there are more than 4 degrees of freedom, the relaxed sample size condition is used.
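The smallest expected count can be reproduced from Benford’s law, under which leading digit d has probability log10(1 + 1/d). The sketch below applies those probabilities to the n = 135 invoices from the text; the variable names are illustrative.

```python
import math

n = 135  # number of invoices, from the text

# Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1, ..., 9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
expected = {d: n * p for d, p in benford.items()}

for d in range(1, 10):
    print(f"digit {d}: probability {benford[d]:.3f}, expected count {expected[d]:.1f}")

smallest = min(expected.values())
df = len(benford) - 1  # nine categories, probabilities fully specified

print(f"smallest expected count = {smallest:.1f}, df = {df}")
```

The smallest expected count belongs to digit 9 and rounds to 6.2, which is below 10 but above 5, so the relaxed condition is the one that applies.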
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Mechanics
χ2 = 19.1 with 8 df. P-value = 0.014. Reject H0.
4M Example 18.2: DETECTING ACCOUNTING FRAUD
Message
The distribution of leading digits in these invoice amounts differs statistically significantly from the form predicted by Benford’s law. This supports the suspicion that the digits are atypical and may indicate fraud.
18.4 Tests of Goodness of Fit
Testing the fit of a probability model
How do we know whether the observed counts match a particular distribution?
4M Example 18.3: WEB HITS
Motivation
Managers of the Web site plan to use a Poisson model to summarize how often users click on ads. If it fits well, they will use this model to summarize concisely the volume of traffic headed to advertisers and to measure the effects of changes in the Web site on traffic patterns.
4M Example 18.3: WEB HITS
Method
Data collected on a sample of 685 users who visited the Web site during a recent weekday evening.
4M Example 18.3: WEB HITS
Method – Check Conditions
SRS and the contingency table conditions are satisfied. However, the last three categories must be combined in order to meet the sample size condition.
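Combining sparse categories might look like the sketch below, which computes Poisson expected counts and pools categories from the right until the last one reaches a minimum expected count. Only n = 685 comes from the text. The rate lam = 0.6, the starting categories (0 through 4 clicks plus "5 or more"), and the threshold of 10 are assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


n = 685            # users in the sample, from the text
lam = 0.6          # hypothetical mean number of ads clicked
max_k = 5          # hypothetical: categories 0, 1, 2, 3, 4 and "5 or more"
min_expected = 10  # assumed threshold for the sample size condition

# Probabilities for 0..4 clicks, plus the remaining tail as the last category.
probs = [poisson_pmf(k, lam) for k in range(max_k)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Pool the last category into its neighbor until it is large enough.
pooled = list(expected)
while len(pooled) > 1 and pooled[-1] < min_expected:
    tail = pooled.pop()
    pooled[-1] += tail

print([round(e, 2) for e in expected])
print([round(e, 2) for e in pooled])
```

The observed counts have to be pooled in exactly the same way before the statistic is computed, so that both tables have matching categories.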
4M Example 18.3: WEB HITS
Mechanics
4M Example 18.3: WEB HITS
Mechanics
χ2 = 0.345 with 2 df. P-value = 0.84. Cannot reject H0.
4M Example 18.3: WEB HITS
Message
The distribution of the number of ads clicked by users is consistent with a Poisson distribution. Managers of the Web site can use this model to summarize user behavior.
Best Practices
Remember the importance of experiments.
State your hypotheses before looking at the data.
Plot the data.
Think when you interpret a p-value.
Pitfalls
Don’t confuse statistical significance with substantive significance.
Don’t use a chi-squared test when the expected frequencies are too small.
Don’t cherry pick comparisons.
Don’t use the number of observations to find the degrees of freedom of chi-squared.