Copyright © 2014, 2011 Pearson Education, Inc. Chapter 18: Inference for Counts

Upload: franklin-mathews

Post on 03-Jan-2016


18.1 Chi-Squared Tests

Retailers can customize the online shopping experience by learning more about their customers. For example, Amazon wants to know if income level affects what shoppers look for (camera or phone) when they visit electronics.

Use a chi-squared test for independence to answer this question.

18.1 Chi-Squared Tests

Contingency Table: Purchase Category vs. Household Income (555 visitors to Amazon)

18.1 Chi-Squared Tests

Observations from Contingency Table

An association is evident, suggesting that income and choice of product are dependent.

Households with lower incomes seem more likely to purchase a phone; those with higher incomes a camera.

Are these differences in purchase rates the result of sampling variation?

18.2 Test of Independence

Chi-Squared test of independence

Tests the independence of two categorical variables using counts in a contingency table.

18.2 Test of Independence

Hypotheses for the chi-squared test

H0: Household Income and Purchase Category are independent.

Ha: Household Income and Purchase Category are not independent.

Or

H0: p25 = p50 = p75 = p100 = p100+

Ha: p25, p50, p75, p100, p100+ are not all equal

18.2 Test of Independence

Hypotheses for the chi-squared test

Null hypothesis describes five segments of the population defined by household income.

Null assumes conditional probabilities of purchase type given income level are equal across the five segments.

Alternative hypothesis is vague; does not indicate why the null is false.

18.2 Test of Independence

Calculating χ²

Measures the distance between the observed contingency table and a hypothetical contingency table.

The hypothetical contingency table obeys H0 while being consistent with observed marginal counts.

18.2 Test of Independence

Calculating χ²

The null hypothesis determines expected cell counts in the hypothetical table.
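
As a minimal sketch of how the null hypothesis determines the expected counts: under independence, each expected cell count is (row total × column total) / n, which keeps the observed marginal counts unchanged. The function name `expected_counts` and the counts below are illustrative assumptions, not the Amazon data from the slides.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts (NOT the Amazon data): rows are camera and phone,
# columns are five household income segments.
observed = [
    [30, 40, 50, 60, 70],
    [70, 60, 50, 40, 30],
]

expected = expected_counts(observed)
print(expected[0])  # [50.0, 50.0, 50.0, 50.0, 50.0]
print(expected[1])  # [50.0, 50.0, 50.0, 50.0, 50.0]
```

The expected table has the same row and column totals as the observed table, but the purchase proportions are identical in every income segment.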

18.2 Test of Independence

Calculating χ²

Accumulates the deviations between the observed and expected counts (in the hypothetical table) across all cells.

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
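
The accumulation described above can be sketched in a few lines. This is an illustration under assumptions: the function name `chi_squared` is made up for this example, and the table is hypothetical rather than the retail data.

```python
def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence for cell (i, j).
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


# Hypothetical 2 x 5 table (NOT the Amazon data).
observed = [
    [30, 40, 50, 60, 70],
    [70, 60, 50, 40, 30],
]
print(chi_squared(observed))  # 40.0
```

Every expected count in this made-up table is 50, so the cell contributions are 8, 2, 0, 2, 8 in each row, giving 40 in total.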

18.2 Test of Independence

Calculating χ²

For retail data on purchase category and household income, the chi-squared statistic is 33.925.

$$\chi^2 = \frac{(26 - 43.61)^2}{43.61} + \frac{(38 - 50.74)^2}{50.74} + \cdots + \frac{(53 - 57.16)^2}{57.16} = 33.925$$

18.2 Test of Independence

Plots of the chi-squared test

Mosaic Plot for Retail Data

18.2 Test of Independence

Plots of the chi-squared test

Mosaic Plot for Independent Variables

18.2 Test of Independence

Conditions

No lurking explanation for association.

Data are random samples from indicated segments of the population.

Categories defining the table are mutually exclusive.

Expected cell counts are not too small.

18.2 Test of Independence

The chi-squared distribution

Sampling distribution of the chi-squared statistic if the null hypothesis is true.

Right-skewed.

Assigns probabilities to positive values only.

Identified by degrees of freedom (df).

Approaches normal distribution as df increase.

18.2 Test of Independence

The chi-squared distribution

18.2 Test of Independence

Getting the p-value

df for χ² test of independence = (r - 1)(c - 1)

df based on size of contingency table

r = number of rows

c = number of columns
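
A small sketch of the degrees-of-freedom rule, assuming the retail table has 2 rows (camera, phone) and 5 columns (the five income segments); the function name `independence_df` is hypothetical. Note that the number of observations (555 visitors) does not enter the calculation.

```python
def independence_df(n_rows, n_cols):
    """Degrees of freedom for a chi-squared test of independence."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


# Assumed layout of the retail table: 2 purchase categories x 5 income segments.
print(independence_df(2, 5))  # 4
```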

18.2 Test of Independence

Getting the p-value – Retail Example

Observed χ² = 33.925 with 4 df

From χ² table P(χ² > 9.4877) = 0.05; since 33.925 > 9.4877, we can reject H0

The p-value is therefore < 0.05; the exact p-value is 0.0000008.
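
The p-value above can be checked without a table. The sketch below is not the textbook's method: it relies on a closed form for the upper tail of the chi-squared distribution that holds only when df is even (which covers the 4 df here), and the function name `chi2_sf_even_df` is an assumption made for this example. For general df, a statistics library would be the usual choice.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x); valid only for even df.

    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(33.925, 4)
print(f"{p_value:.7f}")                       # 0.0000008
print(round(chi2_sf_even_df(9.4877, 4), 3))   # 0.05
```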

18.2 Test of Independence

Getting the p-value – Retail Example

18.2 Test of Independence

Summary: chi-squared test of independence

18.2 Test of Independence

Chi-squared test of independence – Checklist

No obvious lurking variable.

SRS Condition.

Contingency table condition.

Sample size condition. Expected cell frequencies at least 10; expected cell frequencies of 5 permitted with at least 4 df.
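
The sample size condition in this checklist can be written as a short check. This is a sketch of one reading of the rule (smallest expected count at least 10, relaxed to at least 5 when df is 4 or more); the function name and the example counts other than 6.2 are hypothetical.

```python
def sample_size_condition_met(expected_counts, df):
    """Check the smallest expected count against the checklist threshold."""
    smallest = min(expected_counts)
    # Relaxed threshold of 5 is permitted only with at least 4 df.
    threshold = 5 if df >= 4 else 10
    return smallest >= threshold


# Hypothetical expected counts; 6.2 with 8 df mirrors the fraud example later on.
print(sample_size_condition_met([6.2, 12.0, 40.6], df=8))  # True
print(sample_size_condition_met([6.2, 12.0, 40.6], df=2))  # False
```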

18.2 Test of Independence

Connection to two-sample tests

With only two groups (a 2 × 2 table), the chi-squared test reduces to the two-sided version of the two-sample test of the difference between proportions.

If the 95% confidence interval for p1 – p2 does not include zero, then the chi-squared test has a p-value less than 0.05 and H0 is rejected.
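
A numerical sketch of this connection, using hypothetical counts: for a 2 × 2 table, the square of the pooled two-sample z statistic equals the chi-squared statistic. The function names are made up for illustration. The identity is exact for the pooled z statistic; a confidence interval typically uses an unpooled standard error, so its agreement with the test can differ slightly in borderline cases.

```python
import math


def two_sample_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_squared_2x2(x1, n1, x2, n2):
    """Chi-squared statistic for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - exp) ** 2 / exp
    return stat


# Hypothetical data: 40 of 100 in group 1 and 60 of 100 in group 2.
z = two_sample_z(40, 100, 60, 100)
chi2 = chi_squared_2x2(40, 100, 60, 100)
print(round(z ** 2, 6), round(chi2, 6))  # 8.0 8.0
```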

4M Example 18.1: RETAIL CREDIT

Motivation

Managers of a chain worry that some methods of recruiting customers for store credit, called channels, produce more problems than other channels. Is the channel used related to the status of the customer’s account a year later?

4M Example 18.1: RETAIL CREDIT

Method

Data collected for 630 accounts on variables Channel and Status after 12 months.

4M Example 18.1: RETAIL CREDIT

Method – Check Conditions

No obvious lurking variable. Difficult to check without knowing more about channels.

SRS condition reasonably met.

Contingency table condition satisfied.

Sample size condition must be checked after computing expected cell frequencies.

4M Example 18.1: RETAIL CREDIT

Mechanics – Mosaic Plot

4M Example 18.1: RETAIL CREDIT

Mechanics – Expected Counts

Sample size condition satisfied.

χ² = 9.158 with 4 df; p-value = 0.057.

Cannot reject H0 at α = 0.05.

4M Example 18.1: RETAIL CREDIT

Message

Observed rates of late payments and early closure are not statistically significantly different among credit accounts opened a year ago through in-store, mailing and Web channels. Since the p-value is close to 0.05, it may be worthwhile to monitor accounts developed through mailings.

18.3 General Versus Specific Hypotheses

Chi-squared test cannot match the power of a more specific test.

A 95% confidence interval for the difference in proportions of late payments from accounts developed via the mailing channel versus the other two channels (combined into one) does not contain zero.

18.4 Tests of Goodness of Fit

Chi-Squared test of goodness of fit

A test of the distribution of a single categorical variable.

18.4 Tests of Goodness of Fit

Testing for randomness

Do shoppers purchase big-ticket items more often on some days of the week than on others?

Are cars made on some days more likely to have defects than the cars made on other days?

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Motivation

Managers would like to have a systematic method to audit purchase amounts on invoices to uncover fraud.

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Method

Managers collected a sample of n = 135 invoices. Amounts ranged from $100 to $100,000, with an average of $42,000. Leading digits for the amounts should follow a distribution known as Benford’s law.

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Method

Probabilities based on Benford’s law

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Method

Counts of leading digits in sample of invoices

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Method – Check Conditions

All conditions are satisfied. The smallest expected count is 6.2. Because there are more than 4 degrees of freedom, the relaxed sample size condition is used.

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Mechanics

χ² = 19.1 with 8 df. P-value = 0.014. Reject H0.
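
The mechanics of a goodness-of-fit test against Benford's law can be sketched as follows. Benford's law assigns probability log10(1 + 1/d) to leading digit d, and n = 135 comes from the example. The observed counts below are hypothetical, since the table of actual counts is not reproduced here, so the resulting statistic will not match 19.1.

```python
import math

n = 135  # number of invoices in the sample (from the example)

# Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1, ..., 9.
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
expected = [n * p for p in benford]

# Hypothetical observed counts of leading digits 1..9 (NOT the invoice data).
observed = [38, 30, 14, 12, 10, 9, 8, 7, 7]
assert sum(observed) == n

# Goodness-of-fit statistic and degrees of freedom (categories - 1).
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(round(min(expected), 1))  # 6.2, the smallest expected count (digit 9)
print(df)                       # 8
print(round(chi2, 2))           # statistic for the hypothetical counts
```

No parameter is estimated from the data in this test, so the degrees of freedom are simply the number of digit categories minus one.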

4M Example 18.2: DETECTING ACCOUNTING FRAUD

Message

The distribution of leading digits in these invoice amounts is statistically significantly different from the form predicted by Benford’s law. This confirms the suspicion that the digits are atypical and may indicate fraud.

18.4 Tests of Goodness of Fit

Testing the fit of a probability model

How do we know whether the observed counts match a particular distribution?

4M Example 18.3: WEB HITS

Motivation

Managers of the Web site plan to use a Poisson model to summarize how often users click on ads. If it fits well, they will use this model to summarize concisely the volume of traffic headed to advertisers and to measure the effects of changes in the Web site on traffic patterns.

4M Example 18.3: WEB HITS

Method

Data collected on a sample of 685 users that visited the Web site during a recent weekday evening.

4M Example 18.3: WEB HITS

Method – Check Conditions

The SRS and contingency table conditions are satisfied. However, the last three categories need to be combined in order to meet the sample size condition.

4M Example 18.3: WEB HITS

Mechanics

4M Example 18.3: WEB HITS

Mechanics

χ² = 0.345 with 2 df. P-value = 0.84. Cannot reject H0.
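
A sketch of how such a Poisson fit might be tested. The click counts are hypothetical (only the total of 685 users comes from the example), and the layout is an assumption: the tail categories are pooled into "3 or more", leaving four categories, and one extra degree of freedom is subtracted because the Poisson rate is estimated from the data. That would be consistent with the 2 df reported above, but the original table is not reproduced here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical data (NOT from the example): clicks[k] = users who clicked k ads.
clicks = [300, 250, 100, 25, 8, 2]
n = sum(clicks)  # 685 users
lam = sum(k * c for k, c in enumerate(clicks)) / n  # estimated mean clicks

# Pool the sparse tail (3, 4, 5 clicks) into a single "3 or more" category.
cutoff = 3
observed = clicks[:cutoff] + [sum(clicks[cutoff:])]
probs = [poisson_pmf(k, lam) for k in range(cutoff)]
probs.append(1 - sum(probs))  # P(X >= 3)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1          # one df lost for estimating lam
p_value = math.exp(-chi2 / 2)       # exact upper tail when df = 2

print(df)                                # 2
print(round(chi2, 3), round(p_value, 3)) # results for the hypothetical counts
print(round(math.exp(-0.345 / 2), 2))    # 0.84, the p-value reported above
```

With 2 df the upper-tail probability is exactly exp(-χ²/2), which is why the reported statistic of 0.345 corresponds to a p-value of about 0.84.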

4M Example 18.3: WEB HITS

Message

The distribution of the number of ads clicked by users is consistent with a Poisson distribution. Managers of the Web site can use this model to summarize user behavior.

Best Practices

Remember the importance of experiments.

State your hypotheses before looking at the data.

Plot the data.

Think when you interpret a p-value.

Pitfalls

Don’t confuse statistical significance with substantive significance.

Don’t use a chi-squared test when the expected frequencies are too small.

Don’t cherry-pick comparisons.

Don’t use the number of observations to find the degrees of freedom of chi-squared.