chi-squared test of homogeneity

Chi-Squared Test of Homogeneity Are different populations the same across some characteristic?

Upload: aquila

Post on 04-Jan-2016


DESCRIPTION

Chi-Squared Test of Homogeneity: are different populations the same across some characteristic? The χ² test for homogeneity is used with a single categorical variable from two (or more) independent samples, to see if the populations are the same (homogeneous).

TRANSCRIPT

Page 1: Chi-Squared Test of Homogeneity

Chi-Squared Test of Homogeneity

Are different populations the same across some characteristic?

Page 2: Chi-Squared Test of Homogeneity

χ² test for homogeneity

• Used with a single categorical variable from two (or more) independent samples

• Used to see if the two populations are the same (homogeneous)

• Several groups but STILL ONE VARIABLE

Page 3: Chi-Squared Test of Homogeneity

Assumptions & formula remain the same!

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$$

• Samples come from random sampling

• All expected counts are greater than 5

Page 4: Chi-Squared Test of Homogeneity

Hypotheses – written in words

H0: The proportions for the two (or more) distributions are the same

Ha: At least one of the proportions for the distributions is different

Be sure to write in context!

Page 5: Chi-Squared Test of Homogeneity

Expected Counts

• Assuming H0 is true,

$$\text{expected counts} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Page 6: Chi-Squared Test of Homogeneity

Degrees of freedom

$$df = (r - 1)(c - 1)$$

Or cover up one row & one column & count the number of cells remaining!
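A rough sketch of the expected-count and degrees-of-freedom rules, assuming the two-way table is stored as a list of rows. The function names and the toy counts are illustrative assumptions, not from the slides.

```python
def expected_counts(observed):
    """Expected counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1) for an r-by-c table of counts."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical 2x3 table of counts (illustrative only, not from the slides).
toy = [[10, 20, 30],
       [20, 20, 20]]
toy_expected = expected_counts(toy)
toy_df = degrees_of_freedom(toy)
print(toy_expected, toy_df)
```

Covering up one row and one column of the toy table leaves 1 x 2 = 2 cells, which matches the value the function returns.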

Page 7: Chi-Squared Test of Homogeneity

Should Dentists Advertise?

• It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights.

• Why do you think professional organizations sought to prohibit their members from advertising?

Page 8: Chi-Squared Test of Homogeneity

Should Dentists Advertise?

• The paper “Should Dentists Advertise?” (J. of Advertising Research (June 1982): 33–38) compared the attitudes of consumers and dentists toward the advertising of dental services. Separate samples of 101 consumers and 124 dentists were asked to respond to the following statement: “I favor the use of advertising by dentists to attract new patients.”

Page 9: Chi-Squared Test of Homogeneity

Should Dentists Advertise?

• Possible responses were: strongly agree, agree, neutral, disagree, strongly disagree.

• The authors were interested in determining whether the two groups—dentists and consumers—differed in their attitudes toward advertising.

Page 10: Chi-Squared Test of Homogeneity

Should Dentists Advertise?

• This is done with a chi-squared test of homogeneity; that is, we are testing the claim that different populations have the same proportions across the categories of some characteristic.

• So how should we state the null and alternative hypotheses for this test?

Page 11: Chi-Squared Test of Homogeneity

Should Dentists Advertise?

• H0: The true category proportions for all responses are the same for both populations of consumers and dentists.

• Ha: The true category proportions for all responses are not the same for both populations of consumers and dentists.

Page 12: Chi-Squared Test of Homogeneity

Observed Data

• How do we determine the expected cell count under the assumption of homogeneity?

• The expected cell counts are estimated from the sample data (assuming that H0 is true) by using …

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{total sample size}}$$

Response, by group:

Group      Strongly Agree  Agree  Neutral  Disagree  Strongly Disagree  Total
Consumers  34              49     9        4         5                  101
Dentists   9               18     23       28        46                 124
Total      43              67     32       32        51                 225

Page 13: Chi-Squared Test of Homogeneity

Expected Values

Response, by group (expected count in parentheses):

Group      Strongly Agree  Agree  Neutral  Disagree  Strongly Disagree  Total
Consumers  34 (19.30)      49     9        4         5                  101
Dentists   9               18     23       28        46                 124
Total      43              67     32       32        51                 225

• So the calculation for the first cell is …

$$\text{1st expected cell count} = \frac{101 \times 43}{225} = 19.302$$

Page 14: Chi-Squared Test of Homogeneity

Observed Data

• Students on the right side of the classroom finish the first row and the left side find the expected values for the dentists.

Response, by group (expected counts in parentheses):

Group      Strongly Agree  Agree       Neutral     Disagree    Strongly Disagree  Total
Consumers  34 (19.30)      49 (30.08)  9 (14.36)   4 (14.36)   5 (22.89)          101
Dentists   9 (23.70)       18 (36.92)  23 (17.64)  28 (17.64)  46 (28.11)         124
Total      43              67          32          32          51                 225
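The expected counts for both rows can be checked with a few lines of code. This is only a sketch: the dictionary layout and variable names are assumptions, while the counts are the observed data from the slide.

```python
# Observed counts from the slide; columns run from Strongly Agree to Strongly Disagree.
observed = {
    "Consumers": [34, 49, 9, 4, 5],
    "Dentists": [9, 18, 23, 28, 46],
}

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# expected = row total * column total / grand total, rounded as on the slide
expected = {
    group: [round(sum(row) * c / grand_total, 2) for c in col_totals]
    for group, row in observed.items()
}

for group, row in expected.items():
    print(group, row)
```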

Page 15: Chi-Squared Test of Homogeneity

Conditions

• So now we can consider the conditions of our analysis.

– We will assume the data were randomly selected.

– The sample was large enough because every cell in the contingency table had an expected frequency of at least 5.

Page 16: Chi-Squared Test of Homogeneity

Test Statistic

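• Now we can calculate the χ² test statistic:

$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}$$

$$= \frac{(34 - 19.30)^2}{19.30} + \frac{(49 - 30.08)^2}{30.08} + \dots + \frac{(46 - 28.11)^2}{28.11}$$

$$= 11.20 + 11.90 + 2.00 + \dots + 11.39 = 84.47$$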
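The full ten-term sum can be reproduced as follows. This is a minimal sketch with illustrative variable names. It uses unrounded expected counts, so the result lands near 84.5 and can differ from the slide's 84.47 in the second decimal place.

```python
observed = [
    [34, 49, 9, 4, 5],     # consumers
    [9, 18, 23, 28, 46],   # dentists
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n   # expected count for this cell
        chi_sq += (obs - exp) ** 2 / exp          # this cell's contribution

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-squared = {chi_sq:.2f}, df = {df}")
```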

Page 17: Chi-Squared Test of Homogeneity

Sampling Distribution

The two-way table for this situation has 2 rows and 5 columns, so the appropriate degrees of freedom is (2 – 1)(5 – 1) = 4.

χ²cdf(84.5, ∞, 4) ≈ 0

Since the likelihood of seeing such a large difference between the observed frequencies and what we would expect to see if the two populations were homogeneous is so small (approx. 0), there is strong evidence against the assumption that the proportions in the response categories are the same for the populations of consumers and dentists.
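Without a calculator's χ²cdf function, the upper-tail area can be sketched with a closed form that holds when the degrees of freedom are even: P(χ² > x) = e^(−x/2) · Σ (x/2)^k / k! for k = 0, …, df/2 − 1. The function name below is an assumption for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Test statistic and degrees of freedom from the dentist/consumer example.
p_value = chi2_sf_even_df(84.47, 4)
print(p_value)  # a tiny positive number, effectively 0
```

For odd degrees of freedom this shortcut does not apply, and a statistics library or table would be needed instead.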

Page 18: Chi-Squared Test of Homogeneity

Post-graduation activities of graduates from an upstate NY high school

                           1980  1990  2000  Total
College/post HS education  320   245   288   853
Employment                 98    24    17    139
Military                   18    19    5     42
Travel                     17    2     5     24
Total                      453   290   315   1058

Has what graduates do after graduation changed across the three graduating classes?

Page 19: Chi-Squared Test of Homogeneity

Could test whether two proportions are the same using a two-proportion z test…. but we have 3 groups.

Chi-square goodness-of-fit tests against given proportions (theoretical models) …. but we want to know if choices have changed.

So… we’ll use a chi-square test of homogeneity. Homogeneity means that things are the same, so we have a built-in null hypothesis: the distribution does not change from group to group. This test looks for differences larger than we might expect from random sample-to-sample variation.

Page 20: Chi-Squared Test of Homogeneity

Observed counts (expected counts in parentheses):

                           1980         1990         2000         Total
College/post HS education  320 (365.2)  245 (233.8)  288 (254.0)  853
Employment                 98 (59.5)    24 (38.1)    17 (41.4)    139
Military                   18 (17.98)   19 (11.5)    5 (12.5)     42
Travel                     17 (10.3)    2 (6.6)      5 (7.1)      24
Total                      453          290          315          1058

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$

Page 21: Chi-Squared Test of Homogeneity

H0: The post-high school choices made by the classes of 1980, 1990, and 2000 have the same distribution

Ha: The post-high school choices made by the classes of 1980, 1990, and 2000 do not have the same distribution

Conditions:

* categorical data with counts

* expected values are all at least 5

Degrees of freedom: (R – 1)(C – 1) = 3 * 2 = 6

Test statistic:

$$\chi^2 = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

Page 22: Chi-Squared Test of Homogeneity

$$\chi^2 = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}} = 72.77$$

P-value = P(χ² > 72.77) < 0.0001

The P-value is very small, so I reject the null hypothesis and conclude there is evidence that the choices made by high-school graduates have changed over the three classes examined.
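The whole test for the graduation data can be sketched in one pass. Variable names are illustrative, and the p-value uses the closed form for even degrees of freedom (an assumption that works here because df = 6).

```python
import math

# Rows: College/post HS education, Employment, Military, Travel.
# Columns: classes of 1980, 1990, 2000.
observed = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper tail, valid only because df is even.
half = chi_sq / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2e}")
```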

Page 23: Chi-Squared Test of Homogeneity

When we reject the null hypothesis, it’s a good idea to examine residuals. To standardize the residuals:

$$\frac{\text{Observed} - \text{expected}}{\sqrt{\text{expected}}}$$

                           1980    1990    2000
College/post HS education  -2.366  0.732   2.136
Employment                 4.989   -2.284  -3.791
Military                   0.004   2.207   -2.122
Travel                     2.098   -1.785  -0.803

What can this show us?
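One way to explore that question is to recompute the standardized residuals and look for the largest one in absolute value. This is a sketch with illustrative names, using the observed counts from the graduation table.

```python
import math

labels = ["College/post HS education", "Employment", "Military", "Travel"]
years = [1980, 1990, 2000]
observed = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# standardized residual = (observed - expected) / sqrt(expected)
residuals = []
for i, row in enumerate(observed):
    res_row = []
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        res_row.append((obs - exp) / math.sqrt(exp))
    residuals.append(res_row)

# The cell that contributes most to the chi-squared statistic.
largest = max(
    (abs(residuals[i][j]), labels[i], years[j])
    for i in range(len(labels))
    for j in range(len(years))
)
print(largest)
```

The largest residual belongs to Employment in 1980: far more 1980 graduates went straight to work than the homogeneity assumption would predict.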

Page 24: Chi-Squared Test of Homogeneity

The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

Page 25: Chi-Squared Test of Homogeneity

Assumptions:

• Have 2 random samples of students

• All expected counts are greater than 5.

H0: the proportions of drinking behaviors are the same for female & male students

Ha: at least one of the proportions of drinking behavior is different for female & male students

$$\chi^2 = \frac{(140 - 158.6)^2}{158.6} + \frac{(186 - 167.4)^2}{167.4} + \dots = 96.53$$

P-value = .000    df = 3    α = .05

Since p-value < α, I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.

Expected Counts:

   M      F
0  158.6  167.4
L  554.0  585.0
M  230.1  243.0
H  38.4   40.6
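The full observed table is not part of this transcript, so only the p-value step can be sketched. This assumes the statistic on the slide reads χ² = 96.53 with df = 3. For 3 degrees of freedom the upper tail has the closed form erfc(√(x/2)) + √(2x/π) · e^(−x/2); the function name is illustrative.

```python
import math


def chi2_sf_df3(x):
    """P(X > x) for a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi_sq = 96.53   # assumed reading of the slide's test statistic
alpha = 0.05
p_value = chi2_sf_df3(chi_sq)
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```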

Page 26: Chi-Squared Test of Homogeneity

χ² test for independence

• Used with categorical, bivariate data from ONE sample

•Used to see if the two categorical variables are associated (dependent) or not associated (independent)

•One sample but two variables

Page 27: Chi-Squared Test of Homogeneity

Hypotheses – written in words

H0: The two variables are independent

Ha: The two variables are dependent

Be sure to write in context!

Page 28: Chi-Squared Test of Homogeneity

Assumptions & formula remain the same!

Expected counts & df are found the same way as test for homogeneity.

The only change is the hypotheses!

Page 29: Chi-Squared Test of Homogeneity

A study from the University of Texas Southwestern Medical Center examined whether the risk of hepatitis C was related to whether people had tattoos and to where they got their tattoos.

                   Hepatitis C  No Hepatitis C  Total
Tattoo, parlor     17           35              52
Tattoo, elsewhere  8            53              61
None               22           491             513
Total              47           579             626

These data differ from the earlier examples because they categorize subjects from a single group on two categorical variables rather than on only one.

Page 30: Chi-Squared Test of Homogeneity

Is the chance of having hepatitis C independent of tattoo status?

If hepatitis status is independent of tattoos, we expect the proportion of people testing positive for hepatitis to be the same for the three levels of tattoo status.

Are the categorical variables tattoo status and hepatitis statistically independent?

A chi-square test for independence

Page 31: Chi-Squared Test of Homogeneity

H0: Tattoo status and hepatitis status are independent

Ha: Tattoo status and hepatitis status are not independent

Conditions:

* categorical data with counts

* expected values are all at least 5

Degrees of freedom: (R – 1)(C – 1) = 2 * 1 = 2

Test statistic:

$$\chi^2 = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

Page 32: Chi-Squared Test of Homogeneity

Observed counts (expected counts in parentheses):

                   Hepatitis C  No Hepatitis C  Total
Tattoo, parlor     17 (3.904)   35 (48.096)     52
Tattoo, elsewhere  8 (4.580)    53 (56.420)     61
None               22 (38.516)  491 (474.484)   513
Total              47           579             626

$$\chi^2 = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}} = 57.91$$

P-value = P(χ² > 57.91) < 0.0001

Page 33: Chi-Squared Test of Homogeneity

The p-value is very small, so I reject the null hypothesis and conclude that hepatitis status is not independent of tattoo status. Because the expected cell frequency condition was violated, I need to check that the two cells with small expected counts did not influence this result too greatly.
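The test and the expected-count check can be sketched together. Names are illustrative. With df = 2 the upper-tail probability is exactly e^(−χ²/2), so no statistics library is assumed.

```python
import math

rows = ["Tattoo, parlor", "Tattoo, elsewhere", "None"]
cols = ["Hepatitis C", "No Hepatitis C"]
observed = [
    [17, 35],
    [8, 53],
    [22, 491],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(rows))
    for j in range(len(cols))
)
p_value = math.exp(-chi_sq / 2)   # exact upper tail when df = 2

# Cells that violate the "expected count at least 5" condition.
small_cells = [
    (rows[i], cols[j], round(expected[i][j], 3))
    for i in range(len(rows))
    for j in range(len(cols))
    if expected[i][j] < 5
]
print(f"chi-squared = {chi_sq:.2f}, p-value = {p_value:.2e}")
print(small_cells)
```

Both flagged cells are in the Hepatitis C column, which is why the conclusion on this slide is stated with a caution.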

Page 34: Chi-Squared Test of Homogeneity

Whenever we reject the null hypothesis, it’s a good idea to examine the residuals.

Since counts may be different for cells, we are better off standardizing the residuals.

To standardize a cell’s residual, divide by the square root of its expected value.

$$\text{residual} = \frac{\text{Obs} - \text{Exp}}{\sqrt{\text{Exp}}}$$

The + and the – sign indicate whether we observed more cases than we expected, or fewer.

Page 35: Chi-Squared Test of Homogeneity

                   Hepatitis C  No Hepatitis C
Tattoo, parlor     6.628        -1.888
Tattoo, elsewhere  1.598        -0.455
None               -2.661       0.758

Examining the residuals:

largest component: Hepatitis C/Tattoo parlor, which suggests that a principal source of infection may be tattoo parlors

second largest component: Hepatitis C/no tattoo, meaning those who have no tattoos are less likely to be infected with hepatitis C than we might expect if the two variables were independent
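The ranking of components can be reproduced by sorting the standardized residuals by absolute value. This is a sketch, and the tuple layout is an assumption for illustration.

```python
import math

rows = ["Tattoo, parlor", "Tattoo, elsewhere", "None"]
cols = ["Hepatitis C", "No Hepatitis C"]
observed = [
    [17, 35],
    [8, 53],
    [22, 491],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# (row label, column label, standardized residual) for every cell
cells = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        cells.append((rows[i], cols[j], (obs - exp) / math.sqrt(exp)))

# Largest contributions first; the sign says more (+) or fewer (-) than expected.
ranked = sorted(cells, key=lambda cell: abs(cell[2]), reverse=True)
for row_label, col_label, res in ranked:
    print(f"{row_label:18s} {col_label:15s} {res:+.3f}")
```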

Page 36: Chi-Squared Test of Homogeneity

A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

Page 37: Chi-Squared Test of Homogeneity

If beef preference is independent of geographic region, how would we expect this table to be filled in?

       North  South  Total
Cut A  90     60     150
Cut B  165    110    275
Cut C  45     30     75
Total  300    200    500
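Under independence, the table is filled in from the margins alone. A short sketch (dictionary names are illustrative) using the totals given in the problem:

```python
# Marginal totals from the problem statement.
row_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}
col_totals = {"North": 300, "South": 200}
n = 500

# expected count = row total * column total / grand total
expected = {
    cut: {region: r * c / n for region, c in col_totals.items()}
    for cut, r in row_totals.items()
}

for cut, by_region in expected.items():
    print(cut, by_region)
```

Each row is split 60% North and 40% South, the same as the sample overall, which is exactly what "no relationship" means here.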

Page 38: Chi-Squared Test of Homogeneity

Now suppose that in the actual sample of 500 consumers the observed numbers were as follows:

  (on your paper)

 Is there sufficient evidence to suggest that geographic regions and beef preference are not independent? (Is there a difference between the expected and observed counts?)

Page 39: Chi-Squared Test of Homogeneity

Assumptions:

•Have a random sample of people

•All expected counts are greater than 5.

H0: geographic region and beef preference are independent

Ha: geographic region and beef preference are dependent

$$\chi^2 = \frac{(100 - 90)^2}{90} + \frac{(50 - 60)^2}{60} + \dots = 7.576$$

P-value = .0226    df = 2    α = .05

Since p-value < α, I reject H0. There is sufficient evidence to suggest that geographic region and beef preference are dependent.

Expected Counts:

   N    S
A  90   60
B  165  110
C  45   30
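The observed table was handed out on paper and is not in this transcript, so only the last step is sketched here. This assumes the statistic on the slide reads χ² = 7.576. With df = 2 the p-value is exactly e^(−χ²/2).

```python
import math

chi_sq = 7.576   # assumed reading of the slide's test statistic
df = 2
alpha = 0.05

p_value = math.exp(-chi_sq / 2)   # exact upper tail for df = 2
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```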