chapter 6virtual.yosemite.cc.ca.us/jcurl/math134 4 s/ch6-149-166.pdf · p1: pbu/ovy p2: pbu/ovy qc:...


P1: PBU/OVY P2: PBU/OVY QC: PBU/OVY T1: PBU

GTBL011-06 GTBL011-Moore-v15.cls May 16, 2006 17:48

CHAPTER 6

In this chapter we cover...

Marginal distributions

Conditional distributions

Simpson’s paradox


Two-Way Tables∗

We have concentrated on relationships in which at least the response variable is quantitative. Now we will describe relationships between two categorical variables. Some variables, such as sex, race, and occupation, are categorical by nature. Other categorical variables are created by grouping values of a quantitative variable into classes. Published data often appear in grouped form to save space. To analyze categorical data, we use the counts or percents of individuals that fall into various categories.

∗This material is important in statistics, but it is needed later in this book only for Chapter 23. You may omit it if you do not plan to read Chapter 23 or delay reading it until you reach Chapter 23.


TABLE 6.1 College students by sex and age group, 2003 (thousands of persons)

                            Sex
Age group            Female     Male     Total
15 to 17 years           89       61       150
18 to 24 years        5,668    4,697    10,365
25 to 34 years        1,904    1,589     3,494
35 years or older     1,660      970     2,630
Total                 9,321    7,317    16,639

EXAMPLE 6.1 College students

Table 6.1 presents Census Bureau data describing the age and sex of college students.1 This is a two-way table because it describes two categorical variables. (Age is categorical here because the students are grouped into age categories.) Age group is the row variable because each row in the table describes students in one age group. Because age group has a natural order from youngest to oldest, the order of the rows reflects this order. Sex is the column variable because each column describes one sex. The entries in the table are the counts of students in each age-by-sex class.

Marginal distributions

How can we best grasp the information contained in Table 6.1? First, look at the distribution of each variable separately. The distribution of a categorical variable says how often each outcome occurred. The “Total” column at the right of the table contains the totals for each of the rows. These row totals give the distribution of age (the row variable) among college students: 150,000 were 15 to 17 years old, 10,365,000 were 18 to 24 years old, and so on. In the same way, the “Total” row at the bottom of the table gives the distribution of sex. The bottom row reveals a striking and important fact: women outnumber men among college students.

If the row and column totals are missing, the first thing to do in studying a two-way table is to calculate them. The distributions of sex alone and age alone are called marginal distributions because they appear at the right and bottom margins of the two-way table.

If you check the row and column totals in Table 6.1, you will notice a few discrepancies. For example, the sum of the entries in the “25 to 34” row is 3493. The entry in the “Total” column for that row is 3494. The explanation is roundoff error. The table entries are in thousands of students and each is rounded to the nearest thousand. The Census Bureau obtained the “Total” entry by rounding the exact number of students aged 25 to 34 to the nearest thousand. The result was 3,494,000. Adding the row entries, each of which is already rounded, gives a slightly different result.
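The totals and the roundoff discrepancy described above can be checked with a short script. This is only a sketch in plain Python; the variable names (`table`, `published_row_totals`, and so on) are illustrative choices and not part of the text, and the counts are copied from Table 6.1.

```python
# Counts from Table 6.1, in thousands of students.
table = {
    "15 to 17 years":    {"Female": 89,   "Male": 61},
    "18 to 24 years":    {"Female": 5668, "Male": 4697},
    "25 to 34 years":    {"Female": 1904, "Male": 1589},
    "35 years or older": {"Female": 1660, "Male": 970},
}

# The "Total" column exactly as published in Table 6.1.
published_row_totals = {
    "15 to 17 years": 150,
    "18 to 24 years": 10365,
    "25 to 34 years": 3494,
    "35 years or older": 2630,
}

# Row totals: the marginal distribution of age, as counts.
row_totals = {age: sum(counts.values()) for age, counts in table.items()}

# Column totals: the marginal distribution of sex, as counts.
column_totals = {
    sex: sum(counts[sex] for counts in table.values())
    for sex in ("Female", "Male")
}

# Rows where adding the rounded entries disagrees with the published total.
roundoff_rows = {
    age: (row_totals[age], published_row_totals[age])
    for age in table
    if row_totals[age] != published_row_totals[age]
}

print("Row totals:", row_totals)
print("Column totals:", column_totals)
print("Roundoff discrepancies (computed, published):", roundoff_rows)
```

Only the “25 to 34” row is flagged, which matches the discrepancy between 3493 and 3494 noted in the text.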


Percents are often more informative than counts. We can display the marginal distribution of students’ age groups in terms of percents by dividing each row total by the table total and converting to a percent.

EXAMPLE 6.2 Calculating a marginal distribution

The percent of college students who are 18 to 24 years old is

$$\frac{\text{age 18 to 24 total}}{\text{table total}} = \frac{10{,}365}{16{,}639} = 0.623 = 62.3\%$$

Are you surprised that only about 62% of students are in the traditional college age group? Do three more such calculations to obtain the marginal distribution of age group in percents. Here it is:

                                   15 to 17   18 to 24   25 to 34   35 or older
Percent of college students aged        0.9       62.3       21.0          15.8

The total is 100% because everyone is in one of the four age categories.
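The four calculations in Example 6.2 can be reproduced as follows. This is a minimal sketch, assuming the published row totals and table total from Table 6.1 (in thousands); the names `published_row_totals` and `marginal_age_percent` are made up for illustration.

```python
# Published "Total" column and table total from Table 6.1 (thousands).
published_row_totals = {
    "15 to 17": 150,
    "18 to 24": 10365,
    "25 to 34": 3494,
    "35 or older": 2630,
}
table_total = 16639

# Marginal distribution of age in percents: row total / table total.
marginal_age_percent = {
    age: round(100 * total / table_total, 1)
    for age, total in published_row_totals.items()
}

for age, percent in marginal_age_percent.items():
    print(f"{age:12s} {percent:5.1f}%")
```

The denominator is the table total in every line because each percent is a percent “of college students.”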

Each marginal distribution from a two-way table is a distribution for a single categorical variable. As we saw in Chapter 1, we can use a bar graph or a pie chart to display such a distribution. Figure 6.1 is a bar graph of the distribution of age for college students.

[Figure: bar graph. Vertical axis: Percent of college students (0 to 70). Horizontal axis: Age group (years), with bars for 15 to 17, 18 to 24, 25 to 34, and 35 or older. Annotation: 62.3% of college students are in the 18 to 24 years age group.]

FIGURE 6.1 A bar graph of the distribution of age for college students. This is one of the marginal distributions for Table 6.1.


In working with two-way tables, you must calculate lots of percents. Here’s a tip to help decide what fraction gives the percent you want. Ask, “What group represents the total that I want a percent of?” The count for that group is the denominator of the fraction that leads to the percent. In Example 6.2, we want a percent “of college students,” so the count of college students (the table total) is the denominator.


APPLY YOUR KNOWLEDGE

6.1 Risks of playing soccer. A study in Sweden looked at former elite soccer players, people who had played soccer but not at the elite level, and people of the same age who did not play soccer. Here is a two-way table that classifies these subjects by whether or not they had arthritis of the hip or knee by their midfifties:2

                Elite   Non-elite   Did not play
Arthritis          10           9             24
No arthritis       61         206            548

(a) How many people do these data describe?

(b) How many of these people have arthritis of the hip or knee?

(c) Give the marginal distribution of participation in soccer, both as counts and as percents.

6.2 Deaths. Here is a two-way table of number of deaths in the United States in three age groups from selected causes in 2003. The entries are counts of deaths.3 Because many deaths are due to other causes, the entries don’t add to the “Total deaths” count. The total deaths in the three age groups are very different, so it is important to use percents rather than counts in comparing the age groups.

                  15 to 24 years   25 to 44 years   45 to 64 years
Accidents                 14,966           27,844           23,669
AIDS                         171            6,879            5,917
Cancer                     1,628           19,041          144,936
Heart diseases             1,083           16,283          101,713
Homicide                   5,148            7,367            2,756
Suicide                    3,921           11,251           10,057
Total deaths              33,022          128,924          437,058

The causes listed include the top three causes of death in each age group. For each age group, give the top three causes and the percent of deaths due to each. Use your results to explain briefly how the leading causes of death change as people get older.


Attack of the killer TVs!

Are kids in greater danger from TV sets or alligators? Alligator attacks make the news, but they aren’t high on any count of causes of death and injury. In fact, the 28 children killed by falling TV sets in the United States between 1990 and 1997 is about twice the total number of people killed by alligators in Florida since 1948.

Conditional distributions

Table 6.1 contains much more information than the two marginal distributions of age alone and sex alone. The nature of the relationship between the age and sex of college students cannot be deduced from the separate distributions but requires the full table. Relationships between categorical variables are described by calculating appropriate percents from the counts given. We use percents because counts are often hard to compare. For example, there are 5,668,000 female college students in the 18 to 24 years age group, and only 1,660,000 in the 35 years or over group. Because there are many more students overall in the 18 to 24 group, these counts don’t allow us to compare how prominent women are in the two age groups. When we compare the percents of women and men in several age groups, we are comparing conditional distributions.

MARGINAL AND CONDITIONAL DISTRIBUTIONS

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

A conditional distribution of a variable is the distribution of values of that variable among only individuals who have a given value of the other variable. There is a separate conditional distribution for each value of the other variable.

EXAMPLE 6.3 Conditional distribution of sex given age

If we know that a college student is 18 to 24 years old, we need look at only the “18 to 24 years” row in the two-way table, highlighted in Table 6.2. To find the distribution of sex among only students in this age group, divide each count in the row by the row total, which is 10,365. The conditional distribution of sex given that a student is 18 to 24 years old is

                                Female   Male
Percent of 18 to 24 age group     54.7   45.3

The two percents add to 100% because all 18- to 24-year-old students are either female or male. We use the term “conditional” because these percents describe only students who satisfy the condition that they are between 18 and 24 years old.

TABLE 6.2 College students by sex and age: the 18 to 24 years age group

                            Sex
Age group            Female     Male     Total
15 to 17 years           89       61       150
18 to 24 years        5,668    4,697    10,365   (highlighted row)
25 to 34 years        1,904    1,589     3,494
35 years or older     1,660      970     2,630
Total                 9,321    7,317    16,639

EXAMPLE 6.4 Women among college students

Let’s follow the four-step process (page 53), starting with a practical question of interest to college administrators.

STATE: The proportion of college students who are older than the traditional 18 to 24 years is increasing. How does the participation of women in higher education change as we look at older students?

FORMULATE: Calculate and compare the conditional distributions of sex for college students in several age groups.

SOLVE: Comparing conditional distributions reveals the nature of the association between the sex and age of college students. Look at each row in Table 6.1 (that is, at each age group) in turn. Find the numbers of women and of men as percents of each row total. Here are the four conditional distributions of sex given age group:

                                   Female   Male
Percent of 15 to 17 age group        59.3   40.7
Percent of 18 to 24 age group        54.7   45.3
Percent of 25 to 34 age group        54.5   45.5
Percent of 35 or older age group     63.1   36.9

Because the variable “sex” has just two values, comparing conditional distributions just amounts to comparing the percents of women in the four age groups. The bar graph in Figure 6.2 compares the percents of women in the four age groups. The heights of the bars do not add to 100% because they are not parts of a whole. Each bar describes a different age group.

CONCLUDE: Women are a majority of college students in all age groups but are somewhat more predominant among students 35 years or older. Women are more likely than men to return to college after working for a number of years. That’s an important part of the relationship between the sex and age of college students.
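The SOLVE step of Example 6.4 can be sketched in a few lines of Python. The counts and the published row totals come from Table 6.1; the variable names are illustrative, and dividing by the published row total (rather than by the sum of the rounded entries) is an assumption made here to follow Example 6.3.

```python
# (female, male, published row total) from Table 6.1, in thousands.
rows = {
    "15 to 17":    (89,   61,   150),
    "18 to 24":    (5668, 4697, 10365),
    "25 to 34":    (1904, 1589, 3494),
    "35 or older": (1660, 970,  2630),
}

# Conditional distribution of sex given age group:
# each count is divided by the total for its own row.
sex_given_age = {
    age: {
        "Female": round(100 * female / total, 1),
        "Male": round(100 * male / total, 1),
    }
    for age, (female, male, total) in rows.items()
}

for age, dist in sex_given_age.items():
    print(f"{age:12s} Female {dist['Female']:5.1f}%   Male {dist['Male']:5.1f}%")
```

Each row has its own denominator, which is what makes these distributions conditional rather than marginal.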

Remember that there are two sets of conditional distributions for any two-way table. Examples 6.3 and 6.4 looked at the conditional distributions of sex for different age groups. We could also examine the conditional distributions of age for the two sexes.


[Figure: bar graph. Vertical axis: Percent of women in the age group (0 to 70). Horizontal axis: Age group (years), with bars for 15 to 17, 18 to 24, 25 to 34, and 35 or older.]

FIGURE 6.2 Bar graph comparing the percent of female college students in four age groups. There are more women than men in all age groups, but the percent of women is highest among older students.

EXAMPLE 6.5 Conditional distribution of age given sex

What is the distribution of age among female college students? Information about women students appears in the “Female” column. Look only at this column, which is highlighted in Table 6.3. To find the conditional distribution of age, divide the count of women in each age group by the column total, which is 9321. Here is the distribution:

                                  15 to 17   18 to 24   25 to 34   35 or older
Percent of female students aged        1.0       60.8       20.4          17.8

Looking only at the “Male” column in the two-way table gives the conditional distribution of age for men:

                                  15 to 17   18 to 24   25 to 34   35 or older
Percent of male students aged          0.8       64.2       21.7          13.3

TABLE 6.3 College students by sex and age: females

                            Sex
Age group            Female*    Male     Total
15 to 17 years           89       61       150
18 to 24 years        5,668    4,697    10,365
25 to 34 years        1,904    1,589     3,494
35 years or older     1,660      970     2,630
Total                 9,321    7,317    16,639

*Highlighted column.

Each set of percents adds to 100% because each conditional distribution includes all students of one sex. Comparing these two conditional distributions shows the relationship between sex and age in another form. Male students are more likely than women to be 18 to 24 years old and less likely to be 35 or older.
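The column calculations in Example 6.5 follow the same pattern as the row calculations, with the column total as the denominator. The sketch below uses plain Python with illustrative names; the counts and column totals are taken from Table 6.1.

```python
age_groups = ["15 to 17", "18 to 24", "25 to 34", "35 or older"]

# Columns of Table 6.1 (thousands) and their published column totals.
columns = {
    "Female": [89, 5668, 1904, 1660],
    "Male":   [61, 4697, 1589, 970],
}
column_totals = {"Female": 9321, "Male": 7317}

# Conditional distribution of age given sex:
# each count is divided by the total for its own column.
age_given_sex = {
    sex: {
        age: round(100 * count / column_totals[sex], 1)
        for age, count in zip(age_groups, counts)
    }
    for sex, counts in columns.items()
}

for sex, dist in age_given_sex.items():
    print(sex, dist)
```

Comparing the two printed distributions side by side shows the same pattern the example describes: a larger share of men in the 18 to 24 group and a larger share of women in the 35 or older group.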

Smiling faces

Women smile more than men. The same data that produce this fact allow us to link smiling to other variables in two-way tables. For example, add as the second variable whether or not the person thinks they are being observed. If yes, that’s when women smile more. If no, there’s no difference between women and men. Or take the second variable to be the person’s occupation or social role. Within each social category, there is very little difference in smiling between women and men.

Software will do these calculations for you. Most programs allow you to choose which conditional distributions you want to compare. The output in Figure 6.3 compares the four conditional distributions of sex given age and also the marginal distribution of sex for all students. The row percents in the first two columns agree (up to roundoff) with the results in Example 6.4.

Contingency Table with summary

Contingency table results:
Rows: Age group
Columns: Sex
Cell format: Count (Row percent)

                    Female             Male               Total
15 to 17        89 (59.33%)        61 (40.67%)       150 (100.00%)
18 to 24      5668 (54.68%)      4697 (45.32%)     10365 (100.00%)
25 to 34      1904 (54.51%)      1589 (45.49%)      3493 (100.00%)
35 or older   1660 (63.12%)       970 (36.88%)      2630 (100.00%)
Total         9321 (56.02%)      7317 (43.98%)     16638 (100.00%)

FIGURE 6.3 CrunchIt! output of the two-way table of college students by age and sex, along with each entry as a percent of its row total. The percents in the four age-group rows give the conditional distributions of sex for each age group, and the percents in the “Total” row give the marginal distribution of sex for all college students.

No single graph (such as a scatterplot) portrays the form of the relationship between categorical variables. No single numerical measure (such as the correlation) summarizes the strength of the association. Bar graphs are flexible enough to be helpful, but you must think about what comparisons you want to display. For numerical measures, we rely on well-chosen percents. You must decide which percents you need. Here is a hint: if there is an explanatory-response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable. If you think that age influences the proportions of men and women among college students, compare the conditional distributions of sex among students of different ages, as in Example 6.4.

APPLY YOUR KNOWLEDGE

6.3 Female college students. Starting with Table 6.1, show the calculations to find the conditional distribution of age among female college students. Your results should agree with those in Example 6.5.

6.4 Majors for men and women in business. A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class in the College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Here are the data from the students who responded:4

                 Female   Male
Accounting           68     56
Administration       91     40
Economics             5      6
Finance              61     59

(a) Find the two conditional distributions of major, one for women and one for men. Based on your calculations, describe the differences between women and men with a graph and in words.

(b) What percent of the students did not respond to the questionnaire? The nonresponse weakens conclusions drawn from these data.

6.5 Risks of playing soccer. The two-way table in Exercise 6.1 describes a study of arthritis of the hip or knee among people with different levels of experience playing soccer. We suspect that the more serious soccer players have more arthritis later in life. Do the data confirm this suspicion? Follow the four-step process, as illustrated in Example 6.4.

6.6 Marginal distributions aren’t the whole story. Here are the row and column totals for a two-way table with two rows and two columns:

 a    b    50
 c    d    50
60   40   100


Find two different sets of counts a, b, c, and d for the body of the table that give these same totals. This shows that the relationship between two variables cannot be obtained from the two individual distributions of the variables.

Simpson’s paradox

As is the case with quantitative variables, the effects of lurking variables can change or even reverse relationships between two categorical variables. Here is an example that demonstrates the surprises that can await the unsuspecting user of data.


EXAMPLE 6.6 Do medical helicopters save lives?

Accident victims are sometimes taken by helicopter from the accident scene to a hospital. Helicopters save time. Do they also save lives? Let’s compare the percents of accident victims who die with helicopter evacuation and with the usual transport to a hospital by road. Here are hypothetical data that illustrate a practical difficulty:5

                   Helicopter   Road
Victim died                64    260
Victim survived           136    840
Total                     200   1100

We see that 32% (64 out of 200) of helicopter patients died, but only 24% (260 out of 1100) of the others did. That seems discouraging.

The explanation is that the helicopter is sent mostly to serious accidents, so that the victims transported by helicopter are more often seriously injured. They are more likely to die with or without helicopter evacuation. Here are the same data broken down by the seriousness of the accident:

Serious Accidents
             Helicopter   Road
Died                 48     60
Survived             52     40
Total               100    100

Less Serious Accidents
             Helicopter   Road
Died                 16    200
Survived             84    800
Total               100   1000

Inspect these tables to convince yourself that they describe the same 1300 accident victims as the original two-way table. For example, 200 (100 + 100) were moved by helicopter, and 64 (48 + 16) of these died.

Among victims of serious accidents, the helicopter saves 52% (52 out of 100) compared with 40% for road transport. If we look only at less serious accidents, 84% of those transported by helicopter survive, versus 80% of those transported by road. Both groups of victims have a higher survival rate when evacuated by helicopter.


At first, it seems paradoxical that the helicopter does better for both groups of victims but worse when all victims are lumped together. Examining the data makes the explanation clear. Half the helicopter transport patients are from serious accidents, compared with only 100 of the 1100 road transport patients. So the helicopter carries patients who are more likely to die. The seriousness of the accident was a lurking variable that, until we uncovered it, made the relationship between survival and mode of transport to a hospital hard to interpret. Example 6.6 illustrates Simpson’s paradox.

SIMPSON’S PARADOX

An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.

The lurking variable in Simpson’s paradox is categorical. That is, it breaks the individuals into groups, as when accident victims are classified as injured in a “serious accident” or a “less serious accident.” Simpson’s paradox is just an extreme form of the fact that observed associations can be misleading when there are lurking variables.
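The reversal in Example 6.6 can be verified directly from the hypothetical counts given there. The following is a minimal sketch in plain Python; the data layout and the helper name `death_percent` are illustrative assumptions, not part of the text.

```python
# (died, survived) counts from Example 6.6, split by accident seriousness.
data = {
    "serious":      {"Helicopter": (48, 52), "Road": (60, 40)},
    "less serious": {"Helicopter": (16, 84), "Road": (200, 800)},
}


def death_percent(died, survived):
    """Percent of victims who died among those transported."""
    return 100 * died / (died + survived)


# Death rates within each group defined by the lurking variable.
by_severity = {
    severity: {mode: death_percent(*counts) for mode, counts in modes.items()}
    for severity, modes in data.items()
}

# Death rates after combining the two groups into a single table.
combined = {}
for mode in ("Helicopter", "Road"):
    died = sum(data[severity][mode][0] for severity in data)
    survived = sum(data[severity][mode][1] for severity in data)
    combined[mode] = death_percent(died, survived)

for severity, rates in by_severity.items():
    print(f"{severity}: {rates}")
print("combined:", {mode: round(rate, 1) for mode, rate in combined.items()})
```

Within each severity group the helicopter has the lower death rate, while in the combined table it has the higher one. The difference comes entirely from how unevenly the two transport modes are split between serious and less serious accidents.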

APPLY YOUR KNOWLEDGE

6.7 Airline flight delays. Here are the numbers of flights on time and delayed for two airlines at five airports in one month. Overall on-time percents for each airline are often reported in the news. The airport that flights serve is a lurking variable that can make such reports misleading.6

                   Alaska Airlines        America West
                 On time   Delayed      On time   Delayed
Los Angeles          497        62          694       117
Phoenix              221        12         4840       415
San Diego            212        20          383        65
San Francisco        503       102          320       129
Seattle             1841       305          201        61

(a) What percent of all Alaska Airlines flights were delayed? What percent of all America West flights were delayed? These are the numbers usually reported.

(b) Now find the percent of delayed flights for Alaska Airlines at each of the five airports. Do the same for America West.

(c) America West did worse at every one of the five airports, yet did better overall. That sounds impossible. Explain carefully, referring to the data, how this can happen. (The weather in Phoenix and Seattle lies behind this example of Simpson’s paradox.)


6.8 Race and the death penalty. Whether a convicted murderer gets the death penalty seems to be influenced by the race of the victim. Here are data on 326 cases in which the defendant was convicted of murder:7

White Defendant
          White victim   Black victim
Death               19              0
Not                132              9

Black Defendant
          White victim   Black victim
Death               11              6
Not                 52             97

(a) Use these data to make a two-way table of defendant’s race (white or black) versus death penalty (yes or no).

(b) Show that Simpson’s paradox holds: a higher percent of white defendants are sentenced to death overall, but for both black and white victims a higher percent of black defendants are sentenced to death.

(c) Use the data to explain why the paradox holds in language that a judge could understand.

CHAPTER 6 SUMMARY

A two-way table of counts organizes data about two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table. Two-way tables are often used to summarize large amounts of information by grouping outcomes into categories.

The row totals and column totals in a two-way table give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. Marginal distributions tell us nothing about the relationship between the variables.

There are two sets of conditional distributions for a two-way table: the distributions of the row variable for each fixed value of the column variable, and the distributions of the column variable for each fixed value of the row variable. Comparing one set of conditional distributions is one way to describe the association between the row and the column variables.

To find the conditional distribution of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.

Bar graphs are a flexible means of presenting categorical data. There is no single best way to describe an association between two categorical variables.

A comparison between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is Simpson’s paradox. Simpson’s paradox is an example of the effect of lurking variables on an observed association.


CHECK YOUR SKILLS

The National Survey of Adolescent Health interviewed several thousand teens (grades 7 to 12). One question asked was “What do you think are the chances you will be married in the next ten years?” Here is a two-way table of the responses by sex:8

                                 Female   Male
Almost no chance                    119    103
Some chance, but probably not       150    171
A 50-50 chance                      447    512
A good chance                       735    710
Almost certain                     1174    756

Exercises 6.9 to 6.17 are based on this table.

6.9 How many females were among the respondents?

(a) 2625 (b) 4877 (c) need more information

6.10 How many individuals are described by this table?

(a) 2625 (b) 4877 (c) need more information

6.11 The percent of females among the respondents was

(a) about 46%. (b) about 54%. (c) about 86%.

6.12 Your percent from the previous exercise is part of

(a) the marginal distribution of sex.

(b) the marginal distribution of chance of marriage.

(c) the conditional distribution of sex given chance of marriage.

6.13 What percent of females thought that they were almost certain to be married in the next ten years?

(a) about 40% (b) about 45% (c) about 61%

6.14 Your percent from the previous exercise is part of

(a) the marginal distribution of chance of marriage.

(b) the conditional distribution of sex given chance of marriage.

(c) the conditional distribution of chance of marriage given sex.

6.15 What percent of those who thought they were almost certain to be married were female?

(a) about 40% (b) about 45% (c) about 61%

6.16 Your percent from the previous exercise is part of

(a) the marginal distribution of chance of marriage.

(b) the conditional distribution of sex given chance of marriage.

(c) the conditional distribution of chance of marriage given sex.

6.17 A bar graph showing the conditional distribution of chance of marriage given that the respondent was female would have

(a) 2 bars. (b) 5 bars. (c) 10 bars.

6.18 A college looks at the grade point average (GPA) of its full-time and part-time students. Grades in science courses are generally lower than grades in other courses. There are few science majors among part-time students but many science majors among full-time students. The college finds that full-time students who are science majors have higher GPA than part-time students who are science majors. Full-time students who are not science majors also have higher GPA than part-time students who are not science majors. Yet part-time students as a group have higher GPA than full-time students. This finding is

(a) not possible: if both science and other majors who are full-time have higher GPA than those who are part-time, then all full-time students together must have higher GPA than all part-time students together.

(b) an example of Simpson’s paradox: full-time students do better in both kinds of courses but worse overall because they take more science courses.

(c) due to comparing two conditional distributions that should not be compared.

C H A P T E R 6 EXERCISES

Marital status and job level. We sometimes hear that getting married is good for your career. Table 6.4 presents data from one of the studies behind this generalization. To avoid gender effects, the investigators looked only at men. The data describe the marital status and the job level of all 8235 male managers and professionals employed by a large manufacturing firm.9 The firm assigns each position a grade that reflects the value of that particular job to the company. The authors of the study grouped the many job grades into quarters. Grade 1 contains jobs in the lowest quarter of the job grades, and Grade 4 contains those in the highest quarter. Exercises 6.19 to 6.23 are based on these data.

6.19 Marginal distributions. Give (in percents) the two marginal distributions, for marital status and for job grade. Do each of your two sets of percents add to exactly 100%? If not, why not?

6.20 Percents. What percent of single men hold Grade 1 jobs? What percent of Grade 1 jobs are held by single men?

6.21 Conditional distribution. Give (in percents) the conditional distribution of job grade among single men. Should your percents add to 100% (up to roundoff error)?

6.22 Marital status and job grade. One way to see the relationship is to look at who holds Grade 1 jobs.

TABLE 6.4 Marital status and job level

                        Marital status
Job grade   Single   Married   Divorced   Widowed   Total

1               58       874         15         8     955
2              222      3927         70        20    4239
3               50      2396         34        10    2490
4                7       533          7         4     551

Total          337      7730        126        42    8235


(a) There are 874 married men with Grade 1 jobs, and only 58 single men with such jobs. Explain why these counts by themselves don’t describe the relationship between marital status and job grade.

(b) Find the percent of men in each marital status group who have Grade 1 jobs. Then find the percent in each marital group who have Grade 4 jobs. What do these percents say about the relationship?
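The arithmetic behind Exercises 6.19 to 6.22 can be sketched in a few lines of Python. This is only one possible way to organize the computation; the counts come from Table 6.4, while the names `counts`, `percent`, and `percent_in_grade` are illustrative assumptions rather than anything defined in the text.

```python
# Counts from Table 6.4: keys are job grades 1-4,
# values are counts in column order Single, Married, Divorced, Widowed.
statuses = ["Single", "Married", "Divorced", "Widowed"]
counts = {
    1: [58, 874, 15, 8],
    2: [222, 3927, 70, 20],
    3: [50, 2396, 34, 10],
    4: [7, 533, 7, 4],
}

# Column totals: the number of men in each marital status group.
status_totals = [sum(row[j] for row in counts.values())
                 for j in range(len(statuses))]
grand_total = sum(status_totals)


def percent(part, whole):
    """Return part as a percent of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Marginal distribution of marital status (percent of all men).
marginal_status = {s: percent(t, grand_total)
                   for s, t in zip(statuses, status_totals)}


def percent_in_grade(grade):
    """Percent of each marital status group that holds the given job grade."""
    return {s: percent(c, t)
            for s, c, t in zip(statuses, counts[grade], status_totals)}


grade1 = percent_in_grade(1)
grade4 = percent_in_grade(4)
print("Marginal distribution of marital status:", marginal_status)
print("Percent holding Grade 1 jobs:", grade1)
print("Percent holding Grade 4 jobs:", grade4)
```

Dividing each count by its column total, not by the grand total, is what turns the raw counts into percents that can be compared across marital status groups of very different sizes.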

6.23 Association is not causation. The data in Table 6.4 show that single men are more likely to hold lower-grade jobs than are married men. We should not conclude that single men can help their career by getting married. What lurking variables might help explain the association between marital status and job grade?

6.24 Attitudes toward recycled products. Recycling is supposed to save resources. Some people think recycled products are lower in quality than other products, a fact that makes recycling less practical. People who actually use a recycled product may have different opinions from those who don’t use it. Here are data on attitudes toward coffee filters made of recycled paper among people who do and don’t buy these filters:10

             Think the quality of the recycled product is

             Higher   The same   Lower

Buyers           20          7       9
Nonbuyers        29         25      43

(a) Find the marginal distribution of opinion about quality. Assuming that these people represent all users of coffee filters, what does this distribution tell us?

(b) How do the opinions of buyers and nonbuyers differ? Use conditional distributions as a basis for your answer. Can you conclude that using recycled filters causes more favorable opinions? If so, giving away samples might increase sales.

6.25 Helping cocaine addicts. Cocaine addiction is hard to break. Addicts need cocaine to feel any pleasure, so perhaps giving them an antidepressant drug will help. An experiment assigned 72 chronic cocaine users to take either an antidepressant drug called desipramine, lithium, or a placebo. (Lithium is a standard drug to treat cocaine addiction. A placebo is a dummy drug, used so that the effect of being in the study but not taking any drug can be seen.) One-third of the subjects, chosen at random, received each drug. Here are the results after three years:11

              Desipramine   Lithium   Placebo

Relapse                10        18        20
No relapse             14         6         4

Total                  24        24        24

(a) Compare the effectiveness of the three treatments in preventing relapse. Use percents and draw a bar graph.

(b) Do you think that this study gives good evidence that desipramine actually causes a reduction in relapses?
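For part (a), the comparison rests on the percent of each treatment group that relapsed. The following is a minimal sketch, assuming the counts in the table above; the text-based bar graph and the names `relapse_pct` and `bar` are illustrative choices, not a method prescribed by the text.

```python
# Counts from the table in Exercise 6.25.
treatments = ["Desipramine", "Lithium", "Placebo"]
relapse = [10, 18, 20]
no_relapse = [14, 6, 4]

# Percent of each treatment group that relapsed.
relapse_pct = {}
for name, r, n in zip(treatments, relapse, no_relapse):
    group_size = r + n
    relapse_pct[name] = round(100 * r / group_size, 1)

# A crude text bar graph: one '#' for roughly every 5 percentage points.
for name, pct in relapse_pct.items():
    bar = "#" * round(pct / 5)
    print(f"{name:<12} {pct:5.1f}%  {bar}")
```

Because each group has the same size (24 subjects), the counts and the percents tell the same story here. Percents become essential only when group sizes differ.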


6.26 Violent deaths. How does the impact of “violent deaths” due to accidents, homicide, and suicide change with age group? Use the data in Exercise 6.2 (page 152) and follow the four-step process (page 53) in your answer.

6.27 College degrees. Here are data on the numbers of degrees earned in 2005–2006, as projected by the National Center for Education Statistics. The table entries are counts of degrees in thousands.12

                Female   Male

Associate’s        431    244
Bachelor’s         813    584
Master’s           298    215
Professional        42     47
Doctor’s            21     24

Describe briefly how the participation of women changes with level of degree. Follow the four-step process, as illustrated in Example 6.4.


6.28 Do angry people have more heart disease? People who get angry easily tend to have more heart disease. That’s the conclusion of a study that followed a random sample of 12,986 people from three locations for about four years. All subjects were free of heart disease at the beginning of the study. The subjects took the Spielberger Trait Anger Scale test, which measures how prone a person is to sudden anger. Here are data for the 8474 people in the sample who had normal blood pressure.13 CHD stands for “coronary heart disease.” This includes people who had heart attacks and those who needed medical treatment for heart disease.

          Low anger   Moderate anger   High anger   Total

CHD              53              110           27     190
No CHD         3057             4621          606    8284

Total          3110             4731          633    8474

Do these data support the study’s conclusion about the relationship between anger and heart disease? Follow the four-step process (page 53) in your answer.

6.29 Python eggs. How is the hatching of water python eggs influenced by the temperature of the snake’s nest? Researchers assigned newly laid eggs to one of three temperatures: hot, neutral, or cold. Hot duplicates the warmth provided by the mother python. Neutral and cold are cooler, as when the mother is absent. Here are the data on the number of eggs and the number that hatched:14

                  Cold   Neutral   Hot

Number of eggs      27        56   104
Number hatched      16        38    75


Notice that this is not a two-way table! The researchers anticipated that eggs would hatch less well at cooler temperatures. Do the data support that anticipation? Follow the four-step process (page 53) in your answer.
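The reason the display in Exercise 6.29 is not a two-way table is that its first row is a total, not a category of outcome. One way to recover a proper two-way table, sketched here under the assumption that every egg either hatched or did not, is to subtract. The names `not_hatched`, `two_way`, and `hatch_pct` are illustrative.

```python
# Counts from Exercise 6.29.
temperatures = ["Cold", "Neutral", "Hot"]
eggs = [27, 56, 104]       # total eggs assigned to each temperature
hatched = [16, 38, 75]     # eggs that hatched at each temperature

# The missing outcome row: eggs that did not hatch.
not_hatched = [n - h for n, h in zip(eggs, hatched)]

# Two-way table of outcome by temperature; each column now sums to its total.
two_way = {
    "Hatched": dict(zip(temperatures, hatched)),
    "Not hatched": dict(zip(temperatures, not_hatched)),
}

# Conditional percent hatched at each temperature.
hatch_pct = {t: round(100 * h / n, 1)
             for t, h, n in zip(temperatures, hatched, eggs)}

print(two_way)
print(hatch_pct)
```

With the outcome rows in place, the percent hatched within each temperature is an ordinary conditional distribution and can be compared across the three groups.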

6.30 Which hospital is safer? To help consumers make informed decisions about health care, the government releases data about patient outcomes in hospitals. You want to compare Hospital A and Hospital B, which serve your community. Here are data on all patients undergoing surgery in a recent time period. The data include the condition of the patient (“good” or “poor”) before the surgery. “Survived” means that the patient lived at least 6 weeks following surgery.

                 Good Condition                         Poor Condition

            Hospital A   Hospital B                Hospital A   Hospital B

Died                 6            8    Died                57            8
Survived           594          592    Survived          1443          192

Total              600          600    Total             1500          200

(a) Compare percents to show that Hospital A has a higher survival rate for both groups of patients.

(b) Combine the data into a single two-way table of outcome (“survived” or “died”) by hospital (A or B). The local paper reports just these overall survival rates. Which hospital has the higher rate?

(c) Explain from the data, in language that a reporter can understand, how Hospital B can do better overall even though Hospital A does better for both groups of patients.
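The reversal in Exercise 6.30 can be checked directly by computing survival rates within each patient condition and then again after pooling. The sketch below uses the counts from the tables above; the dictionary layout and the names `survival_rate`, `by_condition`, and `combined` are assumptions made for illustration.

```python
# (died, survived) counts by patient condition and hospital, Exercise 6.30.
data = {
    "good": {"A": (6, 594), "B": (8, 592)},
    "poor": {"A": (57, 1443), "B": (8, 192)},
}


def survival_rate(died, survived):
    """Percent of patients who survived."""
    return 100 * survived / (died + survived)


# Survival rate within each condition group, for each hospital.
by_condition = {
    cond: {hosp: survival_rate(*c) for hosp, c in hospitals.items()}
    for cond, hospitals in data.items()
}

# Survival rate after combining both condition groups for each hospital.
combined = {}
for hosp in ("A", "B"):
    died = sum(data[cond][hosp][0] for cond in data)
    survived = sum(data[cond][hosp][1] for cond in data)
    combined[hosp] = survival_rate(died, survived)

for cond, rates in by_condition.items():
    print(f"{cond} condition: A {rates['A']:.1f}%, B {rates['B']:.1f}%")
print(f"combined: A {combined['A']:.1f}%, B {combined['B']:.1f}%")
```

The pooled rates differ from the within-group rates because the two hospitals treat very different mixes of patients: 1500 of Hospital A's 2100 patients were in poor condition, against 200 of Hospital B's 800.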

6.31 Discrimination? Wabash Tech has two professional schools, business and law. Here are two-way tables of applicants to both schools, categorized by gender and admission decision. (Although these data are made up, similar situations occur in reality.)15

              Business                      Law

          Admit   Deny                  Admit   Deny

Male        480    120       Male          10     90
Female      180     20       Female       100    200

(a) Make a two-way table of gender by admission decision for the two professional schools together by summing entries in these tables.

(b) From the two-way table, calculate the percent of male applicants who are admitted and the percent of female applicants who are admitted. Wabash admits a higher percent of male applicants.

(c) Now compute separately the percents of male and female applicants admitted by the business school and by the law school. Each school admits a higher percent of female applicants.


(d) This is Simpson’s paradox: both schools admit a higher percent of the women who apply, but overall Wabash admits a lower percent of female applicants than of male applicants. Explain carefully, as if speaking to a skeptical reporter, how it can happen that Wabash appears to favor males when each school individually favors females.
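One way to frame the explanation in part (d) is to write each gender's overall admission rate as a weighted average of the two schools' rates, with weights equal to the share of that gender's applicants who applied to each school. The sketch below does this for the Wabash Tech counts; the function name `decompose` and the dictionary layout are hypothetical, introduced only for illustration.

```python
# (admitted, denied) counts by gender and school, Exercise 6.31.
applicants = {
    "Male": {"Business": (480, 120), "Law": (10, 90)},
    "Female": {"Business": (180, 20), "Law": (100, 200)},
}


def decompose(by_school):
    """Split an overall admission rate into per-school shares and rates.

    Returns (parts, overall), where parts[school] holds the share of this
    group's applicants who applied to the school and the school's admission
    rate for the group, and overall is the sum of share * rate.
    """
    total = sum(admit + deny for admit, deny in by_school.values())
    parts = {}
    for school, (admit, deny) in by_school.items():
        n = admit + deny
        parts[school] = {"share": n / total, "rate": admit / n}
    overall = sum(p["share"] * p["rate"] for p in parts.values())
    return parts, overall


results = {gender: decompose(by_school)
           for gender, by_school in applicants.items()}

for gender, (parts, overall) in results.items():
    for school, p in parts.items():
        print(f"{gender:<6} {school:<8} share {p['share']:.1%}  "
              f"admit rate {p['rate']:.1%}")
    print(f"{gender:<6} overall admit rate {overall:.1%}")
```

The weights are what differ between the genders: most male applicants apply to the business school, which admits most of its applicants, while most female applicants apply to the law school, which admits far fewer.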

6.32 Obesity and health. Recent studies have shown that earlier reports underestimated the health risks associated with being overweight. The error was due to overlooking lurking variables. In particular, smoking tends both to reduce weight and to lead to earlier death. Illustrate Simpson’s paradox by a simplified version of this situation. That is, make up two-way tables of overweight (yes or no) by early death (yes or no) separately for smokers and nonsmokers such that

• Overweight smokers and overweight nonsmokers both tend to die earlier than those not overweight.

• But when smokers and nonsmokers are combined into a two-way table of overweight by early death, persons who are not overweight tend to die earlier.
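Any made-up tables for Exercise 6.32 can be checked against the two bullet conditions with a short script. The counts below are entirely hypothetical, invented here for illustration and not taken from any study or from the text. They are chosen so that smokers are less often overweight and more often die early, as the exercise describes.

```python
# HYPOTHETICAL counts, invented for illustration only:
# (early deaths, number of people) for each weight group.
tables = {
    "Smokers": {"Overweight": (50, 100), "Not overweight": (160, 400)},
    "Nonsmokers": {"Overweight": (80, 400), "Not overweight": (10, 100)},
}


def death_rate(deaths, size):
    """Percent of a group that died early."""
    return 100 * deaths / size


# Early-death rates within smokers and within nonsmokers.
within = {
    group: {weight: death_rate(*c) for weight, c in table.items()}
    for group, table in tables.items()
}

# Early-death rates after combining smokers and nonsmokers.
combined = {}
for weight in ("Overweight", "Not overweight"):
    deaths = sum(table[weight][0] for table in tables.values())
    size = sum(table[weight][1] for table in tables.values())
    combined[weight] = death_rate(deaths, size)

# First bullet: overweight people die earlier within each smoking group.
first_condition = all(r["Overweight"] > r["Not overweight"]
                      for r in within.values())
# Second bullet: the direction reverses in the combined table.
second_condition = combined["Overweight"] < combined["Not overweight"]
paradox = first_condition and second_condition

print(within)
print(combined)
print("Simpson's paradox:", paradox)
```

Replacing the entries of `tables` with other made-up counts and rerunning gives a quick check of whether a candidate answer satisfies both conditions.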