ch12 statistics


Upload: oscar-solis-martir

Post on 18-Apr-2015


CHAPTER 12 Statistics

In a 1996 study, researchers found that employees described as plain had salaries below the median level and those described as attractive had salaries above the median level.

Source: Jeff Biddle and Daniel Hamermesh, CompPsych Survey

Statisticians collect numerical data from subgroups of populations to find out everything imaginable about the population as a whole, including whom they favor in an election, what they watch on TV, how much money they make, what worries them, and even how being attractive pays off. Comedians and statisticians joke that 62.38% of all statistics are made up on the spot. Because statisticians both record and influence our behavior, it is important to distinguish between good and bad methods for collecting, presenting, and interpreting data. In this chapter, you will gain an understanding of where data come from and how these numbers are used to make decisions.

[Figure: Annual Salaries of American Men and Women, by Physical Attractiveness. Vertical axis: Annual Salary (thousands), $5 to $30. Men: plain $22,880, attractive $26,880. Women: plain $23,480, attractive $26,580. Median Income: $25,480.]

BLITMC12_679-746-hr 11/2/06 1:00 PM Page 679


SECTION 12.1 • SAMPLING, FREQUENCY DISTRIBUTIONS, AND GRAPHS

OBJECTIVES

1. Describe the population whose properties are to be analyzed.

2. Select an appropriate sampling technique.

3. Organize and present data.

4. Identify deceptions in visual displays of data.

At the end of the twentieth century, there were 94 million households in the United States with television sets. The television program viewed by the greatest percentage of such households in that century was the final episode of M*A*S*H. Over 50 million American households watched this program.

Numerical information, such as the information about the top three TV shows of the twentieth century, shown in Table 12.1, is called data. The word statistics is often used when referring to data. However, statistics has a second meaning: Statistics is also a method for collecting, organizing, analyzing, and interpreting data, as well as drawing conclusions based on the data. This methodology divides statistics into two main areas. Descriptive statistics is concerned with collecting, organizing, summarizing, and presenting data. Inferential statistics has to do with making generalizations about and drawing conclusions from the data collected.

TABLE 12.1 TV PROGRAMS WITH THE GREATEST U.S. AUDIENCE VIEWING PERCENTAGE OF THE TWENTIETH CENTURY

Program                           Total Households   Viewing Percentage
1. M*A*S*H, Feb. 28, 1983         50,150,000         60.2%
2. Dallas, Nov. 21, 1980          41,470,000         53.3%
3. Roots Part 8, Jan. 30, 1977    36,380,000         51.1%

Source: Nielsen Media Research

M*A*S*H took place in the early 1950s, during the Korean War. By the final episode, the show had lasted four times as long as the Korean War.

Populations and Samples

Consider the set of all American TV households. Such a set is called the population. In general, a population is the set containing all the people or objects whose properties are to be described and analyzed by the data collector.

The population of American TV households is huge. At the time of the M*A*S*H conclusion, there were nearly 84 million such households. Did over 50 million American TV households really watch the final episode of M*A*S*H? A friendly phone call to each household (“So, how are you? What’s new? Watch any good television last night? If so, what?”) is, of course, absurd. A sample, which is a subset or subgroup of the population, is needed. In this case, it would be appropriate to have a sample of a few thousand TV households to draw conclusions about the population of all TV households.


EXAMPLE 1 POPULATIONS AND SAMPLES

A group of hotel owners in a large city decide to conduct a survey among citizens of the city to discover their opinions about casino gambling.

a. Describe the population.

b. One of the hotel owners suggests obtaining a sample by surveying all the people at six of the largest nightclubs in the city on a Saturday night. Each person will be asked to express his or her opinion on casino gambling. Does this seem like a good idea?

SOLUTION

a. The population is the set containing all the citizens of the city.

b. Questioning people at six of the city’s largest nightclubs is a terrible idea. The nightclub subset is probably more likely to have a positive attitude toward casino gambling than the population of all the city’s citizens.

CHECK POINT 1

A city government wants to conduct a survey among the city’s homeless to discover their opinions about required residence in city shelters from midnight until 6 A.M.

a. Describe the population.

b. A city commissioner suggests obtaining a sample by surveying all the homeless people at the city’s largest shelter on a Sunday night. Does this seem like a good idea? Explain your answer.

Random Sampling

There is a way to use a small sample to make generalizations about a large population: Guarantee that every member of the population has an equal chance to be selected for the sample. Surveying people at six of the city’s largest nightclubs does not provide this guarantee. Unless it can be established that all citizens of the city frequent these clubs, which seems unlikely, this sampling scheme does not permit each citizen an equal chance of selection.

RANDOM SAMPLES

A random sample is a sample obtained in such a way that every element in the population has an equal chance of being selected for the sample.

Suppose that you are elated with the quality of one of your courses. Although it’s an auditorium section with 120 students, you feel that the professor is lecturing right to you. During a wonderful lecture, you look around the auditorium to see if any of the other students are sharing your enthusiasm. Based on body language, it’s hard to tell. You really want to know the opinion of the population of 120 students taking this course. You think about asking students to grade the course on an A to F scale, anticipating a unanimous A. You cannot survey everyone. Eureka! Suddenly you have an idea on how to take a sample. Place cards numbered from 1 through 120, one number per card, in a box. Because the course has assigned seating by number, each numbered card corresponds to a student in the class. Reach in and randomly select six cards. Each card, and therefore each student, has an equal chance of being selected. Then use the opinions about the course from the six randomly selected students to generalize about the course opinion for the entire 120-student population.

Your idea is precisely how random samples are obtained. In random sampling, each element in the population must be identified and assigned a number. The numbers are generally assigned in order. The way to sample from the larger numbered population is to generate random numbers using a computer or calculator.

BLITZER BONUS

A Sampling Fiasco

[Photo: Cover of the Literary Digest, October 31, 1936. General Research Division, The New York Public Library, Astor, Lenox and Tilden Foundations. The New York Public Library/Art Resource, NY]

In 1936, the Literary Digest mailed out over ten million ballots to voters throughout the country. The results poured in, and the magazine predicted a landslide victory for Republican Alf Landon over Democrat Franklin Roosevelt. However, the prediction of the Literary Digest was wrong. Why? The mailing lists the editors used included people from their own subscriber list, directories of automobile owners, and telephone books. As a result, its sample was anything but random. It excluded most of the poor, who were unlikely to subscribe to the Literary Digest, or to own a car or telephone in the heart of the Depression. Prosperous people in 1936 were more likely to be Republican than the poor. Thus, although the sample was massive, it included a higher percentage of affluent individuals than the population as a whole did. A victim of both the Depression and the 1936 sampling fiasco, the Literary Digest folded in 1937.


BLITZER BONUS

The United States Census

A census is a survey that attempts to include the entire population. The U.S. Constitution requires a census of the American population every ten years. The 2000 census form was mailed to all households in the country. A census “long form” that asks many more questions than the basic census form is sent to a random sample of one-sixth of all households.

Although the census generates volumes of statistics, its main purpose is to give the government block-by-block population figures to create election districts with equal populations. The U.S. census is not foolproof. The 1990 census missed 1.6% of the American population, including an estimated 4.4% of the African American population, largely in inner cities.


Each numbered element from the population that corresponds to one of the generated random numbers is selected for the sample.
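The card-drawing procedure for the 120-student course can be sketched in a few lines of Python. This is only an illustration: the function name `draw_random_sample` and the seed value are assumptions made for the example, and the seat numbers 1 through 120 stand in for the numbered cards.

```python
import random

def draw_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct ID numbers from 1..population_size.

    Every ID has the same chance of being chosen, which is the defining
    property of a random sample. The seed is optional and only makes the
    illustration repeatable.
    """
    rng = random.Random(seed)
    ids = range(1, population_size + 1)       # the numbered "cards"
    return sorted(rng.sample(ids, sample_size))  # draw without replacement

# Six seat numbers drawn from a class of 120 students.
seats = draw_random_sample(120, 6, seed=2006)
print(seats)
```

The students sitting in the printed seat numbers would be the ones asked to grade the course.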

Call-in polls on radio and television are not reliable because those polled do not represent the larger population. A person who calls in is likely to have feelings about an issue that are consistent with the politics of the show’s host. For a poll to be accurate, the sample must be chosen randomly from the larger population. The A. C. Nielsen Company uses a random sample of approximately 5000 TV households to measure the percentage of households tuned in to a television program.

EXAMPLE 2 SELECTING AN APPROPRIATE SAMPLING TECHNIQUE

We return to the hotel owners in the large city who are interested in how the city’s citizens feel about casino gambling. Which of the following would be the most appropriate way to select a random sample?

a. Randomly survey people who live in the oceanfront condominiums in the city.

b. Survey the first 200 people whose names appear in the city’s telephone directory.

c. Randomly select neighborhoods of the city and then randomly survey people within the selected neighborhoods.

SOLUTION Keep in mind that the population is the set containing all the city’s citizens. A random sample must give each citizen an equal chance of being selected.

a. Randomly selecting people who live in the city’s oceanfront condominiums is not a good idea. Many hotels lie along the oceanfront, and the oceanfront property owners might object to the traffic and noise as a result of casino gambling. Furthermore, this sample does not give each citizen of the city an equal chance of being selected.

b. If the hotel owners survey the first 200 names in the city’s telephone directory, all citizens do not have an equal chance of selection. For example, individuals whose last name begins with a letter toward the end of the alphabet have no chance of being selected.

c. Randomly selecting neighborhoods of the city and then randomly surveying people within the selected neighborhoods is an appropriate technique. Using this method, each citizen has an equal chance of being selected.

In summary, given the three options, the sampling technique in part (c) is the most appropriate.

Surveys and polls involve data from a sample of some population. Regardless of the sampling technique used, the sample should exhibit characteristics typical of those possessed by the target population. This type of sample is called a representative sample.

CHECK POINT 2

Explain why the sampling technique described in Check Point 1(b) on page 681 is not a random sample. Then describe an appropriate way to select a random sample of the city’s homeless.

Frequency Distributions

After data have been collected from a sample of the population, the next task facing the statistician is to present the data in a condensed and manageable form. In this way, the data can be more easily interpreted.

Suppose, for example, that researchers are interested in determining the age at which adolescent males show the greatest rate of physical growth. A random sample of 35 ten-year-old boys is measured for height and then remeasured each year until they reach 18. The age of maximum yearly growth for each subject is as follows:

12, 14, 13, 14, 16, 14, 14, 17, 13, 10, 13, 18, 12, 15, 14, 15, 15, 14, 14, 13, 15, 16, 15, 12, 13, 16, 11, 15, 12, 13, 12, 11, 13, 14, 14.


A piece of data is called a data item. This list of data has 35 data items. Some of the data items are identical. Two of the data items are 11 and 11. Thus, we can say that the data value 11 occurs twice. Similarly, because five of the data items are 12, 12, 12, 12, and 12, the data value 12 occurs five times.

Collected data can be presented using a frequency distribution. Such a distribution consists of two columns. The data values are listed in one column. Numerical data are generally listed from smallest to largest. The adjacent column is labeled frequency and indicates the number of times each value occurs.

EXAMPLE 3 CONSTRUCTING A FREQUENCY DISTRIBUTION

Construct a frequency distribution for the data of the age of maximum yearly growth for 35 boys:

12, 14, 13, 14, 16, 14, 14, 17, 13, 10, 13, 18, 12, 15, 14, 15, 15, 14, 14, 13, 15, 16, 15, 12, 13, 16, 11, 15, 12, 13, 12, 11, 13, 14, 14.

SOLUTION It is difficult to determine trends in the data above in its current format. Perhaps we can make sense of the data by organizing it into a frequency distribution. Let us create two columns. One lists all possible data values, from smallest (10) to largest (18). The other column indicates the number of times the value occurs in the sample. The frequency distribution is shown in Table 12.2.

The frequency distribution indicates that one subject had maximum growth at age 10, two at age 11, five at age 12, seven at age 13, and so on. The maximum growth for most of the subjects occurred between the ages of 12 and 15. Nine boys experienced maximum growth at age 14, more than at any other age within the sample. The sum of the frequencies, 35, is equal to the original number of data items.

The trend shown by the frequency distribution in Table 12.2 indicates that the number of boys who attain their maximum yearly growth at a given age increases until age 14 and decreases after that. This trend is not evident in the data in its original format.
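The counting in Example 3 can also be done by machine. The following Python sketch is one possible way to build the two columns; the variable names `ages`, `counts`, and `frequency_table` are chosen here for illustration and do not come from the text.

```python
from collections import Counter

# Age of maximum yearly growth for the 35 boys in Example 3.
ages = [12, 14, 13, 14, 16, 14, 14, 17, 13, 10, 13, 18, 12, 15, 14, 15, 15,
        14, 14, 13, 15, 16, 15, 12, 13, 16, 11, 15, 12, 13, 12, 11, 13, 14, 14]

counts = Counter(ages)  # maps each data value to the number of times it occurs

# List every possible data value from smallest to largest with its frequency.
frequency_table = [(age, counts[age]) for age in range(min(ages), max(ages) + 1)]

print("Age  Frequency")
for age, frequency in frequency_table:
    print(f"{age:>3}  {frequency:>9}")
print("n =", sum(counts.values()))  # sum of the frequencies
```

The printed rows agree with Table 12.2, and the final line confirms that the frequencies add up to the number of data items.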

CHECK POINT 3

Construct a frequency distribution for the data showing final course grades for students in a precalculus course, listed alphabetically by student name in a grade book:

F, A, B, B, C, C, B, C, A, A, C, C, D, C, B, D, C, C, B, C.

A frequency distribution that lists all possible data items can be quite cumbersome when there are many such items. For example, consider the following data items. These are statistics test scores for a class of 40 students.

82 47 75 64 57 82 63 93

76 68 84 54 88 77 79 80

94 92 94 80 94 66 81 67

75 73 66 87 76 45 43 56

57 74 50 78 71 84 59 76

It’s difficult to determine how well the group did when the grades are displayed like this. Because there are so many data items, one way to organize these data so that the results are more meaningful is to arrange the grades into groups, or classes, based on something that interests us. Many grading systems assign an A to grades in the 90–100 class, B to grades in the 80–89 class, C to grades in the 70–79 class, and so on. These classes provide one way to organize the data.

Looking at the 40 statistics test scores, we see that they range from a low of 43 to a high of 94. We can use classes that run from 40 through 49, 50 through 59, 60 through 69, and so on up to 90 through 99, to organize the scores. In Example 4, we go through the data and tally each item into the appropriate class. This method for organizing data is called a grouped frequency distribution.

TABLE 12.2 A FREQUENCY DISTRIBUTION FOR A BOY’S AGE OF MAXIMUM YEARLY GROWTH

Age of Maximum Growth   Number of Boys (Frequency)
10                      1
11                      2
12                      5
13                      7
14                      9
15                      6
16                      3
17                      1
18                      1
Total:                  n = 35

35 is the sum of the frequencies.

EXAMPLE 4 CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION

Use the classes 40–49, 50–59, 60–69, 70–79, 80–89, and 90–99 to construct a grouped frequency distribution for the 40 test scores on the previous page.

SOLUTION We use the 40 given scores and tally the number of scores in each class.

Test Scores (Class)   Tally            Number of Students (Frequency)
40–49                 |||              3
50–59                 ||||| |          6
60–69                 ||||| |          6
70–79                 ||||| ||||| |    11
80–89                 ||||| ||||       9
90–99                 |||||            5

The second score in the list, 47, is shown as the first tally in the 40–49 row. The first score in the list, 82, is shown as the first tally in the 80–89 row.

Omitting the tally column results in the grouped frequency distribution in Table 12.3. The distribution shows that the greatest frequency of students scored in the 70–79 class. The number of students decreases in classes that contain successively lower and higher scores. The sum of the frequencies, 40, is equal to the original number of data items.

TABLE 12.3

Class     Frequency
40–49     3
50–59     6
60–69     6
70–79     11
80–89     9
90–99     5
Total:    n = 40

40, the sum of the frequencies, is the number of data items.

The leftmost number in each class of a grouped frequency distribution is called the lower class limit. For example, in Table 12.3, the lower limit of the first class is 40 and the lower limit of the third class is 60. The rightmost number in each class is called the upper class limit. In Table 12.3, 49 and 69 are the upper limits for the first and third classes, respectively. Notice that if we take the difference between any two consecutive lower class limits, we get the same number:

50 - 40 = 10, 60 - 50 = 10, 70 - 60 = 10, 80 - 70 = 10, 90 - 80 = 10.

The number 10 is called the class width.

When setting up class limits, each class, with the possible exception of the first or last, should have the same width. Because each data item must fall into exactly one class, it is sometimes helpful to vary the width of the first or last class to allow for items that fall far above or below most of the data.

CHECK POINT 4

Use the classes in Table 12.3 to construct a grouped frequency distribution for the following 37 exam scores:

73 58 68 75 94 79 96 79

87 83 89 52 99 97 89 58

95 77 75 81 75 73 73 62

69 76 77 71 50 57 41 98

77 71 69 90 75.
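The tallying in Example 4 can be sketched in Python as follows. The function `grouped_frequency` and its parameter names are hypothetical, introduced only for this illustration; the sketch assumes whole-number scores and classes of equal width, as in Table 12.3.

```python
# The 40 statistics test scores from Example 4.
scores = [82, 47, 75, 64, 57, 82, 63, 93,
          76, 68, 84, 54, 88, 77, 79, 80,
          94, 92, 94, 80, 94, 66, 81, 67,
          75, 73, 66, 87, 76, 45, 43, 56,
          57, 74, 50, 78, 71, 84, 59, 76]

def grouped_frequency(data, first_lower_limit, class_width, num_classes):
    """Return a list of (lower limit, upper limit, frequency) rows."""
    table = []
    for k in range(num_classes):
        lower = first_lower_limit + k * class_width  # lower class limit
        upper = lower + class_width - 1              # upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, frequency))
    return table

table = grouped_frequency(scores, first_lower_limit=40, class_width=10, num_classes=6)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
print("n =", sum(frequency for _, _, frequency in table))
```

Because consecutive lower class limits differ by the class width, each score falls into exactly one class, and the frequencies add up to 40.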


TABLE 12.2 (repeated) A FREQUENCY DISTRIBUTION FOR A BOY’S AGE OF MAXIMUM YEARLY GROWTH

Age of Maximum Growth   Number of Boys (Frequency)
10                      1
11                      2
12                      5
13                      7
14                      9
15                      6
16                      3
17                      1
18                      1
Total:                  n = 35

[FIGURE 12.1 A histogram for a boy’s age of maximum yearly growth. Title: Boys' Ages of Maximum Yearly Growth. Horizontal axis: Ages (years) (Data values), 10 through 18. Vertical axis: Number of Boys with Maximum Yearly Growth (frequency), 1 through 9.]

[FIGURE 12.2 A histogram with a superimposed frequency polygon. Title: Boys' Ages of Maximum Yearly Growth. Horizontal axis: Ages (years), 10 through 18. Vertical axis: Number of Boys with Maximum Yearly Growth (frequency), 1 through 9.]

Histograms and Frequency Polygons

Take a second look at the frequency distribution for the age of a boy’s maximum yearly growth, repeated in Table 12.2. A bar graph with bars that touch can be used to visually display the data. Such a graph is called a histogram. Figure 12.1 illustrates a histogram that was constructed using the frequency distribution in Table 12.2. A series of rectangles whose heights represent the frequencies are placed next to each other. For example, the height of the bar for the data value 10, shown in Figure 12.1, is 1. This corresponds to the frequency for 10 given in Table 12.2. The higher the bar, the more frequent the age. The break symbol along the horizontal axis eliminates listing the ages 1 through 9.

A line graph called a frequency polygon can also be used to visually convey the information shown in Figure 12.1. The axes are labeled just like those in a histogram. Thus, the horizontal axis shows data values and the vertical axis shows frequencies. Once a histogram has been constructed, it’s fairly easy to draw a frequency polygon. Figure 12.2 shows a histogram with a dot at the top of each rectangle at its midpoint. Connect each of these midpoints with a straight line. To complete the frequency polygon at both ends, the lines should be drawn down to touch the horizontal axis. The completed frequency polygon is shown in Figure 12.3.

[FIGURE 12.3 A frequency polygon. Title: Boys' Ages of Maximum Yearly Growth. Horizontal axis: Ages (years), 10 through 18. Vertical axis: Number of Boys with Maximum Yearly Growth (frequency), 1 through 9.]
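As a rough illustration of how the histogram and frequency polygon relate to Table 12.2, the Python sketch below prints a sideways text histogram and lists the points that a frequency polygon would connect. The function names are invented for this example. One assumption is labeled in the code: the text says only that the polygon is drawn down to the horizontal axis, so the sketch places those end points one year before the first age and one year after the last.

```python
# Frequencies from Table 12.2: age of maximum yearly growth -> number of boys.
frequency = {10: 1, 11: 2, 12: 5, 13: 7, 14: 9, 15: 6, 16: 3, 17: 1, 18: 1}

def text_histogram(freq):
    """One row per data value; the bar length equals the frequency."""
    return [f"{value:>2} | " + "#" * count for value, count in sorted(freq.items())]

def polygon_vertices(freq):
    """Midpoints of the bar tops, closed off on the horizontal axis.

    Assumption for this sketch: the polygon meets the axis one unit
    beyond the smallest and largest data values.
    """
    values = sorted(freq)
    tops = [(value, freq[value]) for value in values]
    return [(values[0] - 1, 0)] + tops + [(values[-1] + 1, 0)]

for row in text_histogram(frequency):
    print(row)
print(polygon_vertices(frequency))
```

Turning the printed rows on their side gives the same shape as the bars in Figure 12.1, with the tallest bar at age 14.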

Stem-and-Leaf Plots

A unique way of displaying data uses a tool called a stem-and-leaf plot. Example 5 illustrates how we sort the data, revealing the same visual impression created by a histogram.


EXAMPLE 5 CONSTRUCTING A STEM-AND-LEAF PLOT

Use the data showing statistics test scores for 40 students to construct a stem-and-leaf plot:

82 47 75 64 57 82 63 93

76 68 84 54 88 77 79 80

94 92 94 80 94 66 81 67

75 73 66 87 76 45 43 56

57 74 50 78 71 84 59 76.

SOLUTION The plot is constructed by separating each data item into two parts. The first part is the stem. The stem consists of the tens digit. For example, the stem for the score of 82 is 8. The second part is the leaf. The leaf consists of the units digit for a given value. For the score of 82, the leaf is 2. The possible stems for the 40 scores are 4, 5, 6, 7, 8, and 9, entered in the left column of the plot.

Begin by entering each data item in the first row:

82 47 75 64 57 82 63 93.

Entering 82:      Adding 47:        Adding 75:        Adding 64:
Stems | Leaves    Stems | Leaves    Stems | Leaves    Stems | Leaves
  4   |             4   | 7           4   | 7           4   | 7
  5   |             5   |             5   |             5   |
  6   |             6   |             6   |             6   | 4
  7   |             7   |             7   | 5           7   | 5
  8   | 2           8   | 2           8   | 2           8   | 2
  9   |             9   |             9   |             9   |

Adding 57:        Adding 82:        Adding 63:        Adding 93:
Stems | Leaves    Stems | Leaves    Stems | Leaves    Stems | Leaves
  4   | 7           4   | 7           4   | 7           4   | 7
  5   | 7           5   | 7           5   | 7           5   | 7
  6   | 4           6   | 4           6   | 4 3         6   | 4 3
  7   | 5           7   | 5           7   | 5           7   | 5
  8   | 2           8   | 2 2         8   | 2 2         8   | 2 2
  9   |             9   |             9   |             9   | 3

We continue in this manner and enter all the data items. Figure 12.4 shows the completed stem-and-leaf plot. If you turn the page so that the left margin is on the bottom and facing you, the visual impression created by the enclosed leaves is the same as that created by a histogram. An advantage over the histogram is that the stem-and-leaf plot preserves exact data items. The enclosed leaves extend farthest to the right when the stem is 7. This shows that the greatest frequency of students scored in the 70s.

Stems (Tens digit) | Leaves (Units digit)
        4          | 7 5 3
        5          | 7 4 6 7 0 9
        6          | 4 3 8 6 7 6
        7          | 5 6 7 9 5 3 6 4 8 1 6
        8          | 2 2 4 8 0 0 1 7 4
        9          | 3 4 2 4 4

FIGURE 12.4 A stem-and-leaf plot displaying 40 test scores
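The procedure in Example 5 (split each score into a tens digit and a units digit, then append the leaf to its stem's row) can be sketched in Python. The function name `stem_and_leaf` is an assumption for this illustration, and the sketch handles only two-digit whole numbers like the scores here.

```python
# The 40 statistics test scores from Example 5, in their original order.
scores = [82, 47, 75, 64, 57, 82, 63, 93,
          76, 68, 84, 54, 88, 77, 79, 80,
          94, 92, 94, 80, 94, 66, 81, 67,
          75, 73, 66, 87, 76, 45, 43, 56,
          57, 74, 50, 78, 71, 84, 59, 76]

def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (units digits) in entry order."""
    plot = {}
    for item in data:
        stem, leaf = divmod(item, 10)          # 82 -> stem 8, leaf 2
        plot.setdefault(stem, []).append(leaf)
    return dict(sorted(plot.items()))          # stems from smallest to largest

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The longest printed row belongs to stem 7, matching the observation that the greatest frequency of students scored in the 70s.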


CHECK POINT 5

Construct a stem-and-leaf plot for the data in Check Point 4 on page 684.

Deceptions in Visual Displays of Data

Benjamin Disraeli, Queen Victoria’s prime minister, stated that there are “lies, damned lies, and statistics.” The problem is not that statistics lie, but rather that liars use statistics. Graphs can be used to distort the underlying data, making it difficult for the viewer to learn the truth. One potential source of misunderstanding is the scale on the vertical axis used to draw the graph. This scale is important because it lets a researcher “inflate” or “deflate” a trend. For example, both graphs in Figure 12.5 present identical data for the percentage of people in the United States living below the poverty level from 2000 through 2004. The graph on the left stretches the scale on the vertical axis to create an overall impression of a poverty rate increasing rapidly over time. The graph on the right compresses the scale on the vertical axis to create an impression of a poverty rate that is slowly increasing, and beginning to level off, over time.

The graphs in both figures present these data:

Year   Poverty Rate
2000   11.3%
2001   11.7%
2002   12.1%
2003   12.5%
2004   12.7%

[FIGURE 12.5 Percentage of People in the United States Living below the Poverty Level, 2000-2004. Both graphs plot Poverty Rate against Year, 2000 through 2004. Left graph: vertical axis from 11.3% to 12.9%, captioned “U.S. poverty rate rapidly increases.” Right graph: vertical axis from 11% to 19%, captioned “U.S. poverty rate slowly increases.” Source: U.S. Census Bureau]
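One way to put a number on the effect of the vertical scale is to ask what fraction of the axis height the data actually span. The Python sketch below does this for the two graphs in Figure 12.5. The function name is invented for the example, and the axis limits are assumed to be the smallest and largest tick labels that survive from each graph (11.3% to 12.9% on the left, 11% to 19% on the right).

```python
# Poverty rates (percent) from Figure 12.5.
poverty_rate = {2000: 11.3, 2001: 11.7, 2002: 12.1, 2003: 12.5, 2004: 12.7}

def fraction_of_axis_used(values, axis_min, axis_max):
    """Share of the vertical axis covered by the spread of the data."""
    values = list(values)
    return (max(values) - min(values)) / (axis_max - axis_min)

# Assumed axis limits, read from the tick labels of each graph.
left = fraction_of_axis_used(poverty_rate.values(), 11.3, 12.9)
right = fraction_of_axis_used(poverty_rate.values(), 11.0, 19.0)

print(f"left graph:  {left:.3f} of the axis height")
print(f"right graph: {right:.3f} of the axis height")
```

Under these assumptions, the same 1.4-point rise fills most of the left graph's height but less than a fifth of the right graph's, which is why the two pictures leave such different impressions.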

THINGS TO WATCH FOR IN VISUAL DISPLAYS OF DATA

1. Is there a title that explains what is being displayed?

2. Are numbers lined up with tick marks on the vertical axis that clearly indicate the scale? Has the scale been varied to create a more or less dramatic impression than shown by the actual data?

3. Do too many design and cosmetic effects draw attention from or distort the data?

4. Has the wrong impression been created about how the data are changing because equally spaced time intervals are not used on the horizontal axis?

5. Are bar sizes scaled proportionately in terms of the data they represent?

6. Is there a source that indicates where the data in the display came from? Do the data come from an entire population or a sample? Was a random sample used and, if so, are there possible differences between what is displayed in the graph and what is occurring in the entire population? (We’ll discuss these margins of error in Section 12.4.) Who is presenting the visual display, and does that person have a special case to make for or against the trend shown by the graph?

Table 12.4 on the next page contains two examples of misleading visual displays.


TABLE 12.4 EXAMPLES OF MISLEADING VISUAL DISPLAYS

Graphic Display: [Dollar bills of diminishing size showing the purchasing power of the dollar over time. Source: Bureau of Labor Statistics]

Presentation Problems: Although the length of each dollar bill is proportional to its spending power, the visual display varies both the length and width of the bills to show the diminishing power of the dollar over time. Because our eyes focus on the areas of the dollar-shaped bars, this creates the impression that the purchasing power of the dollar diminished even more than it really did. If the area of the dollar were drawn to reflect its purchasing power, the 2005 dollar would be approximately twice as large as the one shown in the graphic display.

Graphic Display: [Homes labeled with areas in square feet: 1500 in 1970, 2080 in 1990, and 2349 in 2004. Source: National Association of Home Builders]

Presentation Problems: Cosmetic effects of homes with equal heights, but different frontal additions and shadow lengths, make it impossible to tell if they proportionately depict the given areas. Time intervals on the horizontal axis are not uniform in size, making it appear that dwelling swelling has been linear from 1970 through 2004. The data indicate that this is not the case. There was a greater increase in area from 1970 through 1990, averaging 29 square feet per year, than from 1990 through 2004, averaging approximately 19.2 square feet per year.
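The two yearly averages quoted for the home-size display follow from the three labeled data points. The short Python sketch below reproduces the arithmetic; the names `home_area` and `average_change_per_year` are illustrative only.

```python
# Home areas in square feet, as labeled in the graphic display.
home_area = {1970: 1500, 1990: 2080, 2004: 2349}

def average_change_per_year(data, start, end):
    """Average yearly change between two labeled years."""
    return (data[end] - data[start]) / (end - start)

first_period = average_change_per_year(home_area, 1970, 1990)   # 580 / 20
second_period = average_change_per_year(home_area, 1990, 2004)  # 269 / 14

print(f"1970 through 1990: {first_period:.1f} square feet per year")
print(f"1990 through 2004: {second_period:.1f} square feet per year")
```

The first interval covers 20 years and the second only 14, so drawing the three homes at equal spacing hides the fact that growth slowed.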

BLITZER BONUS

Creating an Inaccurate Picture by Leaving Something Out

On Monday, October 19, 1987, the Dow Jones Industrial Average plunged 508 points, losing 22.6% of its value. The graph shown on the left, which appeared in a major newspaper following “Black Monday” (as it was instantly dubbed), creates the impression that the Dow average had been “bullish” from 1972 through 1987, increasing throughout this period. The graph creates this inaccurate picture by leaving something out. The graph on the right illustrates that the stock market rose and fell sharply over these years. The impressively smooth curve on the left was obtained by plotting only three of the data points. By ignoring most of the data, increases and decreases are not accounted for and the actual behavior of the market over the 15 years leading to “Black Monday” is inaccurately conveyed.

Source: A. K. Dewdney, 200% of Nothing, John Wiley and Sons, 1993

[Figure: The Dow Jones Industrial Average: 1972–1987. Left graph: Growth Using Only Three Data Points. Right graph: Growth and Decline Using All Data. Horizontal axis of each graph: Nov. 1, 1972 to Oct. 16, 1987. Vertical axis: Dow Jones Industrial Average, with tick marks at 1000 and 2000.]


EXERCISE SET 12.1

b. Survey 100 individuals who are randomly selected from a list of all people living in the state in which the city in question is located.

c. Survey a random sample of persons within each neighborhood of the city.

d. Survey every tenth person who enters City Hall on a randomly selected day.

A questionnaire was given to students in an introductory statistics class during the first week of the course. One question asked, “How stressed have you been in the last 2½ weeks, on a scale of 0 to 10, with 0 being not at all stressed and 10 being as stressed as possible?” The students’ responses are shown in the frequency distribution. Use this frequency distribution to solve Exercises 5–8.

Stress Rating   Frequency
0               2
1               1
2               3
3               12
4               16
5               18
6               13
7               31
8               26
9               15
10              14

Source: Journal of Personality and Social Psychology, 69, 1102–1112

5. Which stress rating describes the greatest number of students? How many students responded with this rating?

6. Which stress rating describes the least number of students? How many responded with this rating?

7. How many students were involved in this study?

8. How many students had a stress rating of 8 or more?

9. A random sample of 30 college students is selected. Each student is asked how much time he or she spent on homework during the previous week. The following times (in hours) are obtained:

16, 24, 18, 21, 18, 16, 18, 17, 15, 21, 19, 17, 17, 16, 19, 18, 15, 15, 20, 17, 15, 17, 24, 19, 16, 20, 16, 19, 18, 17.

Construct a frequency distribution for the data.

10. A random sample of 30 male college students is selected. Each student is asked his height (to the nearest inch). The heights are as follows:

72, 70, 68, 72, 71, 71, 71, 69, 73, 71, 73, 75, 66, 67, 75, 74, 73, 71, 72, 67, 72, 68, 67, 71, 73, 71, 72, 70, 73, 70.

Construct a frequency distribution for the data.

• Practice and Application Exercises1. “The Man Poll” of 1014 randomly selected American men

ages 18 and older was taken by Newsweek June 2–4, 2003.The data shown below are from the poll.

IN GENERAL, HOW WOULD YOU DESCRIBE YOUR HEALTH?

Age

18–34 35–55

Excellent 23% 19% 15%

Very good 37 39 32

Good 28 28 28

Fair 10 10 16

Poor 2 4 8

Don’t know 0 0 1

a. Describe the population and the sample of this poll.

b. For each man, what variable is measured? Are the dataquantitative (numerical) or qualitative (nonnumericalcategories)?

2. The American Association of Nurse Anesthetists Journal(Feb. 2000) published the results of a study on the use ofherbal medicines before surgery. Of 500 surgical patientswho were randomly selected for the study, 51% usedherbal or alternative medicines prior to surgery.

a. Describe the population and the sample of this study.

b. Is the sample representative of the population? Explainyour answer.

c. For each patient, what variable is measured? Are thedata quantitative (numerical) or qualitative (nonnu-merical categories)?

3. The government of a large city needs to determinewhether the city’s residents will support the constructionof a new jail. The government decides to conduct a surveyof a sample of the city’s residents.Which one of the follow-ing procedures would be most appropriate for obtaining asample of the city’s residents?

a. Survey a random sample of the employees and inmatesat the old jail.

b. Survey every fifth person who walks into City Hall ona given day.

c. Survey a random sample of persons within eachgeographic region of the city.

d. Survey the first 200 people listed in the city’stelephone directory.

4. The city council of a large city needs to know whether itsresidents will support the building of three new schools.The council decides to conduct a survey of a sample ofthe city’s residents. Which procedure would be mostappropriate for obtaining a sample of the city’s residents?

a. Survey a random sample of teachers who live in the city.

»56


A college professor had students keep a diary of their social interactions for a week. Excluding family and work situations, the number of social interactions of ten minutes or longer over the week is shown in the following grouped frequency distribution. Use this information to solve Exercises 11–18.

Number of Social Interactions Frequency

0–4 12

5–9 16

10–14 16

15–19 16

20–24 10

25–29 11

30–34 4

35–39 3

40–44 3

45–49 3

Source: Society for Personality and Social Psychology

11. Identify the lower class limit for each class.

12. Identify the upper class limit for each class.

13. What is the class width?

14. How many students were involved in this study?

15. How many students had at least 30 social interactions for the week?

16. How many students had at most 14 social interactions for the week?

17. Among the classes with the greatest frequency, which class has the least number of social interactions?

18. Among the classes with the smallest frequency, which class has the least number of social interactions?

19. As of 2007, the following are the ages, in chronological order, at which U.S. presidents were inaugurated:

57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 50, 47, 55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54.

Source: Time Almanac

Construct a grouped frequency distribution for the data. Use 41–45 for the first class and use the same width for each subsequent class.

20. The IQ scores of 70 students enrolled in a liberal arts course at a college are as follows:

102, 100, 103, 86, 120, 117, 111, 101, 93, 97, 99, 95, 95, 104, 104, 105, 106, 109, 109, 89, 94, 95, 99, 99, 103, 104, 105, 109, 110, 114, 124, 123, 118, 117, 116, 110, 114, 114, 96, 99, 103, 103, 104, 107, 107, 110, 111, 112, 113, 117, 115, 116, 100, 104, 102, 94, 93, 93, 96, 96, 111, 116, 107, 109, 105, 106, 97, 106, 107, 108.

Construct a grouped frequency distribution for the data. Use 85–89 for the first class and use the same width for each subsequent class.

21. Construct a histogram and a frequency polygon for the data involving stress ratings in Exercises 5–8.

22. Construct a histogram and a frequency polygon for the data in Exercise 9.

23. Construct a histogram and a frequency polygon for the data in Exercise 10.

24. The histogram shows the distribution of starting salaries (rounded to the nearest thousand dollars) for college graduates based on a random sample of recent graduates.

Which one of the following is true according to the graph?

a. The graph is based on a sample of approximately 500 recent college graduates.

b. More college graduates had starting salaries in the $51,000–$55,000 range than in the $36,000–$40,000 range.

c. If the sample is truly representative, then for a group of 400 college graduates, we can expect about 28 of them to have starting salaries in the $31,000–$35,000 range.

d. The percentage of starting salaries falling above those shown by any rectangular bar is equal to the percentage of starting salaries falling below that bar.

25. The frequency polygon shows a distribution of IQ scores. Which one of the following is true based upon the graph?

a. The graph is based on a sample of approximately 50 people.

b. More people had an IQ score of 100 than any other IQ score, and as the deviation from 100 increases or decreases, the scores fall off in a symmetrical manner.

c. More people had an IQ score of 110 than a score of 90.

d. The percentage of scores above any IQ score is equal to the percentage of scores below that score.

[Figure for Exercise 25: frequency polygon, “Distribution of IQ Scores.” Horizontal axis: IQ Score, 70 to 130 in steps of 5. Vertical axis: Number of People (frequency), 5 to 35 in steps of 5.]

[Figure for Exercise 24: histogram, “Starting Salaries of Recent College Graduates.” Horizontal axis: Salary (thousands of dollars), with classes 31–35, 36–40, 41–45, 46–50, 51–55, 56–60, and 61–65. Vertical axis: Number of Graduates (frequency), 0 to 350 in steps of 50.]


26. Construct a stem-and-leaf plot for the data in Exercise 19 showing the ages at which U.S. presidents were inaugurated.

27. A random sample of 40 college professors is selected from all professors at a university. The following list gives their ages:

63, 48, 42, 42, 38, 59, 41, 44, 45, 28, 54, 62, 51, 44, 63, 66, 59, 46, 51, 28, 37, 66, 42, 40, 30, 31, 48, 32, 29, 42, 63, 37, 36, 47, 25, 34, 49, 30, 35, 50.

Construct a stem-and-leaf plot for the data. What does the shape of the display reveal about the ages of the professors?

28. In “Ages of Oscar-Winning Best Actors and Actresses” (Mathematics Teacher magazine) by Richard Brown and Gretchen Davis, the stem-and-leaf plots shown compare the ages of actors and actresses for 30 winners of the Oscar at the time they won the award.

        Actors    Stems    Actresses
                    2      146667
      98753221      3      00113344455778
88776543322100      4      11129
          6651      5
           210      6      011
             6      7      4
                    8      0

a. What is the age of the youngest actor to win an Oscar?

b. What is the age difference between the oldest and the youngest actress to win an Oscar?

c. What is the oldest age shared by two actors to win an Oscar?

d. What differences do you observe between the two stem-and-leaf plots? What explanations can you offer for these differences?

In Exercises 29–33, describe what is misleading in each visual display of data.

29. [Figure not recoverable. Surviving credit line: Source: U.S. Census Bureau]

30. [Figure: Book Title Output in the United States. Surviving data labels: 172,000; 190,078; 171,061; 147,120; 114,487. Source: R. R. Bowker]

31. [Figure: Percentage of the World’s Computers in Use, by Country. U.S. 29%; Japan 9%; Germany 6%; UK 5%; France 4%; China 4%. Source: Computer Industry Almanac]

32. [Figure: enrollment, 2000 through 2004. Source: University of Georgia Office of Institutional Research]

Year                 2000      2001      2002      2003      2004
Total Students       31,288    32,317    32,941    33,878    33,405
African Americans    1856      1832      1825      1897      1854

33. [Figure: Student Loans on the Rise: Percentage of Bachelor’s Degree Recipients with Federal Student Loans and Median Loan Amount. Source: American Council on Education]

Year        Percentage with Loans    Median Loan Amount
1992–93     36.8%                    $10,008
1995–96     50.1%                    $13,327
1999–00     60.3%                    $16,958
2003–04     63.0%                    $16,432

• Writing in Mathematics

34. What is a population? What is a sample?

35. Describe what is meant by a random sample.

36. Suppose you are interested in whether or not the students at your college would favor a grading system in which students may receive final grades of A+, A, A−, B+, B, B−, C+, C, C−, and so on. Describe how you might obtain a random sample of 100 students from the entire student population.

37. For Exercise 36, would questioning every fifth student as he or she is leaving the campus library until 100 students are interviewed be a good way to obtain a random sample? Explain your answer.

38. What is a frequency distribution?

39. What is a histogram?

40. What is a frequency polygon?

41. Describe how to construct a frequency polygon from a histogram.

42. Describe how to construct a stem-and-leaf plot from a set of data.

43. Describe two ways that graphs can be misleading.

• Critical Thinking Exercises

44. Construct a grouped frequency distribution for the following data, showing the length, in miles, of the 25 longest rivers in the United States. Use five classes that have the same width.

2540    2340    1980    1900    1900
1460    1450    1420    1310    1290
1280    1240    1040     990     926
 906     886     862     800     774
 743     724     692     659     649

Source: U.S. Department of the Interior

45. Use two line graphs drawn in the same coordinate system that show a better way to portray the enrollment information for 2000 through 2004 in Exercise 32.

• Group Exercises

46. The classic book on distortion using statistics is How to Lie with Statistics by Darrell Huff. This activity is designed for five people. Each person should select two chapters from Huff’s book and then present to the class the common methods of statistical manipulation and distortion that Huff discusses.

47. Each group member should find one example of a graph that presents data with integrity and one example of a graph that is misleading. Use newspapers, magazines, the Internet, books, and so forth. Once graphs have been collected, each member should share his or her graphs with the entire group. Be sure to explain why one graph depicts data in a forthright manner and how the other graph misleads the viewer.

SECTION 12.2 • MEASURES OF CENTRAL TENDENCY

OBJECTIVES

1. Determine the mean for a data set.

2. Determine the median for a data set.

3. Determine the mode for a data set.

4. Determine the midrange for a data set.

According to researchers, “Robert,” the average American guy, is 31 years old, 5 feet 10 inches, 172 pounds, works 6.1 hours daily, and sleeps 7.7 hours. These numbers represent what is “average” or “typical” of American men. In statistics, such values are known as measures of central tendency because they are generally located toward the center of a distribution. Four such measures are discussed in this section: the mean, the median, the mode, and the midrange. Each measure of central tendency is calculated in a different way. Thus, it is better to use a specific term (mean, median, mode, or midrange) than to use the generic descriptive term “average.”

The Mean

By far the most commonly used measure of central tendency is the mean. The mean is obtained by adding all the data items and then dividing the sum by the number of items. The Greek letter sigma, $\Sigma$, called a symbol of summation, is used to indicate the sum of data items. The notation $\Sigma x$, read “the sum of x,” means to add all the data items in a given data set. We can use this symbol to give a formula for calculating the mean.

THE MEAN

The mean is the sum of the data items divided by the number of items.

$$\text{Mean} = \frac{\Sigma x}{n},$$

where $\Sigma x$ represents the sum of all the data items and n represents the number of items.

EXAMPLE 1 CALCULATING THE MEAN

Table 12.5 shows the ten youngest male singers in the United States to have a number 1 single. Find the mean age of these male singers at the time of their number 1 single.

BLITZER BONUS

The Mean American Millionaire

According to a study involving a random sample of 1300 of America’s 2,270,000 millionaires, the average millionaire lives in a $300,000 home in an upper-middle-class neighborhood, is 54 years old, drives a four-year-old American car, and had an SAT score of 1190.

Source: Stanley, The Millionaire Next Door

TABLE 12.5 YOUNGEST U.S. MALE SINGERS TO HAVE A NUMBER 1 SINGLE

Artist/Year Title Age

Stevie Wonder, 1963 “Fingertips” 13

Donny Osmond, 1971 “Go Away Little Girl” 13

Michael Jackson, 1972 “Ben” 14

Laurie London, 1958 “He’s Got the Whole World in His Hands” 14

Paul Anka, 1957 “Diana” 16

Brian Hyland, 1960 “Itsy Bitsy Teenie Weenie Yellow Polkadot Bikini” 16

Shaun Cassidy, 1977 “Da Doo Ron Ron” 17

Bobby Vee, 1961 “Take Good Care of My Baby” 18

Usher, 1998 “Nice & Slow” 19

Andy Gibb, 1977 “I Just Want to Be Your Everything” 19

Source: Russell Ash, The Top 10 of Everything

SOLUTION We find the mean by adding the ages and dividing this sum by 10, the number of data items.

$$\text{Mean} = \frac{\Sigma x}{n} = \frac{13 + 13 + 14 + 14 + 16 + 16 + 17 + 18 + 19 + 19}{10} = \frac{159}{10} = 15.9$$

The mean age of the ten youngest singers to have a number 1 single is 15.9.

One and only one mean can be calculated for any group of numerical data. The mean may or may not be one of the actual data items. In Example 1, the mean was 15.9, although no data item is 15.9.

CHECK POINT 1 Find the mean for each group of data items:

a. 10, 20, 30, 40, 50

b. 3, 10, 10, 10, 117.
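The computation in Example 1 can also be sketched in a few lines of Python. This is only an illustration of the formula Mean = Σx / n; the function name `mean` and the variable `ages` are our own choices, not part of the text.

```python
# Ages from Table 12.5 (Example 1).
ages = [13, 13, 14, 14, 16, 16, 17, 18, 19, 19]


def mean(data):
    """Return the sum of the data items divided by the number of items."""
    return sum(data) / len(data)


print(sum(ages))   # the sum of x: 159
print(len(ages))   # n: 10
print(mean(ages))  # 15.9
```

As the text notes, the result (15.9) need not be one of the data items.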


In Example 1, some of the data items were identical. We can use multiplication when computing the mean for these identical items:

$$\text{Mean} = \frac{13 + 13 + 14 + 14 + 16 + 16 + 17 + 18 + 19 + 19}{10} = \frac{13 \cdot 2 + 14 \cdot 2 + 16 \cdot 2 + 17 \cdot 1 + 18 \cdot 1 + 19 \cdot 2}{10}$$

The data values 13, 14, 16, and 19 each have a frequency of 2.

When many data values occur more than once and a frequency distribution is used to organize the data, we can use the following formula to calculate the mean:

CALCULATING THE MEAN FOR A FREQUENCY DISTRIBUTION

$$\text{Mean} = \frac{\Sigma xf}{n},$$

where

x represents a data value.

f represents the frequency of that data value.

$\Sigma xf$ represents the sum of all the products obtained by multiplying each data value by its frequency.

n represents the total frequency of the distribution.

EXAMPLE 2 CALCULATING THE MEAN FOR A FREQUENCY DISTRIBUTION

In the previous exercise set, we mentioned a questionnaire given to students in an introductory statistics class during the first week of the course. One question asked, “How stressed have you been in the last 2½ weeks, on a scale of 0 to 10, with 0 being not at all stressed and 10 being as stressed as possible?” Table 12.6 shows the students’ responses. Use this frequency distribution to find the mean of the stress-level ratings.

SOLUTION We use the formula

$$\text{Mean} = \frac{\Sigma xf}{n}.$$

First, we must find xf, obtained by multiplying each data value, x, by its frequency, f. Then, we need to find the sum of these products, $\Sigma xf$. We can use the frequency distribution to organize these computations. Add a third column in which each data value is multiplied by its frequency. This column, shown below, is headed xf. Then, find the sum of the values, $\Sigma xf$, in this column.

TABLE 12.6 STUDENTS’ STRESS-LEVEL RATINGS

Stress Rating x    Frequency f
0                  2
1                  1
2                  3
3                  12
4                  16
5                  18
6                  13
7                  31
8                  26
9                  15
10                 14

Source: Journal of Personality and Social Psychology, 69, 1102–1112

x     f     xf
0     2     0 · 2 = 0
1     1     1 · 1 = 1
2     3     2 · 3 = 6
3     12    3 · 12 = 36
4     16    4 · 16 = 64
5     18    5 · 18 = 90
6     13    6 · 13 = 78
7     31    7 · 31 = 217
8     26    8 · 26 = 208
9     15    9 · 15 = 135
10    14    10 · 14 = 140

Totals: n = 151, Σxf = 975

The value n = 151, the sum of the numbers in the second column, is the total frequency of the distribution. Σxf = 975 is the sum of the numbers in the third column.
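The column-by-column bookkeeping in this table can be sketched in Python. The dictionary `stress`, which maps each rating x to its frequency f, is an illustrative representation of Table 12.6 and not something the text prescribes.

```python
# Table 12.6: stress rating x -> frequency f.
stress = {0: 2, 1: 1, 2: 3, 3: 12, 4: 16, 5: 18,
          6: 13, 7: 31, 8: 26, 9: 15, 10: 14}

# n is the total frequency: the sum of the f column.
n = sum(stress.values())

# Sum of xf: multiply each data value by its frequency, then add the products.
sum_xf = sum(x * f for x, f in stress.items())

mean = sum_xf / n

print(n)               # 151
print(sum_xf)          # 975
print(round(mean, 2))  # 6.46
```

Dividing by n (the total frequency), rather than by the number of rows in the table, is the step most easily gotten wrong when working by hand.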


Now, substitute these values into the formula for the mean. Remember that n is the total frequency of the distribution, or 151.

$$\text{Mean} = \frac{\Sigma xf}{n} = \frac{975}{151} \approx 6.46$$

The mean of the 0 to 10 stress-level ratings is approximately 6.46. Notice that the mean is greater than 5, the middle of the 0 to 10 scale.

CHECK POINT 2 Find the mean for the data items in the frequency distribution. (In order to save space, we’ve written the frequency distribution horizontally.)

Score, x        30    33    40    50
Frequency, f     3     4     4     1

The Median

The median age in the United States is 35.3. The oldest state by median age is Florida (38.7) and the youngest state is Utah (27.1). To find these values, researchers begin with appropriate random samples. The data items (that is, the ages) are arranged from youngest to oldest. The median age is the data item in the middle of each set of ranked, or ordered, data.

THE MEDIAN

To find the median of a group of data items,

1. Arrange the data items in order, from smallest to largest.

2. If the number of data items is odd, the median is the data item in the middle of the list.

3. If the number of data items is even, the median is the mean of the two middle data items.

EXAMPLE 3 FINDING THE MEDIAN

Find the median for each of the following groups of data:

a. 84, 90, 98, 95, 88

b. 68, 74, 7, 13, 15, 25, 28, 59, 34, 47.

SOLUTION

a. Arrange the data items in order, from smallest to largest. The number of data items in the list, five, is odd. Thus, the median is the middle number.

84, 88, 90, 95, 98 (the middle data item is 90)

The median is 90. Notice that two data items lie above 90 and two data items lie below 90.

b. Arrange the data items in order, from smallest to largest. The number of data items in the list, ten, is even. Thus, the median is the mean of the two middle data items.

7, 13, 15, 25, 28, 34, 47, 59, 68, 74 (the middle data items are 28 and 34)

$$\text{Median} = \frac{28 + 34}{2} = \frac{62}{2} = 31$$


The median is 31. Five data items lie above 31 and five data items lie below 31.

7, 13, 15, 25, 28 (five data items lie below 31)
34, 47, 59, 68, 74 (five data items lie above 31)

CHECK POINT 3 Find the median for each of the following groups of data:

a. 28, 42, 40, 25, 35

b. 72, 61, 85, 93, 79, 87.

If a relatively long list of data items is arranged in order, it may be difficult to identify the item or items in the middle. In cases like this, the median can be found by determining its position in the list of items.

POSITION OF THE MEDIAN

If n data items are arranged in order, from smallest to largest, the median is the value in the

$$\frac{n + 1}{2}$$

position.

STUDY TIP

The formula $\frac{n + 1}{2}$ gives the position of the median, and not the actual value of the median. When finding the median, be sure to first arrange the data items in order from smallest to largest.

EXAMPLE 4 FINDING THE MEDIAN USING THE POSITION FORMULA

Listed below are the points scored per season by the 13 top point scorers in the National Football League. Find the median points scored per season for the top 13 scorers.

144, 144, 145, 145, 145, 146, 147, 149, 150, 155, 161, 164, 176

SOLUTION The data items are arranged from smallest to largest. There are 13 data items, so n = 13. The median is the value in the

$$\frac{n + 1}{2} \text{ position} = \frac{13 + 1}{2} \text{ position} = \frac{14}{2} \text{ position} = \text{seventh position}.$$

We find the median by selecting the data item in the seventh position.

Position     1     2     3     4     5     6     7
Data item    144   144   145   145   145   146   147   149, 150, 155, 161, 164, 176

The median is 147. Notice that six data items lie above 147 and six data items lie below it. The median points scored per season for the top 13 scorers in the National Football League is 147.

CHECK POINT 4 Find the median for the following group of data items:

1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 6, 7, 7, 10, 11, 13, 19, 24, 26.
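The three-step procedure for the median, together with the position formula (n + 1)/2, can be sketched as a short Python function. The name `median` and the way the position is turned into a list index are illustrative choices; the sketch assumes the data set is not empty.

```python
def median(data):
    """Median by the textbook rule: sort, then use the (n + 1)/2 position."""
    ordered = sorted(data)       # step 1: arrange from smallest to largest
    n = len(ordered)
    position = (n + 1) / 2       # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number; take that single item.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5; average the two neighboring items.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Example 3a, Example 3b, and Example 4.
print(median([84, 90, 98, 95, 88]))                      # 90
print(median([68, 74, 7, 13, 15, 25, 28, 59, 34, 47]))   # 31.0
print(median([144, 144, 145, 145, 145, 146, 147,
              149, 150, 155, 161, 164, 176]))            # 147
```

Python lists are indexed from 0, which is why the seventh position corresponds to index 6.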


TABLE 12.7 NUMBER OF HOME RUNS BY BASEBALL TEAMS IN THE NATIONAL LEAGUE, 2005

Team    Home Runs

Washington Nationals 117

Florida Marlins 128

San Francisco Giants 128

San Diego Padres 130

Pittsburgh Pirates 139

Los Angeles Dodgers 149

Colorado Rockies 150

Houston Astros 161

Philadelphia Phillies 167

St. Louis Cardinals 170

Milwaukee Brewers 175

New York Mets 175

Atlanta Braves 184

Arizona Diamondbacks 191

Chicago Cubs 194

Cincinnati Reds 222

Source: The World Almanac

EXAMPLE 5 FINDING THE MEDIAN USING THE POSITION FORMULA

Table 12.7 gives the number of home runs for the 16 baseball teams in the National League in 2005. Find the median number of home runs for these teams.

SOLUTION The data items are arranged from smallest to largest. There are 16 data items, so n = 16. The median is the value in the

$$\frac{n + 1}{2} \text{ position} = \frac{16 + 1}{2} \text{ position} = \frac{17}{2} \text{ position} = 8.5 \text{ position}.$$

This means that the median is the mean of the data items in positions 8 and 9.

Position     1     2     3     4     5     6     7     8     9
Data item    117   128   128   130   139   149   150   161   167   170, 175, 175, 184, 191, 194, 222

$$\text{Median} = \frac{161 + 167}{2} = \frac{328}{2} = 164$$

The median number of home runs for the teams in the National League in 2005 was 164.

CHECK POINT 5 Listed below are the number of home runs for each of the 14 baseball teams in the American League in 2005. Find the median number of home runs for these teams.

126, 130, 134, 136, 147, 155, 157, 168, 189, 199, 200, 207, 229, 260

When individual data items are listed from smallest to largest, you can find the median by identifying the item or items in the middle or by using the formula for its position. However, the formula for the position of the median is useful when data items are organized in a frequency distribution.

EXAMPLE 6 FINDING THE MEDIAN FOR A FREQUENCY DISTRIBUTION

The frequency distribution for the stress-level ratings of 151 students is repeated below using a horizontal format. Find the median stress-level rating.

Stress rating, x                 0    1    2    3    4    5    6    7    8    9    10
Number of college students, f    2    1    3    12   16   18   13   31   26   15   14     Total: n = 151

SOLUTION There are 151 data items, so n = 151. The median is the value in the

$$\frac{n + 1}{2} \text{ position} = \frac{151 + 1}{2} \text{ position} = \frac{152}{2} \text{ position} = \text{76th position}.$$

We find the median by selecting the data item in the 76th position. The frequency distribution indicates that the data items begin with

0, 0, 1, 2, 2, 2, ....


We can write the data items all out and then select the median, the 76th data item. A more efficient way to proceed is to count down the frequency column in the distribution until we identify the 76th data item:

x     f     Positions counted
0     2     1, 2
1     1     3
2     3     4, 5, 6
3     12    7 through 18
4     16    19 through 34
5     18    35 through 52
6     13    53 through 65
7     31    66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 (stop counting; we’ve reached the 76th data item)
8     26
9     15
10    14

The 76th data item is 7. The median stress-level rating is 7.

CHECK POINT 6 Find the median for the following frequency distribution.

Age at presidential inauguration, x    42   43   46   51   52   54   55   56   60   61   64   69
Number of U.S. presidents assuming office in the 20th century with the given age, f    1   1   1   3   1   2   2   2   1   2   1   1
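Counting down the frequency column, as in Example 6, amounts to keeping a running total of the frequencies until it reaches the position of the median. The following Python sketch illustrates that idea; the helper names `value_at` and `median_from_frequencies` are hypothetical, chosen only for this illustration.

```python
def value_at(freq, position):
    """Return the data value at a 1-based position in the ordered data."""
    running_total = 0
    for x in sorted(freq):
        running_total += freq[x]       # count down the frequency column
        if running_total >= position:  # stop: we've reached the position
            return x
    raise ValueError("position exceeds the total frequency")


def median_from_frequencies(freq):
    """Median of a frequency distribution given as {value: frequency}."""
    n = sum(freq.values())
    if n % 2 == 1:
        return value_at(freq, (n + 1) // 2)
    # Even n: the (n + 1)/2 position falls halfway between two items.
    return (value_at(freq, n // 2) + value_at(freq, n // 2 + 1)) / 2


# Stress-level ratings from Example 6.
stress = {0: 2, 1: 1, 2: 3, 3: 12, 4: 16, 5: 18,
          6: 13, 7: 31, 8: 26, 9: 15, 10: 14}

print(value_at(stress, 76))             # 7
print(median_from_frequencies(stress))  # 7
```

The running total passes through 2, 3, 6, 18, 34, 52, and 65 before reaching 96 at the rating 7, which is why the 76th data item is 7.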

Statisticians generally use the median, rather than the mean, when reporting income. Why? Our next example will help to answer this question.

EXAMPLE 7 COMPARING THE MEDIAN AND THE MEAN

Five employees in the assembly section of a television manufacturing company earn salaries of $19,700, $20,400, $21,500, $22,600, and $23,000 annually. The section manager has an annual salary of $95,000.

a. Find the median annual salary for the six people.

b. Find the mean annual salary for the six people.

SOLUTION

a. To compute the median, first arrange the salaries in order:

$19,700, $20,400, $21,500, $22,600, $23,000, $95,000.

Because the list contains an even number of data items, six, the median is the mean of the two middle items.

$$\text{Median} = \frac{\$21{,}500 + \$22{,}600}{2} = \frac{\$44{,}100}{2} = \$22{,}050$$

The median annual salary is $22,050.

b. We find the mean annual salary by adding the six annual salaries and dividing by 6.

$$\text{Mean} = \frac{\$19{,}700 + \$20{,}400 + \$21{,}500 + \$22{,}600 + \$23{,}000 + \$95{,}000}{6} = \frac{\$202{,}200}{6} = \$33{,}700$$

The mean annual salary is $33,700.
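To see how strongly one extreme value affects the two measures in Example 7, the following Python sketch computes both with and without the manager’s salary. It uses the `mean` and `median` functions from Python’s standard `statistics` module; the list names and the staff-only comparison are our own additions, included only for illustration.

```python
from statistics import mean, median

# Salaries from Example 7; the last value is the section manager's.
salaries = [19_700, 20_400, 21_500, 22_600, 23_000, 95_000]
staff_only = salaries[:-1]  # the five assembly employees

print(median(salaries), mean(salaries))      # six people
print(median(staff_only), mean(staff_only))  # five employees only
```

With the manager included, the median is $22,050 and the mean is $33,700. For the five employees alone, the median is $21,500 and the mean is $21,440. Adding the one large salary moves the median by $550 but moves the mean by $12,260.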


In Example 7, the median annual salary is $22,050 and the mean annual salary is $33,700. Why such a big difference between these two measures of central tendency? The relatively high annual salary of the section manager, $95,000, pulls the mean salary to a value considerably higher than the median salary. When one or more data items are much greater than the other items, these extreme values can greatly influence the mean. In cases like this, the median is often more representative of the data.

This is why the median, rather than the mean, is used to summarize the incomes, by gender and race, shown in Figure 12.6. Because no one can earn less than $0, the distribution of income must come to an end at $0 for each of these eight groups. By contrast, there is no upper limit on income on the high side. In the United States, the wealthiest 5% of the population earn about 21% of the total income. The relatively few people with very high annual incomes tend to pull the mean income to a value considerably greater than the median income. Reporting mean incomes in Figure 12.6 would inflate the numbers shown, making them nonrepresentative of the millions of workers in each of the eight groups.

[FIGURE 12.6: U.S. Median Income in 2003, by Gender and Race. Bar graph with separate bars for Male and Female in each of four groups: White, non-Hispanic; African American; Hispanic; Asian. Vertical axis: Median Income (thousands of dollars), 5 to 35. Bar labels, in the order they appear in the source: $32,331; $21,935; $21,053; $31,737; $18,301; $16,540; $13,642; $17,879. Source: U.S. Census Bureau]

CHECK POINT 7 The ten countries in Table 12.8 accounted for 75% of the nearly $900 billion spent in 2003 on defense worldwide.

TABLE 12.8 TOP TEN MILITARY SPENDERS, 2003

Country          National Expenditure     Dollars Spent    Percentage of Global
                 (billions of dollars)    per Resident     Military Spending
United States    $417.4                   $1419            47%
Japan            $46.9                    $367             5%
UK               $37.1                    $627             4%
France           $35.0                    $583             4%
China            $32.8                    $25              4%
Germany          $27.2                    $329             3%
Italy            $20.8                    $362             2%
Iran             $19.2                    $279             2%
Saudi Arabia     $19.1                    $789             2%
South Korea      $13.9                    $292             2%

Source: SIPRI, “The Major Spenders in 2003”

a. Find the mean national expenditure on defense, in billions of dollars, for the ten countries.

b. Find the median national expenditure on defense, in billions of dollars, for the ten countries.

c. Describe why one of the measures of central tendency is so much greater than the other.

TABLE 12.9 TEN HOTTEST U.S. CITIES

City                   Mean Temperature
Key West, FL           77.8°
Miami, FL              75.9°
West Palm Beach, FL    74.7°
Fort Myers, FL         74.4°
Yuma, AZ               74.2°
Brownsville, TX        73.8°
Phoenix, AZ            72.6°
Vero Beach, FL         72.4°
Orlando, FL            72.3°
Tampa, FL              72.3°

Source: National Oceanic and Atmospheric Administration

The Mode

Let’s take one final look at the frequency distribution for the stress-level ratings of 151 college students.

Stress rating, x                 0    1    2    3    4    5    6    7    8    9    10
Number of college students, f    2    1    3    12   16   18   13   31   26   15   14

7 is the stress rating with the greatest frequency.

The data value that occurs most often in this distribution is 7, the stress rating for 31 of the 151 students. We call 7 the mode of this distribution.

THE MODE

The mode is the data value that occurs most often in a data set. If no data items are repeated, then the data set has no mode. If more than one data value has the highest frequency, then each of these data values is a mode.

EXAMPLE 8 FINDING THE MODE

Find the mode for the following group of data:

7, 2, 4, 7, 8, 10.

SOLUTION The number 7 occurs more often than any other. Therefore, 7 is the mode.

CHECK POINT 8 Find the mode for the following group of data:

8, 6, 2, 4, 6, 8, 10, 8.

Be aware that a data set might not have a mode. For example, no data item in 2, 1, 4, 5, 3 is repeated, so this data group has no mode. By contrast, 3, 3, 4, 5, 6, 6 has two data values with the highest frequency, namely 3 and 6. Each of these data values is a mode and the data set is said to be bimodal.
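The three cases in the definition of the mode (one mode, no mode, and more than one mode) can be sketched in Python with a frequency count. The function name `modes` and the choice to return a list, empty when there is no mode, are illustrative assumptions. The sketch assumes the data set is not empty.

```python
from collections import Counter


def modes(data):
    """Return every data value with the highest frequency, or [] if none repeats."""
    counts = Counter(data)           # data value -> frequency
    highest = max(counts.values())
    if highest == 1:
        return []                    # no data item is repeated: no mode
    return sorted(x for x, f in counts.items() if f == highest)


print(modes([7, 2, 4, 7, 8, 10]))  # [7]
print(modes([2, 1, 4, 5, 3]))      # []
print(modes([3, 3, 4, 5, 6, 6]))   # [3, 6]
```

The last data set returns two values, matching the text’s description of a bimodal data set.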

The Midrange

Table 12.9 shows the ten hottest cities in the United States. Because temperature is constantly changing, you might wonder how the mean temperatures shown in the table are obtained.

First, we need to find a representative daily temperature. This is obtained by adding the lowest and highest temperatures for the day and then dividing this sum by 2. Next, we take the representative daily temperatures for all 365 days, add them, and divide the sum by 365. These are the mean temperatures that appear in Table 12.9.

Representative daily temperature,

$$\frac{\text{lowest daily temperature} + \text{highest daily temperature}}{2},$$

is an example of a measure of central tendency called the midrange.

THE MIDRANGE

The midrange is found by adding the lowest and highest data values and dividing the sum by 2.

$$\text{Midrange} = \frac{\text{lowest data value} + \text{highest data value}}{2}$$


EXAMPLE 9 FINDING THE MIDRANGE

One criticism of major league baseball is that the discrepancy between team payrolls hampers fair competition. In 2006, the New York Yankees had the greatest payroll, a record $194,663,100 (median salary: $2,925,000). The Florida Marlins were the worst paid team, with a payroll of $14,998,500 (median salary: $327,000). Find the midrange for the annual payroll of major league baseball teams in 2006. (Source: usatoday.com)

SOLUTION

$$\text{Midrange} = \frac{\text{lowest annual payroll} + \text{highest annual payroll}}{2} = \frac{\$14{,}998{,}500 + \$194{,}663{,}100}{2} = \frac{\$209{,}661{,}600}{2} = \$104{,}830{,}800$$

The midrange for the annual payroll of major league baseball teams in 2006 was $104,830,800.

We can find the mean annual payroll of the 30 professional baseball teams in 2006 by adding up the payrolls of all 30 teams and then dividing the sum by 30. It is much faster to calculate the midrange, which is often used as an estimate for the mean.

CHECK POINT 9 The best paid state governor is in New York, earning $179,000 annually. The worst paid is the governor of Maine, earning $70,000 annually. Find the midrange for annual salaries of U.S. governors.

EXAMPLE 10 FINDING THE FOUR MEASURES OF CENTRAL TENDENCY

Suppose your six exam grades in a course are

52, 69, 75, 86, 86, and 92.

Compute your final course grade (90–100 = A, 80–89 = B, 70–79 = C, 60–69 = D, below 60 = F) using the

a. mean. b. median. c. mode. d. midrange.

SOLUTION

a. The mean is the sum of the data items divided by the number of items, 6.

$$\text{Mean} = \frac{52 + 69 + 75 + 86 + 86 + 92}{6} = \frac{460}{6} \approx 76.67$$

Using the mean, your final course grade is C.

b. The six data items, 52, 69, 75, 86, 86, and 92, are arranged in order. Because the number of data items is even, the median is the mean of the two middle items.

$$\text{Median} = \frac{75 + 86}{2} = \frac{161}{2} = 80.5$$

Using the median, your final course grade is B.

c. The mode is the data value that occurs most frequently. Because 86 occurs most often, the mode is 86. Using the mode, your final course grade is B.

d. The midrange is the mean of the lowest and highest data values.

$$\text{Midrange} = \frac{52 + 92}{2} = \frac{144}{2} = 72$$

Using the midrange, your final course grade is C.
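The four results in Example 10 can be checked with a brief Python sketch that uses the standard `statistics` module. The helper names `midrange` and `letter_grade` are our own, and treating the grading scale as a set of lower cutoffs (80 and above is a B, and so on) is an assumption about scores that fall between the listed ranges.

```python
import statistics

grades = [52, 69, 75, 86, 86, 92]

def midrange(data):
    """Mean of the lowest and highest data values."""
    return (min(data) + max(data)) / 2

def letter_grade(score):
    """Map a numeric score to a letter using the scale in Example 10."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

measures = {
    "mean": statistics.mean(grades),
    "median": statistics.median(grades),
    "mode": statistics.mode(grades),
    "midrange": midrange(grades),
}

for name, value in measures.items():
    print(f"{name}: {value:.2f} -> {letter_grade(value)}")
```

The same six grades earn a C or a B depending on which measure of central tendency is reported, which is the point of the example.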


CHECK POINT 10 Consumer Reports magazine gave the following data for the number of calories in a meat hot dog for each of 17 brands:

173, 191, 182, 190, 172, 147, 146, 138, 175, 136, 179, 153, 107, 195, 135, 140, 138.

Find the mean, median, mode, and midrange for the number of calories in a meat hot dog for the 17 brands. If necessary, round answers to the nearest tenth of a calorie.

EXERCISE SET 12.2

• Practice Exercises

In Exercises 1–8, find the mean for each group of data items.

1. 7, 4, 3, 2, 8, 5, 1, 3

2. 11, 6, 4, 0, 2, 1, 12, 0, 0

3. 91, 95, 99, 97, 93, 95

4. 100, 100, 90, 30, 70, 100

5. 100, 40, 70, 40, 60

6. 1, 3, 5, 10, 8, 5, 6, 8

7. 1.6, 3.8, 5.0, 2.7, 4.2, 4.2, 3.2, 4.7, 3.6, 2.5, 2.5

8. 1.4, 2.1, 1.6, 3.0, 1.4, 2.2, 1.4, 9.0, 9.0, 1.8

In Exercises 9–12, find the mean for the data items in the given frequency distribution.

9.

| Score x | Frequency f |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 4 |
| 4 | 4 |
| 5 | 6 |
| 6 | 5 |
| 7 | 3 |
| 8 | 2 |

10.

| Score x | Frequency f |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 7 |
| 5 | 6 |
| 6 | 4 |
| 7 | 3 |

11.

| Score x | Frequency f |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 5 |
| 5 | 7 |
| 6 | 9 |
| 7 | 8 |
| 8 | 6 |
| 9 | 4 |
| 10 | 3 |

12.

| Score x | Frequency f |
|---|---|
| 1 | 3 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 9 |
| 6 | 7 |
| 7 | 5 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |

In Exercises 13–20, find the median for each group of data items.

13. 7, 4, 3, 2, 8, 5, 1, 3

14. 11, 6, 4, 0, 2, 1, 12, 0, 0

15. 91, 95, 99, 97, 93, 95

16. 100, 100, 90, 30, 70, 100

17. 100, 40, 70, 40, 60

18. 1, 3, 5, 10, 8, 5, 6, 8

19. 1.6, 3.8, 5.0, 2.7, 4.2, 4.2, 3.2, 4.7, 3.6, 2.5, 2.5

20. 1.4, 2.1, 1.6, 3.0, 1.4, 2.2, 1.4, 9.0, 9.0, 1.8

Find the median for the data items in the frequency distribution in

21. Exercise 9.

22. Exercise 10.

23. Exercise 11.

24. Exercise 12.

In Exercises 25–32, find the mode for each group of data items. If there is no mode, so state.

25. 7, 4, 3, 2, 8, 5, 1, 3

26. 11, 6, 4, 0, 2, 1, 12, 0, 0

27. 91, 95, 99, 97, 93, 95

28. 100, 100, 90, 30, 70, 100

29. 100, 40, 70, 40, 60

30. 1, 3, 5, 10, 8, 5, 6, 8

31. 1.6, 3.8, 5.0, 2.7, 4.2, 4.2, 3.2, 4.7, 3.6, 2.5, 2.5

32. 1.4, 2.1, 1.6, 3.0, 1.4, 2.2, 1.4, 9.0, 9.0, 1.8

Find the mode for the data items in the frequency distribution in

33. Exercise 9.

34. Exercise 10.

35. Exercise 11.

36. Exercise 12.

In Exercises 37–44, find the midrange for each group of data items.

37. 7, 4, 3, 2, 8, 5, 1, 3

38. 11, 6, 4, 0, 2, 1, 12, 0, 0

39. 91, 95, 99, 97, 93, 95

40. 100, 100, 90, 30, 70, 100

41. 100, 40, 70, 40, 60

42. 1, 3, 5, 10, 8, 5, 6, 8

43. 1.6, 3.8, 5.0, 2.7, 4.2, 4.2, 3.2, 4.7, 3.6, 2.5, 2.5

44. 1.4, 2.1, 1.6, 3.0, 1.4, 2.2, 1.4, 9.0, 9.0, 1.8

Find the midrange for the data items in the frequency distribution in

45. Exercise 9.


46. Exercise 10.

47. Exercise 11.

48. Exercise 12.

• Practice Plus

In Exercises 49–54, use each display of data items to find the mean, median, mode, and midrange.

49.–52. [Four histograms of Frequency (1 to 5) versus Score: two with scores 10 to 15 and two with scores 10 to 50. The bar heights are not recoverable from the source.]

53.

| Stems | Leaves |
|---|---|
| 2 | 1 4 5 |
| 3 | 0 1 1 3 |
| 4 | 2 5 |

54.

| Stems | Leaves |
|---|---|
| 2 | 8 |
| 3 | 2 4 4 9 |
| 4 | 0 1 5 7 |

• Application Exercises

Exercises 55–59 present data on a variety of topics. For each data set described in boldface, find the

a. mean.
b. median.
c. mode (or state that there is no mode).
d. midrange.

55. Ages of the Justices of the United States Supreme Court in 2007

Roberts (52), Stevens (87), Scalia (71), Kennedy (71), Souter (68), Thomas (59), Ginsburg (74), Breyer (69), Alito (56)

56. Number of Reported Violent Attacks against the Homeless in the United States for Various Years from 1995 through 2005

60, 63, 52, 36, 70, 105, 86
Source: National Coalition for the Homeless

57. Number of Home Runs Hit by Each of the 12 Batters for the New York Yankees in 2005

0, 4, 8, 12, 14, 17, 19, 19, 23, 32, 34, 48
Source: The World Almanac

58. Number of Home Runs Hit by Each of the Ten Batters for the Florida Marlins in 2005

2, 3, 4, 5, 6, 8, 9, 16, 33, 33
Source: The World Almanac

59. Number of Social Interactions of College Students In Exercise Set 12.1, we presented a grouped frequency distribution showing the number of social interactions of ten minutes or longer over a one-week period for a group of college students. (These interactions excluded family and work situations.) Use the frequency distribution shown to solve this exercise. (This distribution was obtained by replacing the classes in the grouped frequency distribution previously shown with the midpoints of the classes.)

| Social interactions in a week, x | 2 | 7 | 12 | 17 | 22 | 27 | 32 | 37 | 42 | 47 |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of college students, f | 12 | 16 | 16 | 16 | 10 | 11 | 4 | 3 | 3 | 3 |

The weights (to the nearest five pounds) of 40 randomly selected male college students are organized in a histogram with a superimposed frequency polygon. Use the graph to answer Exercises 60–63.

[Histogram with frequency polygon: Weights of 40 Male College Students. Horizontal axis: Weight (pounds), 145 to 210 in steps of 5. Vertical axis: Number of Students (frequency), marked 2 to 8. The bar heights are not recoverable from the source.]

60. Find the mean weight.

61. Find the median weight.

62. Find the modal weight.

63. Find the midrange weight.


64. An advertisement for a speed-reading course claimed that the “average” reading speed for people completing the course was 1000 words per minute. Shown below are the actual data for the reading speeds per minute for a sample of 24 people who completed the course.

1000 900 800 1000 900 850

650 1000 1050 800 1000 850

700 750 800 850 900 950

600 1100 950 700 750 650

a. Find the mean, median, mode, and midrange. (If you prefer, first organize the data in a frequency distribution.)

b. Which measure of central tendency was given in the advertisement?

c. Which measure of central tendency is the best indicator of the “average” reading speed in this situation? Explain your answer.

65. In one common system for finding a grade-point average, or GPA,

A = 4, B = 3, C = 2, D = 1, F = 0.

The GPA is calculated by multiplying the number of credit hours for a course and the number assigned to each grade, and then adding these products. Then divide this sum by the total number of credit hours. Because each course grade is weighted according to the number of credits of the course, GPA is called a weighted mean. Calculate the GPA for this transcript:

Sociology: 3 cr. A; Biology: 3.5 cr. C; Music: 1 cr. B; Math: 4 cr. B; English: 3 cr. C.

• Writing in Mathematics

66. What is the mean and how is it obtained?

67. What is the median and how is it obtained?

68. What is the mode and how is it obtained?

69. What is the midrange and how is it obtained?

70. The “average” income in the United States can be given by the mean or the median.

a. Which measure would be used in anti-U.S. propaganda? Explain your answer.

b. Which measure would be used in pro-U.S. propaganda? Explain your answer.

71. In a class of 40 students, 21 have examination scores of 77%. Which measure or measures of central tendency can you immediately determine? Explain your answer.

72. You read an article that states, “Of the 411 players in the National Basketball Association, only 138 make more than the average salary of $3.12 million.” Is $3.12 million the mean or the median salary? Explain your answer.

73. A student’s parents promise to pay for next semester’s tuition if an A average is earned in chemistry. With examination grades of 97%, 97%, 75%, 70%, and 55%, the student reports that an A average has been earned. Which measure of central tendency is the student reporting as the average? How is this student misrepresenting the course performance with statistics?

74. According to the National Oceanic and Atmospheric Administration, the coldest city in the United States is International Falls, Minnesota, with a mean Fahrenheit temperature of 36.8°. Explain how this mean is obtained.

75. Using Table 12.8 on page 699, explain why the mean amount spent on defense per resident is so much greater than the median amount spent on defense per resident for the ten countries.

• Critical Thinking Exercises

76. Give an example of a set of six examination grades (from 0 to 100) with each of the following characteristics:

a. The mean and the median have the same value, but the mode has a different value.

b. The mean and the mode have the same value, but the median has a different value.

c. The mean is greater than the median.

d. The mode is greater than the mean.

e. The mean, median, and mode have the same value.

f. The mean and mode have values of 72.

77. On an examination given to 30 students, no student scored below the mean. Describe how this occurred.

• Group Exercises

78. Select a characteristic, such as shoe size or height, for which each member of the group can provide a number. Choose a characteristic of genuine interest to the group. For this characteristic, organize the data collected into a frequency distribution and a graph. Compute the mean, median, mode, and midrange. Discuss any differences among these values. What happens if the group is divided (men and women, or people under a certain age and people over a certain age) and these measures of central tendency are computed for each of the subgroups? Attempt to use measures of central tendency to discover something interesting about the entire group or the subgroups.

79. A recent book on spotting bad statistics and learning to think critically about these influential numbers is Damned Lies and Statistics by Joel Best (University of California Press, 2001). This activity is designed for six people. Each person should select one chapter from Best’s book. The group report should include examples of the use, misuse, and abuse of statistical information. Explain exactly how and why bad statistics emerge, spread, and come to shape policy debates. What specific ways does Best recommend to detect bad statistics?


SECTION 12.3 • MEASURES OF DISPERSION

OBJECTIVES

1. Determine the range for a data set.

2. Determine the standard deviation for a data set.

When you think of Houston, Texas and Honolulu, Hawaii, do balmy temperatures come to mind? Both cities have a mean temperature of 75°. However, the mean temperature does not tell the whole story. The temperature in Houston differs seasonally from a low of about 40° in January to a high of close to 100° in July and August. By contrast, Honolulu’s temperature varies less throughout the year, usually ranging between 60° and 90°.

Measures of dispersion are used to describe the spread of data items in a data set. Two of the most common measures of dispersion, the range and the standard deviation, are discussed in this section.

The Range

A quick but rough measure of dispersion is the range, the difference between the highest and lowest data values in a data set. For example, if Houston’s hottest annual temperature is 103° and its coldest annual temperature is 33°, the range in temperature is

103° − 33°, or 70°.

If Honolulu’s hottest day is 89° and its coldest day 61°, the range in temperature is

89° − 61°, or 28°.

THE RANGE
The range, the difference between the highest and lowest data values in a data set, indicates the total spread of the data.

$$\text{Range} = \text{highest data value} - \text{lowest data value}$$

EXAMPLE 1 COMPUTING THE RANGE

Figure 12.7 shows the number of workers, in millions, for the five countries with the largest labor forces. Find the range of workers, in millions, for these five countries.

SOLUTION

$$\text{Range} = \text{highest data value} - \text{lowest data value} = 778 - 82 = 696$$

The range is 696 million workers.

CHECK POINT 1 Find the range for the following group of data items:

4, 2, 11, 7.
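The range computations above take one line each in Python. The function name `data_range` is our own; the numbers are the temperature extremes quoted in the text and the labor-force figures from Figure 12.7.

```python
def data_range(values):
    """Difference between the highest and lowest data values."""
    return max(values) - min(values)

# Temperature extremes quoted in the text (degrees Fahrenheit).
houston = [33, 103]
honolulu = [61, 89]

# Workers, in millions, from Figure 12.7.
workers = {"China": 778, "India": 472, "U.S.": 147, "Indonesia": 106, "Brazil": 82}

print(data_range(houston))
print(data_range(honolulu))
print(data_range(workers.values()))
```

Because only the two extreme values enter the calculation, the range says nothing about how the remaining data items are spread between them.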

FIGURE 12.7 Countries with the Most Workers. Number of workers (millions): China 778, India 472, U.S. 147, Indonesia 106, Brazil 82.
Source: Central Intelligence Agency


The Standard Deviation

A second measure of dispersion, and one that is dependent on all of the data items, is called the standard deviation. The standard deviation is found by determining how much each data item differs from the mean.

In order to compute the standard deviation, it is necessary to find by how much each data item deviates from the mean. First compute the mean. Then subtract the mean from each data item. Example 2 shows how this is done. In Example 3, we will use this skill to actually find the standard deviation.

EXAMPLE 2 PREPARING TO FIND THE STANDARD DEVIATION; FINDING DEVIATIONS FROM THE MEAN

Find the deviations from the mean for the five data items 778, 472, 147, 106, and 82, shown in Figure 12.7.

SOLUTION First, calculate the mean.

$$\text{Mean} = \frac{\Sigma x}{n} = \frac{778 + 472 + 147 + 106 + 82}{5} = \frac{1585}{5} = 317$$

The mean for the five countries with the largest labor forces is 317 million workers. Now, let’s find by how much each of the five data items in Figure 12.7 differs from 317, the mean. For China, with 778 million workers, the computation is shown as follows:

$$\text{Deviation from mean} = \text{data item} - \text{mean} = 778 - 317 = 461.$$

This indicates that the labor force in China exceeds the mean by 461 million workers. The computation for the United States, with 147 million workers, is given by

$$\text{Deviation from mean} = \text{data item} - \text{mean} = 147 - 317 = -170.$$

This indicates that the labor force in the United States is 170 million workers below the mean.

The deviations from the mean for each of the five given data items are shown in Table 12.10.

TABLE 12.10 DEVIATIONS FROM THE MEAN

| Data item | Deviation: data item − mean |
|---|---|
| 778 | 778 − 317 = 461 |
| 472 | 472 − 317 = 155 |
| 147 | 147 − 317 = −170 |
| 106 | 106 − 317 = −211 |
| 82 | 82 − 317 = −235 |

CHECK POINT 2 Compute the mean for the following group of data items:

2, 4, 7, 11.

Then find the deviations from the mean for the four data items. Organize your work in table form just like Table 12.10. Keep track of these computations. You will be using them in Check Point 3.

The sum of the deviations for a set of data is always zero. For the deviations shown in Table 12.10,

$$461 + 155 + (-170) + (-211) + (-235) = 616 + (-616) = 0.$$

This shows that we cannot find a measure of dispersion by finding the mean of the deviations, because this value is always zero. However, a kind of average of the deviations from the mean, called the standard deviation, can be computed. We do so by squaring each deviation and later introducing a square root in the computation. Here are the details on how to find the standard deviation for a set of data:
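The deviations in Example 2, and the fact that they sum to zero, can be reproduced with a few lines of Python. This is only a sketch using the five labor-force values from Figure 12.7. The sum is zero for any data set because subtracting the mean n times removes exactly the total of the data items: Σ(x − mean) = Σx − n · mean = 0.

```python
workers = [778, 472, 147, 106, 82]   # millions, from Figure 12.7

mean = sum(workers) / len(workers)

# Deviation from the mean = data item - mean, one entry per data item.
deviations = [x - mean for x in workers]

for x, d in zip(workers, deviations):
    print(f"{x:>4}  {d:+.0f}")

# The positive and negative deviations cancel, so their sum is zero.
print(sum(deviations))
```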


COMPUTING THE STANDARD DEVIATION FOR A DATA SET

1. Find the mean of the data items.

2. Find the deviation of each data item from the mean:

$$\text{data item} - \text{mean}.$$

3. Square each deviation:

$$(\text{data item} - \text{mean})^2.$$

4. Sum the squared deviations:

$$\Sigma(\text{data item} - \text{mean})^2.$$

5. Divide the sum in step 4 by n − 1, where n represents the number of data items:

$$\frac{\Sigma(\text{data item} - \text{mean})^2}{n - 1}.$$

6. Take the square root of the quotient in step 5. This value is the standard deviation for the data set.

$$\text{Standard deviation} = \sqrt{\frac{\Sigma(\text{data item} - \text{mean})^2}{n - 1}}$$

The computation of the standard deviation can be organized using a table with three columns:

| Data item | Deviation: data item − mean | (Deviation)²: (data item − mean)² |
|---|---|---|

In Example 2, we worked out the first two columns of such a table. Let’s continue working with the data for the countries with the most workers and compute the standard deviation.

EXAMPLE 3 COMPUTING THE STANDARD DEVIATION

Figure 12.7 shows the number of workers, in millions, for the five countries with the largest labor forces. Find the standard deviation, in millions, for these five countries.

SOLUTION

Step 1. Find the mean. From our work in Example 2, the mean is 317.

Step 2. Find the deviation of each data item from the mean: data item − mean. This, too, was done in Example 2 for each of the five data items.

Step 3. Square each deviation: (data item − mean)². We square each of the numbers in the (data item − mean) column, shown in Table 12.11. Notice that squaring the difference never results in a negative number.

TABLE 12.11 COMPUTING THE STANDARD DEVIATION

| Data item | Deviation: data item − mean | (Deviation)²: (data item − mean)² |
|---|---|---|
| 778 | 778 − 317 = 461 | 461² = 461 · 461 = 212,521 |
| 472 | 472 − 317 = 155 | 155² = 155 · 155 = 24,025 |
| 147 | 147 − 317 = −170 | (−170)² = (−170)(−170) = 28,900 |
| 106 | 106 − 317 = −211 | (−211)² = (−211)(−211) = 44,521 |
| 82 | 82 − 317 = −235 | (−235)² = (−235)(−235) = 55,225 |
| Totals: | 0 | 365,192 |

The total of the second column is 0: the sum of the deviations for a set of data is always zero. Adding the five numbers in the third column gives the sum of the squared deviations: Σ(data item − mean)².


Step 4. Sum the squared deviations: Σ(data item − mean)². This step is shown in Table 12.11. The squares in the third column were added, resulting in a sum of 365,192.

Step 5. Divide the sum in step 4 by n − 1, where n represents the number of data items. The number of data items is 5, so we divide by 4.

$$\frac{\Sigma(\text{data item} - \text{mean})^2}{n - 1} = \frac{365{,}192}{5 - 1} = \frac{365{,}192}{4} = 91{,}298$$

Step 6. The standard deviation is the square root of the quotient in step 5.

$$\text{Standard deviation} = \sqrt{\frac{\Sigma(\text{data item} - \text{mean})^2}{n - 1}} = \sqrt{91{,}298} \approx 302.16$$

The standard deviation for the five countries with the largest labor forces is approximately 302.16 million workers.

CHECK POINT 3 Find the standard deviation for the group of data items in Check Point 2 on page 706. Round to two decimal places.

TECHNOLOGY

Almost all scientific and graphing calculators compute the standard deviation of a set of data. Using the data items in Example 3,

778, 472, 147, 106, 82,

the keystrokes for obtaining the standard deviation on many scientific calculators are as follows:

778 [Σ+] 472 [Σ+] 147 [Σ+] 106 [Σ+] 82 [Σ+] [2nd] [σn−1].

Graphing calculators require that you specify if data items are from an entire population or a sample of the population.

Example 4 illustrates that as the spread of data items increases, the standard deviation gets larger.

EXAMPLE 4 COMPUTING THE STANDARD DEVIATION

Find the standard deviation of the data items in each of the samples shown below.

Sample A: 17, 18, 19, 20, 21, 22, 23

Sample B: 5, 10, 15, 20, 25, 30, 35

SOLUTION Begin by finding the mean for each sample.

Sample A:

$$\text{Mean} = \frac{17 + 18 + 19 + 20 + 21 + 22 + 23}{7} = \frac{140}{7} = 20$$

Sample B:

$$\text{Mean} = \frac{5 + 10 + 15 + 20 + 25 + 30 + 35}{7} = \frac{140}{7} = 20$$

Although both samples have the same mean, the data items in sample B are more spread out. Thus, we would expect sample B to have the greater standard deviation. The computation of the standard deviation requires that we find Σ(data item − mean)², shown in Table 12.12.

TABLE 12.12 COMPUTING STANDARD DEVIATIONS FOR TWO SAMPLES

Sample A

| Data item | Deviation: data item − mean | (Deviation)²: (data item − mean)² |
|---|---|---|
| 17 | 17 − 20 = −3 | (−3)² = 9 |
| 18 | 18 − 20 = −2 | (−2)² = 4 |
| 19 | 19 − 20 = −1 | (−1)² = 1 |
| 20 | 20 − 20 = 0 | 0² = 0 |
| 21 | 21 − 20 = 1 | 1² = 1 |
| 22 | 22 − 20 = 2 | 2² = 4 |
| 23 | 23 − 20 = 3 | 3² = 9 |
| Totals: | | Σ(data item − mean)² = 28 |

Sample B

| Data item | Deviation: data item − mean | (Deviation)²: (data item − mean)² |
|---|---|---|
| 5 | 5 − 20 = −15 | (−15)² = 225 |
| 10 | 10 − 20 = −10 | (−10)² = 100 |
| 15 | 15 − 20 = −5 | (−5)² = 25 |
| 20 | 20 − 20 = 0 | 0² = 0 |
| 25 | 25 − 20 = 5 | 5² = 25 |
| 30 | 30 − 20 = 10 | 10² = 100 |
| 35 | 35 − 20 = 15 | 15² = 225 |
| Totals: | | Σ(data item − mean)² = 700 |


FIGURE 12.8 The standard deviation gets larger with increased dispersion among data items. In each case, the mean is 4. [Four histograms of frequency versus data value (1 to 7): (a) standard deviation = 0; (b) standard deviation ≈ 0.8; (c) standard deviation = 1; (d) standard deviation = 3.]

Each sample contains seven data items, so we compute the standard deviation by dividing the sums in Table 12.12, 28 and 700, by 7 − 1, or 6. Then we take the square root of each quotient.

$$\text{Standard deviation} = \sqrt{\frac{\Sigma(\text{data item} - \text{mean})^2}{n - 1}}$$

Sample A:

$$\text{Standard deviation} = \sqrt{\frac{28}{6}} \approx 2.16$$

Sample B:

$$\text{Standard deviation} = \sqrt{\frac{700}{6}} \approx 10.80$$

Sample A has a standard deviation of approximately 2.16 and sample B has a standard deviation of approximately 10.80. The scores in sample B are more spread out than those in sample A.

CHECK POINT 4 Find the standard deviation of the data items in each of the samples shown below. Round to two decimal places.

Sample A: 73, 75, 77, 79, 81, 83

Sample B: 40, 44, 92, 94, 98, 100

Figure 12.8 illustrates four sets of data items organized in histograms. From left to right, the data items are

Figure 12.8(a): 4, 4, 4, 4, 4, 4, 4

Figure 12.8(b): 3, 3, 4, 4, 4, 5, 5

Figure 12.8(c): 3, 3, 3, 4, 5, 5, 5

Figure 12.8(d): 1, 1, 1, 4, 7, 7, 7.

Each data set has a mean of 4. However, as the spread of data items increases, the standard deviation gets larger. Observe that when all the data items are the same, the standard deviation is 0.
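The six-step procedure can be written out as a short Python sketch and applied to the data sets in Examples 3 and 4 and in Figure 12.8. The function name `sample_standard_deviation` is our own. For comparison, the sketch also calls `statistics.stdev` from the standard library, which divides by n − 1 just as the procedure does; the related function `statistics.pstdev` divides by n and is meant for data from an entire population.

```python
import math
import statistics

def sample_standard_deviation(data):
    """Follow the six steps: mean, deviations, squares, sum, divide by n - 1, square root."""
    n = len(data)
    mean = sum(data) / n                       # step 1
    deviations = [x - mean for x in data]      # step 2
    squared = [d ** 2 for d in deviations]     # step 3
    total = sum(squared)                       # step 4
    quotient = total / (n - 1)                 # step 5
    return math.sqrt(quotient)                 # step 6

data_sets = {
    "Example 3 (workers)": [778, 472, 147, 106, 82],
    "Example 4, sample A": [17, 18, 19, 20, 21, 22, 23],
    "Example 4, sample B": [5, 10, 15, 20, 25, 30, 35],
    "Figure 12.8(a)": [4, 4, 4, 4, 4, 4, 4],
    "Figure 12.8(b)": [3, 3, 4, 4, 4, 5, 5],
    "Figure 12.8(c)": [3, 3, 3, 4, 5, 5, 5],
    "Figure 12.8(d)": [1, 1, 1, 4, 7, 7, 7],
}

for label, data in data_sets.items():
    by_steps = sample_standard_deviation(data)
    # statistics.stdev also divides by n - 1, so the two values should agree.
    library = statistics.stdev(data)
    print(f"{label}: {by_steps:.2f} (statistics.stdev: {library:.2f})")
```

Writing the steps as separate lines mirrors the three-column table; in practice the library call alone is enough.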

EXAMPLE 5 INTERPRETING STANDARD DEVIATION

Two fifth-grade classes have nearly identical mean scores on an aptitude test, but one class has a standard deviation three times that of the other. All other factors being equal, which class is easier to teach, and why?

SOLUTION The class with the smaller standard deviation is easier to teach because there is less variation among student aptitudes. Course work can be aimed at the average student without too much concern that the work will be too easy for some or too difficult for others. By contrast, the class with greater dispersion poses a greater challenge. If the course work is aimed at the average student, the students whose scores are significantly above the mean will be bored; students whose scores are significantly below the mean will be confused.

CHECK POINT 5 Shown below are the means and standard deviations of the yearly returns on two investments from 1926 through 2004.

| Investment | Mean Yearly Interest | Standard Deviation |
|---|---|---|
| Small-Company Stocks | 17.5% | 33.3% |
| Large-Company Stocks | 12.4% | 20.4% |

Source: Summary Statistics of Annual Total Returns 1926 to 2004 Yearbook, Ibbotson Associates, Chicago

a. Use the means to determine which investment provided the greater yearly return.

b. Use the standard deviations to determine which investment had the greater risk. Explain your answer.

EXERCISE SET 12.3

• Practice Exercises

In Exercises 1–6, find the range for each group of data items.

1. 1, 2, 3, 4, 5

2. 16, 17, 18, 19, 20

3. 7, 9, 9, 15

4. 11, 13, 14, 15, 17

5. 3, 3, 4, 4, 5, 5

6. 3, 3, 3, 4, 5, 5, 5

In Exercises 7–10, a group of data items and their mean are given.

a. Find the deviation from the mean for each of the data items.

b. Find the sum of the deviations in part (a).

7. 3, 5, 7, 12, 18, 27; Mean = 12

8. 84, 88, 90, 95, 98; Mean = 91

9. 29, 38, 48, 49, 53, 77; Mean = 49

10. 60, 60, 62, 65, 65, 65, 66, 67, 70, 70; Mean = 65

In Exercises 11–16, find a. the mean; b. the deviation from the mean for each data item; and c. the sum of the deviations in part (b).

11. 85, 95, 90, 85, 100

12. 94, 62, 88, 85, 91

13. 146, 153, 155, 160, 161

14. 150, 132, 144, 122

15. 2.25, 3.50, 2.75, 3.10, 1.90

16. 0.35, 0.37, 0.41, 0.39, 0.43

In Exercises 17–26, find the standard deviation for each group of data items. Round answers to two decimal places.

17. 1, 2, 3, 4, 5

18. 16, 17, 18, 19, 20

19. 7, 9, 9, 15

20. 11, 13, 14, 15, 17

21. 3, 3, 4, 4, 5, 5

22. 3, 3, 3, 4, 5, 5, 5

23. 1, 1, 1, 4, 7, 7, 7

24. 6, 6, 6, 6, 7, 7, 7, 4, 8, 3

25. 9, 5, 9, 5, 9, 5, 9, 5

26. 6, 10, 6, 10, 6, 10, 6, 10

In Exercises 27–28, compute the mean, range, and standard deviation for the data items in each of the three samples. Then describe one way in which the samples are alike and one way in which they are different.

27. Sample A: 6, 8, 10, 12, 14, 16, 18

Sample B: 6, 7, 8, 12, 16, 17, 18

Sample C: 6, 6, 6, 12, 18, 18, 18

28. Sample A: 8, 10, 12, 14, 16, 18, 20

Sample B: 8, 9, 10, 14, 18, 19, 20

Sample C: 8, 8, 8, 14, 20, 20, 20

• Practice Plus

In Exercises 29–36, use each display of data items to find the standard deviation. Where necessary, round answers to two decimal places.

29.–30. [Two histograms of Frequency (1 to 7) versus Score (6 to 12). The bar heights are not recoverable from the source.]


31.–32. [Two histograms of Frequency (1 to 7) versus Score (6 to 12). The bar heights are not recoverable from the source.]

33.

| Stems | Leaves |
|---|---|
| 0 | 5 |
| 1 | 0 5 |
| 2 | 0 5 |

34.

| Stems | Leaves |
|---|---|
| 0 | 4 8 |
| 1 | 2 6 |
| 2 | 0 |

35.

| Stems | Leaves |
|---|---|
| 1 | 8 9 9 8 7 8 |
| 2 | 0 1 0 2 |

36.

| Stems | Leaves |
|---|---|
| 1 | 3 5 3 8 3 4 |
| 2 | 3 0 0 4 |

• Application Exercises

37. The data sets give the ages of Oscar winners from 1999 through 2005 at the time of the award.

| Year | Best Actor | Age |
|---|---|---|
| 1999 | Kevin Spacey | 40 |
| 2000 | Russell Crowe | 36 |
| 2001 | Denzel Washington | 47 |
| 2002 | Adrien Brody | 29 |
| 2003 | Sean Penn | 43 |
| 2004 | Jamie Foxx | 37 |
| 2005 | Philip Seymour Hoffman | 38 |

| Year | Best Actress | Age |
|---|---|---|
| 1999 | Hilary Swank | 25 |
| 2000 | Julia Roberts | 33 |
| 2001 | Halle Berry | 35 |
| 2002 | Nicole Kidman | 35 |
| 2003 | Charlize Theron | 28 |
| 2004 | Hilary Swank | 30 |
| 2005 | Reese Witherspoon | 29 |

Source: www.oscars.org

a. Without calculating, which data set has the greater mean age? Explain your answer.

b. Verify your conjecture from part (a) by calculating the mean age for each data set. Round answers to two decimal places.

c. Without calculating, which data set has the greater standard deviation? Explain your answer.

d. Verify your conjecture from part (c) by calculating the standard deviation for each data set. Round answers to two decimal places.

38. The data sets give the ages of the first six U.S. presidents and the last six U.S. presidents (through G. W. Bush).

AGE OF FIRST SIX U.S. PRESIDENTS AT INAUGURATION

| President | Age |
|---|---|
| Washington | 57 |
| J. Adams | 61 |
| Jefferson | 57 |
| Madison | 57 |
| Monroe | 58 |
| J. Q. Adams | 57 |

AGE OF LAST SIX U.S. PRESIDENTS AT INAUGURATION

| President | Age |
|---|---|
| Ford | 61 |
| Carter | 52 |
| Reagan | 69 |
| G. H. W. Bush | 64 |
| Clinton | 46 |
| G. W. Bush | 54 |

Source: Time Almanac

a. Without calculating, which set has the greater standard deviation? Explain your answer.

b. Verify your conjecture from part (a) by calculating the standard deviation for each data set. Round answers to two decimal places.

• Writing in Mathematics

39. Describe how to find the range of a data set.

40. Describe why the range might not be the best measure of dispersion.

41. Describe how the standard deviation is computed.

42. Describe what the standard deviation reveals about a data set.

43. If a set of test scores has a standard deviation of zero, what does this mean about the scores?


44. Two classes took a statistics test. Both classes had a mean score of 73. The scores of class A had a standard deviation of 5 and those of class B had a standard deviation of 10. Discuss the difference between the two classes’ performance on the test.

45. A sample of cereals indicates a mean potassium content per serving of 93 milligrams and a standard deviation of 2 milligrams. Write a description of what this means for a person who knows nothing about statistics.

46. Over a one-month period, stock A had a mean daily closing price of 124.7 and a standard deviation of 12.5. By contrast, stock B had a mean daily closing price of 78.2 and a standard deviation of 6.1. Which stock was more volatile? Explain your answer.

• Critical Thinking Exercises

47. Which one of the following is true?

a. If the same number is added to each data item in a set of data, the standard deviation does not change.

b. If each number in a data set is multiplied by 4, the standard deviation is doubled.

c. It is possible for a set of scores to have a negative standard deviation.

d. Data sets with different means cannot have the same standard deviation.

48. Describe a situation in which a relatively large standard deviation is desirable.

49. If a set of test scores has a large range but a small standard deviation, describe what this means about students’ performance on the test.

50. Use the data 1, 2, 3, 5, 6, 7. Without actually computing the standard deviation, which of the following best approximates the standard deviation?

a. 2 b. 6 c. 10 d. 20

51. Use the data 0, 1, 3, 4, 4, 6. Add 2 to each of the numbers. How does this affect the mean? How does this affect the standard deviation?

• Group Exercises

52. As a follow-up to Group Exercise 78 on page 704, the group should reassemble and compute the standard deviation for each data set whose mean you previously determined. Does the standard deviation tell you anything new or interesting about the entire group or subgroups that you did not discover during the previous group activity?

53. Group members should consult a current almanac or the Internet and select intriguing data. The group’s function is to use statistics to tell a story. Once “intriguing” data are identified, as a group

a. Summarize the data. Use words, frequency distributions, and graphic displays.

b. Compute measures of central tendency and dispersion, using these statistics to discuss the data.

SECTION 12.4 • THE NORMAL DISTRIBUTION

OBJECTIVES

1. Recognize characteristics of normal distributions.

2. Understand the 68–95–99.7 Rule.

3. Find scores at a specified standard deviation from the mean.

4. Use the 68–95–99.7 Rule.

5. Convert a data item to a z-score.

6. Understand percentiles and quartiles.

7. Solve applied problems involving normal distributions.

8. Use and interpret margins of error.

9. Recognize distributions that are not normal.

Our heights are on the rise! In one million B.C., the mean height for men was 4 feet 6 inches. The mean height for women was 4 feet 2 inches. Because of improved diets and medical care, the mean height for men is now 5 feet 10 inches and for women it is 5 feet 5 inches. Mean adult heights are expected to plateau by 2050.

[Figure: Mean Adult Heights for 1900, 2000, and 2050. Plotted values: 5'2", 5'5", 5'7", 5'7", 5'10", 6'0". Source: National Center for Health Statistics]

Suppose that a researcher selects a random sample of 100 adult men, measures their heights, and constructs a histogram. The graph is shown in Figure 12.9 at the top of the next page. Figure 12.9 illustrates what happens as the sample size increases. In Figure 12.9(c), if you were to fold the graph down the middle, the left side would fit the right side. As we move out from the middle, the heights of the bars are the same to the left and right. Such a histogram is called symmetric. As the sample size increases, so does the graph’s symmetry. If it were possible to measure

FIGURE 12.9 Heights of adult males. [Four panels plotting Number of Men against Height: (a) Random Sample of 100 Men; (b) Sample Size Increases; (c) Sample Size Continues to Increase; (d) Normal Distribution for the Population.]

Figure 12.9(d) illustrates that the normal distribution is bell shaped and symmet-ric about a vertical line through its center. Furthermore, the mean, median, and modeof a normal distribution are all equal and located at the center of the distribution.

The shape of the normal distribution depends on the mean and the standarddeviation. Figure 12.10 illustrates three normal distributions with the same mean,but different standard deviations. As the standard deviation increases, the distribu-tion becomes more dispersed, or spread out, but retains its symmetric bell shape.

1 Recognize characteristics ofnormal distributions.

Mean Mean MeanFIGURE 12.10

The normal distribution provides a wonderful model for all kinds of phenomenabecause many sets of data items closely resemble this population distribution.Examples include heights and weights of adult males, intelligence quotients,SAT scores, prices paid for a new car model, and life spans of light bulbs. In thesedistributions, the data items tend to cluster around the mean. The more an itemdiffers from the mean, the less likely it is to occur.

The normal distribution is used to make predictions about an entire populationusing data from a sample. In this section, we focus on the characteristics andapplications of the normal distribution.

The Standard Deviation and z-Scores in Normal DistributionsThe standard deviation plays a crucial role in the normal distribution, summarizedby the 68–95–99.7 Rule. This rule is illustrated in Figure 12.11 on the next page.

the heights of all adult males, the entire population, the histogram would approachwhat is called the normal distribution, shown in Figure 12.9(d). This distribution isalso called the bell curve or the Gaussian distribution, named for the German math-ematician Carl Friedrich Gauss (1777–1855).

2 Understand the 68–95–99.7Rule.

BLITMC12_679-746-v3 10/26/06 4:20 PM Page 713

Page 36: Ch12 Statistics

THE 68–95–99.7 RULE FOR THE NORMAL DISTRIBUTION

1. Approximately 68% of the data items fall within 1 standard deviation of the mean (in both directions).

2. Approximately 95% of the data items fall within 2 standard deviations of the mean.

3. Approximately 99.7% of the data items fall within 3 standard deviations of the mean.

[Figure 12.11: Normal curve illustrating the 68–95–99.7 Rule. Horizontal axis: Data Items, marked at -3, -2, -1, 0 (mean), 1, 2, and 3 standard deviations below and above the mean. Vertical axis: Frequency (Number of Persons or Things). Labeled regions: 68% within 1 standard deviation, 95% within 2, and 99.7% within 3.]

BLITZER BONUS

Well-Worn Steps and the Normal Distribution

These ancient steps each take on the shape of a normal distribution when the picture is viewed upside down. The center of each step is more worn than the outer edges. The greatest number of people have walked in the center, making this the mean, median, and mode for where people have walked.

Figure 12.11 illustrates that a very small percentage of the data in a normal distribution lies more than 3 standard deviations above or below the mean. As we move from the mean, the curve falls rapidly, and then more and more gradually, toward the horizontal axis. The tails of the curve approach, but never touch, the horizontal axis, although they are quite close to the axis at 3 standard deviations from the mean. The range of the normal distribution is infinite. No matter how far out from the mean we move, there is always the probability (although very small) of a data item occurring even farther out.
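The percentages in the 68–95–99.7 Rule can be checked numerically. The sketch below is not part of the text's method; it assumes the standard result that the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2), and the function name `fraction_within` is chosen here only for illustration.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # Standard identity for the normal distribution: P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.1f}%")
```

The printed values round to about 68.3%, 95.4%, and 99.7%, which the rule states as 68%, 95%, and 99.7%.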

EXAMPLE 1 FINDING SCORES AT A SPECIFIED STANDARD DEVIATION FROM THE MEAN

Male adult heights in North America are approximately normally distributed with a mean of 70 inches and a standard deviation of 4 inches. Find the height that is

a. 2 standard deviations above the mean.

b. 3 standard deviations below the mean.

SOLUTION

a. First, let us find the height that is 2 standard deviations above the mean.

$$\text{Height} = \text{mean} + 2 \cdot \text{standard deviation} = 70 + 2 \cdot 4 = 70 + 8 = 78$$

A height of 78 inches is 2 standard deviations above the mean.

b. Next, let us find the height that is 3 standard deviations below the mean.

$$\text{Height} = \text{mean} - 3 \cdot \text{standard deviation} = 70 - 3 \cdot 4 = 70 - 12 = 58$$

A height of 58 inches is 3 standard deviations below the mean.

[Figure 12.12: Normal Distribution of Male Adult Heights. Horizontal axis: Male Adult Heights in North America, marked at 58, 62, 66, 70 (mean), 74, 78, and 82 inches. Vertical axis: Number of Male Adults. Labeled regions: 68%, 95%, and 99.7%.]

CHECK POINT 1

Female adult heights in North America are approximately normally distributed with a mean of 65 inches and a standard deviation of 3.5 inches. Find the height that is

a. 3 standard deviations above the mean.

b. 2 standard deviations below the mean.
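The arithmetic in Example 1 can be sketched as a one-line function. The name `score_at` and the convention that a negative `k` means "below the mean" are choices made here for illustration, not notation from the text.

```python
def score_at(mean: float, sd: float, k: float) -> float:
    """Data item lying k standard deviations from the mean (negative k means below)."""
    return mean + k * sd

# Example 1: male adult heights, mean 70 inches, standard deviation 4 inches.
print(score_at(70, 4, 2))   # 2 standard deviations above the mean: 78
print(score_at(70, 4, -3))  # 3 standard deviations below the mean: 58
```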

The distribution of male adult heights in North America is illustrated as a normal distribution in Figure 12.12.

EXAMPLE 2 USING THE 68–95–99.7 RULE

Use the distribution of male adult heights in Figure 12.12 to find the percentage of men in North America with heights

a. between 66 inches and 74 inches.

b. between 70 inches and 74 inches.

c. above 78 inches.

SOLUTION

a. The 68–95–99.7 Rule states that approximately 68% of the data items fall within 1 standard deviation, 4, of the mean, 70.

$$\text{mean} - 1 \cdot \text{standard deviation} = 70 - 1 \cdot 4 = 70 - 4 = 66$$

$$\text{mean} + 1 \cdot \text{standard deviation} = 70 + 1 \cdot 4 = 70 + 4 = 74$$

Figure 12.12 shows that 68% of male adults have heights between 66 inches and 74 inches.

b. The percentage of men with heights between 70 inches and 74 inches is not given directly in Figure 12.12 or Figure 12.13. Because of the distribution’s symmetry, the percentage with heights between 66 inches and 70 inches is the same as the percentage with heights between 70 and 74 inches. Figure 12.13 indicates that 68% have heights between 66 inches and 74 inches. Thus, half of 68%, or 34%, of men have heights between 70 inches and 74 inches.

[Figure 12.13: What percentage have heights between 70 inches and 74 inches? Normal curve marked at 66, 70, and 74 inches, with 68% between 66 and 74 and the region between 70 and 74 labeled "?%".]

c. The percentage of men with heights above 78 inches is not given directly in Figure 12.12 or Figure 12.14. A height of 78 inches is 2 standard deviations, $2 \cdot 4$, or 8 inches, above the mean, 70 inches. The 68–95–99.7 Rule states that approximately 95% of the data items fall within 2 standard deviations of the mean. Thus, approximately $100\% - 95\%$, or 5%, of the data items are farther than 2 standard deviations from the mean. The 5% of the data items are represented by the two shaded green regions in Figure 12.14. Because of the distribution’s symmetry, half of 5%, or 2.5%, of the data items are more than 2 standard deviations above the mean. This means that 2.5% of men have heights above 78 inches.

[Figure 12.14: What percentage have heights above 78 inches? Normal curve marked at 62 (2 standard deviations below the mean), 70 (mean), and 78 (2 standard deviations above the mean), with 95% between 62 and 78 and the region above 78 labeled "?%".]

CHECK POINT 2

Use the distribution of male adult heights in North America in Figure 12.12 to find the percentage of men with heights

a. between 62 inches and 78 inches.

b. between 70 inches and 78 inches.

c. above 74 inches.
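The symmetry reasoning used in Example 2 can be written out as a small sketch. The dictionary `WITHIN` and the helper names are introduced here for illustration only; the percentages are the rounded values from the 68–95–99.7 Rule, not exact normal-curve areas.

```python
# Rounded percentages from the 68-95-99.7 Rule, keyed by number of standard deviations.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def between_mean_and(k: int) -> float:
    """Percent of data between the mean and k standard deviations on one side."""
    return WITHIN[k] / 2  # symmetry: half of the central region

def beyond(k: int) -> float:
    """Percent of data more than k standard deviations above (or below) the mean."""
    return (100.0 - WITHIN[k]) / 2  # symmetry: half of what lies outside

# Example 2: mean 70 inches, standard deviation 4 inches.
print(WITHIN[1])            # between 66 and 74 inches: 68.0
print(between_mean_and(1))  # between 70 and 74 inches: 34.0
print(beyond(2))            # above 78 inches: 2.5
```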

Because the normal distribution of male adult heights in North America has a mean of 70 inches and a standard deviation of 4 inches, a height of 78 inches lies 2 standard deviations above the mean. In a normal distribution, a z-score describes how many standard deviations a particular data item lies above or below the mean. Thus, the z-score for the data item 78 is 2.

The following formula can be used to express a data item in a normal distribution as a z-score:

COMPUTING z-SCORES

A z-score describes how many standard deviations a data item in a normal distribution lies above or below the mean. The z-score can be obtained using

$$z\text{-score} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}}.$$

Data items above the mean have positive z-scores. Data items below the mean have negative z-scores. The z-score for the mean is 0.

EXAMPLE 3 COMPUTING z-SCORES

The mean weight of newborn infants is 7 pounds and the standard deviation is 0.8 pound. The weights of newborn infants are normally distributed. Find the z-score for a weight of

a. 9 pounds. b. 7 pounds. c. 6 pounds.

SOLUTION We compute the z-score for each weight by using the z-score formula. The mean is 7 and the standard deviation is 0.8.

a. The z-score for a weight of 9 pounds, written $z_9$, is

$$z_9 = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{9 - 7}{0.8} = \frac{2}{0.8} = 2.5.$$

The z-score of a data item greater than the mean is always positive. A 9-pound infant is a chubby little tyke, with a weight that is 2.5 standard deviations above the mean.

b. The z-score for a weight of 7 pounds is

$$z_7 = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{7 - 7}{0.8} = \frac{0}{0.8} = 0.$$

The z-score for the mean is always 0. A 7-pound infant is right at the mean, deviating 0 pounds above or below it.

c. The z-score for a weight of 6 pounds is

$$z_6 = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{6 - 7}{0.8} = \frac{-1}{0.8} = -1.25.$$

The z-score of a data item less than the mean is always negative. A 6-pound infant’s weight is 1.25 standard deviations below the mean.

Figure 12.15 shows the normal distribution of weights of newborn infants. The horizontal axis is labeled in terms of weights and z-scores.

[Figure 12.15: Infants’ weights are normally distributed. Normal Distribution of Weights of Newborn Infants, with the horizontal axis labeled by z-scores -3, -2, -1, 0, 1, 2, 3 and the corresponding weights 4.6, 5.4, 6.2, 7 (mean), 7.8, 8.6, and 9.4 pounds. Annotations: a 6-pound infant is 1.25 standard deviations below the mean; a 9-pound infant is 2.5 standard deviations above the mean.]

CHECK POINT 3

The length of horse pregnancies from conception to birth is normally distributed with a mean of 336 days and a standard deviation of 3 days. Find the z-score for a horse pregnancy of

a. 342 days. b. 336 days. c. 333 days.

In Example 4, we consider two normally distributed sets of test scores, in which a higher score generally indicates a better result. To compare scores on two different tests in relation to the mean on each test, we can use z-scores. The better score is the item with the greater z-score.
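The z-score formula applied in Example 3 can be sketched in Python as follows. The function name `z_score` and its parameter names are illustrative choices, and the results are rounded to two decimals only to keep floating-point noise out of the printout.

```python
def z_score(data_item: float, mean: float, sd: float) -> float:
    """Number of standard deviations that data_item lies above (+) or below (-) the mean."""
    return (data_item - mean) / sd

# Example 3: newborn weights with mean 7 pounds and standard deviation 0.8 pound.
for weight in (9, 7, 6):
    print(weight, round(z_score(weight, mean=7, sd=0.8), 2))
```

For weights of 9, 7, and 6 pounds this gives z-scores of 2.5, 0.0, and -1.25, matching the worked solution.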

BLITZER BONUS

The IQ Controversy

Is intelligence something we are born with or is it a quality that can be manipulated through education? Can it be measured accurately and is IQ the way to measure it? There are no clear answers to these questions.

In a study by Carolyn Bird (Pygmalion in the Classroom), a group of third-grade teachers was told that they had classes of students with IQs well above the mean. These classes made incredible progress throughout the year. In reality, these were not gifted kids, but, rather, a random sample of all third-graders. It was the teachers’ expectations, and not the IQs of the students, that resulted in increased performance.

EXAMPLE 4 USING AND INTERPRETING z-SCORES

A student scores 70 on an arithmetic test and 66 on a vocabulary test. The scores for both tests are normally distributed. The arithmetic test has a mean of 60 and a standard deviation of 20. The vocabulary test has a mean of 60 and a standard deviation of 2. On which test did the student have the better score?

SOLUTION To answer the question, we need to find the student’s z-score on each test, using

$$z = \frac{\text{data item} - \text{mean}}{\text{standard deviation}}.$$

The arithmetic test has a mean of 60 and a standard deviation of 20.

$$z\text{-score for } 70 = z_{70} = \frac{70 - 60}{20} = \frac{10}{20} = 0.5$$

The vocabulary test has a mean of 60 and a standard deviation of 2.

$$z\text{-score for } 66 = z_{66} = \frac{66 - 60}{2} = \frac{6}{2} = 3$$

The arithmetic score, 70, is half a standard deviation above the mean, whereas the vocabulary score, 66, is 3 standard deviations above the mean. The student did much better than the mean on the vocabulary test.

CHECK POINT 4

The SAT (Scholastic Aptitude Test) has a mean of 500 and a standard deviation of 100. The ACT (American College Test) has a mean of 18 and a standard deviation of 6. Both tests measure the same kind of ability, with scores that are normally distributed. Suppose that you score 550 on the SAT and 24 on the ACT. On which test did you have the better score?

EXAMPLE 5 UNDERSTANDING z-SCORES

Intelligence quotients (IQs) on the Stanford-Binet intelligence test are normally distributed with a mean of 100 and a standard deviation of 16.

a. What is the IQ corresponding to a z-score of -1.5?

b. Mensa is a group of people with high IQs whose members have z-scores of 2.05 or greater on the Stanford-Binet intelligence test. What is the IQ corresponding to a z-score of 2.05?

SOLUTION

a. We begin with the IQ corresponding to a z-score of -1.5. The negative sign in -1.5 tells us that the IQ is $1\frac{1}{2}$ standard deviations below the mean.

$$\text{IQ} = \text{mean} - 1.5 \cdot \text{standard deviation} = 100 - 1.5(16) = 100 - 24 = 76$$

The IQ corresponding to a z-score of -1.5 is 76.

b. Next, we find the IQ corresponding to a z-score of 2.05. The positive sign implied in 2.05 tells us that the IQ is 2.05 standard deviations above the mean.

$$\text{IQ} = \text{mean} + 2.05 \cdot \text{standard deviation} = 100 + 2.05(16) = 100 + 32.8 = 132.8$$

The IQ corresponding to a z-score of 2.05 is 132.8. (An IQ score of at least 133 is required to join Mensa.)

CHECK POINT 5

Use the information in Example 5 to find the IQ corresponding to a z-score of

a. -2.25. b. 1.75.
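Examples 4 and 5 use the z-score formula in both directions: from a data item to a z-score, and from a z-score back to a data item. A minimal sketch of both directions is given here; the helper name `data_item_from_z` is introduced only for illustration.

```python
def z_score(data_item: float, mean: float, sd: float) -> float:
    """Number of standard deviations that data_item lies from the mean."""
    return (data_item - mean) / sd

def data_item_from_z(z: float, mean: float, sd: float) -> float:
    """Data item that lies z standard deviations from the mean."""
    return mean + z * sd

# Example 4: the better score is the one with the greater z-score.
arithmetic = z_score(70, mean=60, sd=20)
vocabulary = z_score(66, mean=60, sd=2)
better = "vocabulary" if vocabulary > arithmetic else "arithmetic"
print(arithmetic, vocabulary, better)

# Example 5: IQs with mean 100 and standard deviation 16.
print(data_item_from_z(-1.5, 100, 16))
print(round(data_item_from_z(2.05, 100, 16), 1))
```

The z-scores come out as 0.5 and 3.0, so the vocabulary score is the better one, and the two IQs are 76 and 132.8.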

Percentiles and Quartiles

A z-score measures a data item’s position in a normal distribution. Another measure of a data item’s position is its percentile. Percentiles are often associated with scores on standardized tests. If a score is in the 45th percentile, this means that 45% of the scores are less than this score. If a score is in the 95th percentile, this indicates that 95% of the scores are less than this score.

PERCENTILES

If n% of the items in a distribution are less than a particular data item, we say that the data item is in the nth percentile of the distribution.

STUDY TIP

A score in the 98th percentile does not mean that 98% of the answers are correct. Nor does it mean that the score was 98%.

EXAMPLE 6 INTERPRETING PERCENTILE

The cutoff IQ score for Mensa membership, 132.8, is in the 98th percentile. What does this mean?

SOLUTION Because 132.8 is in the 98th percentile, this means that 98% of IQ scores fall below 132.8.

CHECK POINT 6

A student scored in the 75th percentile on the SAT. What does this mean?

Three commonly encountered percentiles are the quartiles. Quartiles divide data sets into four equal parts. The 25th percentile is the first quartile: 25% of the data fall below the first quartile. The 50th percentile is the second quartile: 50% of the data fall below the second quartile, so the second quartile is equivalent to the median. The 75th percentile is the third quartile: 75% of the data fall below the third quartile. Figure 12.16 illustrates the concept of quartiles for the normal distribution.

[Figure 12.16: Quartiles. A normal curve divided into four regions, each containing 25% of the data, by the first quartile (25th percentile), the second quartile (median, 50th percentile), and the third quartile (75th percentile).]

Problem Solving Using z-Scores and Percentiles

Table 12.13 gives a percentile interpretation for z-scores. For example, the two entries reproduced here indicate that the corresponding percentile for a data item with a z-score of 2 is 97.72. A student with this score on a test whose results are normally distributed outperformed 97.72% of those who took the test.

Two entries from Table 12.13:

z-Score   Percentile
2.0       97.72
0.0       50.00

In a normal distribution, the mean, median, and mode all have a corresponding z-score of 0. Table 12.13 shows that the percentile for a z-score of 0 is 50.00. Thus, 50% of the data items in a normal distribution are less than the mean, median, and mode. Consequently, 50% of the data items are greater than or equal to the mean, median, and mode.
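The text reads percentiles from Table 12.13. As a cross-check, the sketch below computes the same quantity directly, under the assumption that the table lists the cumulative percentage of the standard normal distribution rounded to two decimals. The function name `percentile_for_z` is illustrative.

```python
import math

def percentile_for_z(z: float) -> float:
    """Percentage of a normal distribution lying below the given z-score."""
    # Standard normal cumulative distribution, expressed with the error function.
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (0.0, 2.0):
    print(z, round(percentile_for_z(z), 2))
```

For z-scores of 0 and 2 this reproduces the two table entries quoted in the text, 50.00 and 97.72.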

TABLE 12.13 z-SCORES AND PERCENTILES

z-Score  Percentile    z-Score  Percentile    z-Score  Percentile    z-Score  Percentile
-4.0       0.003       -1.0      15.87        0.0       50.00        1.1      86.43
-3.5       0.02        -0.95     17.11        0.05      51.99        1.2      88.49
-3.0       0.13        -0.90     18.41        0.10      53.98        1.3      90.32
-2.9       0.19        -0.85     19.77        0.15      55.96        1.4      91.92
-2.8       0.26        -0.80     21.19        0.20      57.93        1.5      93.32
-2.7       0.35        -0.75     22.66        0.25      59.87        1.6      94.52
-2.6       0.47        -0.70     24.20        0.30      61.79        1.7      95.54
-2.5       0.62        -0.65     25.78        0.35      63.68        1.8      96.41
-2.4       0.82        -0.60     27.43        0.40      65.54        1.9      97.13
-2.3       1.07        -0.55     29.12        0.45      67.36        2.0      97.72
-2.2       1.39        -0.50     30.85        0.50      69.15        2.1      98.21
-2.1       1.79        -0.45     32.64        0.55      70.88        2.2      98.61
-2.0       2.28        -0.40     34.46        0.60      72.57        2.3      98.93
-1.9       2.87        -0.35     36.32        0.65      74.22        2.4      99.18
-1.8       3.59        -0.30     38.21        0.70      75.80        2.5      99.38
-1.7       4.46        -0.25     40.13        0.75      77.34        2.6      99.53
-1.6       5.48        -0.20     42.07        0.80      78.81        2.7      99.65
-1.5       6.68        -0.15     44.04        0.85      80.23        2.8      99.74
-1.4       8.08        -0.10     46.02        0.90      81.59        2.9      99.81
-1.3       9.68        -0.05     48.01        0.95      82.89        3.0      99.87
-1.2      11.51         0.0      50.00        1.0       84.13        3.5      99.98
-1.1      13.57                                                      4.0      99.997

Table 12.13 can be used to find the percentage of data items that are less than any data item in a normal distribution. Begin by converting the data item to a z-score. Then, use the table to find the percentile for this z-score. This percentile is the percentage of data items that are less than the data item in question.

EXAMPLE 7 FINDING THE PERCENTAGE OF DATA ITEMS LESS THAN A GIVEN DATA ITEM

According to the Department of Health and Education, cholesterol levels are normally distributed. For men between 18 and 24 years, the mean is 178.1 (measured in milligrams per 100 milliliters) and the standard deviation is 40.7. What percentage of men in this age range have a cholesterol level less than 239.15?

SOLUTION If you are familiar with your own cholesterol level, you probably recognize that a level of 239.15 is fairly high for a young man. Because of this, we would expect most young men to have a level less than 239.15. Let’s see if this is so. Table 12.13 requires that we use z-scores. We compute the z-score for a 239.15 cholesterol level by using the z-score formula.

$$z_{239.15} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{239.15 - 178.1}{40.7} = \frac{61.05}{40.7} = 1.5$$

A man between 18 and 24 with a 239.15 cholesterol level is 1.5 standard deviations above the mean, illustrated in Figure 12.17(a). The question mark indicates that we must find the percentage of men with a cholesterol level less than $z = 1.5$, the z-score for a 239.15 cholesterol level. Table 12.13 gives this percentage as a percentile.

[Figure 12.17(a): Normal curve of cholesterol level with mean 178.1. The region below 239.15, which is 1.5 standard deviations above the mean, is labeled "?%". Figure 12.17(b): The same curve with that region labeled 93.32%.]

Find 1.5 in the z-score column in the right portion of the table. The percentile given to the right of 1.5 is 93.32. Thus, 93.32% of men between 18 and 24 have a cholesterol level less than 239.15, shown in Figure 12.17(b).

A portion of Table 12.13:

z-Score   Percentile
1.4       91.92
1.5       93.32
1.6       94.52

CHECK POINT 7

The distribution of monthly charges for cellphone plans in the United States is approximately normal with a mean of $62 and a standard deviation of $18. What percentage of plans have charges that are less than $83.60?
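The two steps of Example 7 (convert to a z-score, then look up the percentile) can be sketched as one function. Instead of a table lookup, this sketch assumes the percentile is the standard normal cumulative percentage and computes it with `math.erf`; the name `percent_less_than` is illustrative.

```python
import math

def percent_less_than(data_item: float, mean: float, sd: float) -> float:
    """Percentage of a normal distribution lying below data_item."""
    z = (data_item - mean) / sd                           # step 1: z-score
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))   # step 2: percentile

# Example 7: cholesterol levels with mean 178.1 and standard deviation 40.7.
print(round(percent_less_than(239.15, mean=178.1, sd=40.7), 2))
```

Rounded to two decimals the result is 93.32, in agreement with the table entry for a z-score of 1.5.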

The normal distribution accounts for all data items, meaning 100% of the scores. This means that Table 12.13 can also be used to find the percentage of data items that are greater than any data item in a normal distribution. Use the percentile in the table to determine the percentage of data items less than the data item in question. Then subtract this percentage from 100% to find the percentage of data items greater than the item in question. In using this technique, we will treat the phrases “greater than” and “greater than or equal to” as equivalent.

EXAMPLE 8 FINDING THE PERCENTAGE OF DATA ITEMS GREATER THAN A GIVEN DATA ITEM

Lengths of pregnancies of women are normally distributed with a mean of 266 days and a standard deviation of 16 days. What percentage of children are born from pregnancies lasting more than 274 days?

SOLUTION Table 12.13 requires that we use z-scores. We compute the z-score for a 274-day pregnancy by using the z-score formula.

$$z_{274} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{274 - 266}{16} = \frac{8}{16} = 0.5$$

A 274-day pregnancy is 0.5 standard deviation above the mean. Table 12.13 gives the percentile corresponding to 0.50 as 69.15. This means that 69.15% of pregnancies last less than 274 days, illustrated in Figure 12.18. We must find the percentage of pregnancies lasting more than 274 days by subtracting 69.15% from 100%.

$$100\% - 69.15\% = 30.85\%$$

Thus, 30.85% of children are born from pregnancies lasting more than 274 days.

[Figure 12.18: Normal curve of length of pregnancy with mean 266 days. The region below 274 days, which is 0.5 standard deviation above the mean, is 69.15%; the region above 274 days is 100% - 69.15% = 30.85%.]
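The subtraction from 100% used in Example 8 can be sketched as follows. As an assumption of this sketch, the percentile is computed with `math.erf` as the standard normal cumulative percentage rather than read from Table 12.13; the function names are illustrative.

```python
import math

def percent_less_than(data_item: float, mean: float, sd: float) -> float:
    """Percentage of a normal distribution lying below data_item."""
    z = (data_item - mean) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percent_greater_than(data_item: float, mean: float, sd: float) -> float:
    """Percentage lying above data_item: 100% minus the percentile."""
    return 100 - percent_less_than(data_item, mean, sd)

# Example 8: pregnancy lengths with mean 266 days and standard deviation 16 days.
print(round(percent_greater_than(274, mean=266, sd=16), 2))
```

Rounded to two decimals the result is 30.85, the percentage found in the solution.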

A portion of Table 12.13:

z-Score   Percentile
0.45      67.36
0.50      69.15
0.55      70.88

CHECK POINT 8

Female adult heights in North America are approximately normally distributed with a mean of 65 inches and a standard deviation of 3.5 inches. What percentage of North American women have heights that exceed 69.9 inches?

We have seen how Table 12.13 is used to find the percentage of data items that are less than or greater than any given item. The table can also be used to find the percentage of data items between two given items. Because the percentile for each item is the percentage of data items less than the given item, the percentage of data between the two given items is found by subtracting the lesser percent from the greater percent. This is illustrated in Figure 12.19.

[Figure 12.19: The percentile for data item 1 is A. The percentile for data item 2 is B. The percentage of data items between item 1 and item 2 is B% - A%.]

FINDING THE PERCENTAGE OF DATA ITEMS BETWEEN TWO GIVEN ITEMS IN A NORMAL DISTRIBUTION

1. Convert each given data item to a z-score:

$$z = \frac{\text{data item} - \text{mean}}{\text{standard deviation}}.$$

2. Use Table 12.13 to find the percentile corresponding to each z-score in step 1.

3. Subtract the lesser percentile from the greater percentile and attach a % sign.

EXAMPLE 9 FINDING THE PERCENTAGE OF DATA ITEMS BETWEEN TWO GIVEN DATA ITEMS

The amount of time that self-employed Americans work each week is normally distributed with a mean of 44.6 hours and a standard deviation of 14.4 hours. What percentage of self-employed individuals in the United States work between 37.4 and 80.6 hours per week?

SOLUTION

Step 1. Convert each given data item to a z-score.

$$z_{37.4} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{37.4 - 44.6}{14.4} = \frac{-7.2}{14.4} = -0.5$$

$$z_{80.6} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{80.6 - 44.6}{14.4} = \frac{36}{14.4} = 2.5$$

Step 2. Use Table 12.13 to find the percentile corresponding to these z-scores. The percentile given to the right of -0.50 is 30.85. This means that 30.85% of self-employed Americans work less than 37.4 hours per week.

A portion of Table 12.13:

z-Score   Percentile
-0.55     29.12
-0.50     30.85
-0.45     32.64

Table 12.13 also gives the percentile corresponding to $z = 2.5$. Find 2.5 in the z-score column in the far-right portion of the table. The percentile given to the right of 2.5 is 99.38. This means that 99.38% of self-employed Americans work less than 80.6 hours per week.

A portion of Table 12.13:

z-Score   Percentile
2.4       99.18
2.5       99.38
2.6       99.53

Step 3. Subtract the lesser percentile from the greater percentile and attach a % sign. Subtracting percentiles, we obtain

$$99.38 - 30.85 = 68.53.$$

Thus, 68.53% of self-employed Americans work between 37.4 and 80.6 hours per week. The solution is illustrated in Figure 12.20.

[Figure 12.20: Normal curve of hours worked with mean 44.6. The region below 37.4 (z = -0.5) is 30.85%; the region below 80.6 (z = 2.5) is 99.38%; the region between them is 99.38% - 30.85% = 68.53%.]

CHECK POINT 9

The distribution for the life of refrigerators is approximately normal with a mean of 14 years and a standard deviation of 2.5 years. What percentage of refrigerators have lives between 11 years and 18 years?
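The three-step procedure used in Example 9 can be sketched in a few lines. This is an illustration under the assumption that percentiles are standard normal cumulative percentages computed with `math.erf` rather than read from Table 12.13; the function names are chosen here, not taken from the text.

```python
import math

def percentile(data_item: float, mean: float, sd: float) -> float:
    """Steps 1 and 2: convert to a z-score, then find the percentile."""
    z = (data_item - mean) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percent_between(low: float, high: float, mean: float, sd: float) -> float:
    """Step 3: subtract the lesser percentile from the greater percentile."""
    a = percentile(low, mean, sd)
    b = percentile(high, mean, sd)
    return abs(b - a)

# Example 9: weekly hours with mean 44.6 and standard deviation 14.4.
print(round(percent_between(37.4, 80.6, mean=44.6, sd=14.4), 2))
```

Rounded to two decimals the result is 68.53, the same value obtained by subtracting the table percentiles.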

Our work in Examples 7 through 9 is summarized as follows:

COMPUTING PERCENTAGE OF DATA ITEMS FOR NORMAL DISTRIBUTIONS

Description of percentage: Percentage of data items less than a given data item with z = b
Graph: normal curve with the region to the left of z = b shaded
Computation of percentage: Use the table percentile for z = b and add a % sign.

Description of percentage: Percentage of data items greater than a given data item with z = a
Graph: normal curve with the region to the right of z = a shaded
Computation of percentage: Subtract the table percentile for z = a from 100 and add a % sign.

Description of percentage: Percentage of data items between two given data items with z = a and z = b
Graph: normal curve with the region between z = a and z = b shaded
Computation of percentage: Subtract the table percentile for z = a from the table percentile for z = b and add a % sign.

Polls and Margins of Error

When you were between the ages of 6 and 14, how would you have responded to this question:

What is bad about being a kid?

WHAT IS BAD ABOUT BEING A KID?

Kids Say
Getting bossed around        17%
School, homework             15%
Can’t do everything I want   11%
Chores                        9%
Being grounded                9%

Source: Penn, Schoen, and Berland using 1172 interviews with children ages 6 to 14 from May 14 to June 1, 1999. Margin of error: ±2.9%

Note the margin of error.

In a random sample of 1172 children ages 6 through 14, 17% of the children responded, “Getting bossed around.” The problem is that this is a single random sample. Do 17% of kids in the entire population of children ages 6 through 14 think that getting bossed around is a bad thing?

Statisticians use properties of the normal distribution to estimate the probability that a result obtained from a single sample reflects what is truly happening in the population. If you look at the results of a poll like the one shown above, you will observe that a margin of error is reported. Surveys and opinion polls often give a margin of error. Let’s use our understanding of the normal distribution to see how to calculate and interpret margins of error.

Suppose that p% of the population of children ages 6 through 14 hold the opinion that getting bossed around is a bad thing about being a kid. Instead of taking only one random sample of 1172 children, we repeat the process of selecting a random sample of 1172 children hundreds of times. Then, we calculate the percentage of children for each sample who think being bossed around is bad. With random sampling, we expect to find the percentage in many of the samples close to p%, with relatively few samples having percentages far from p%. Figure 12.21 shows that the percentages of children from the hundreds of samples can be modeled by a normal distribution. The mean of this distribution is the actual population percent, p%, and is the most frequent result from the samples.
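The repeated-sampling idea described above can be tried out with a small simulation. Everything in this sketch is hypothetical: the true population percent is set to 17% purely for illustration, the samples are simulated with Python's pseudorandom generator rather than drawn from real children, and the function name `sample_percentages` is introduced here.

```python
import random
import statistics

def sample_percentages(p: float, n: int, num_samples: int, seed: int = 0) -> list:
    """Percent holding an opinion in each of num_samples random samples of size n,
    drawn from a population in which the true proportion is p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [100 * sum(rng.random() < p for _ in range(n)) / n
            for _ in range(num_samples)]

# Hypothetical true proportion of 17%; samples of 1172 children, as in the poll.
percentages = sample_percentages(p=0.17, n=1172, num_samples=300)
print(round(statistics.mean(percentages), 1))   # mean of the sample percentages
print(round(statistics.stdev(percentages), 1))  # spread of the sample percentages
```

In a run like this the sample percentages cluster around the true percent, with only a few samples far from it, which is the behavior the normal model in Figure 12.21 describes.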

[Figure 12.21: Percentage of children who feel being bossed around is bad. A normal curve centered at p%, the true population percent. Horizontal axis: Percentage of Children Who Feel Being Bossed Around Is Bad. Vertical axis: Number of Samples of Children.]

Mathematicians have shown that the standard deviation of a normal distribution of samples like the one in Figure 12.21 is approximately $\frac{1}{2\sqrt{n}}$, where n is the sample size. Using the 68–95–99.7 Rule, approximately 95% of the samples have a percentage within 2 standard deviations of the true population percentage, p%:

$$2 \text{ standard deviations} = 2 \cdot \frac{1}{2\sqrt{n}} = \frac{1}{\sqrt{n}}.$$

If we use a single random sample of size n, there is a 95% probability that the percent obtained will lie within two standard deviations, or $\frac{1}{\sqrt{n}}$, of the true population percent. We can be 95% confident that the true population percent lies between

$$\text{the sample percent} - \frac{1}{\sqrt{n}}$$

and

$$\text{the sample percent} + \frac{1}{\sqrt{n}}.$$

We call $\pm\frac{1}{\sqrt{n}}$ the margin of error.

MARGIN OF ERROR IN A SURVEY

If a statistic is obtained from a random sample of size n, there is a 95% probability that it lies within $\frac{1}{\sqrt{n}}$ of the true population statistic, where $\pm\frac{1}{\sqrt{n}}$ is called the margin of error.

EXAMPLE 10 USING AND INTERPRETING MARGIN OF ERROR

In a random sample of 1172 children ages 6 through 14, 17% of the children said getting bossed around is a bad thing about being a kid.

a. Verify the margin of error that was given for this survey.

b. Write a statement about the percentage of children in the population who feel that getting bossed around is a bad thing about being a kid.

SOLUTION

a. The sample size is $n = 1172$. The margin of error is

$$\pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{1172}} \approx \pm 0.029 = \pm 2.9\%.$$

b. There is a 95% probability that the true population percentage lies between

$$\text{the sample percent} - \frac{1}{\sqrt{n}} = 17\% - 2.9\% = 14.1\%$$

and

$$\text{the sample percent} + \frac{1}{\sqrt{n}} = 17\% + 2.9\% = 19.9\%.$$

We can be 95% confident that between 14.1% and 19.9% of all children feel that getting bossed around is a bad thing about being a kid.

BLITZER BONUS

A Caveat Giving a True Picture of a Poll’s Accuracy

Unlike the precise calculation of a poll’s margin of error, certain polling imperfections cannot be determined exactly. One problem is that people do not always respond to polls honestly and accurately. Some people are embarrassed to say “undecided,” so they make up an answer. Other people may try to respond to questions in the way they think will make the pollster happy, just to be “nice.” Perhaps the following caveat, applied to the poll in Example 10, would give the public a truer picture of its accuracy:

The poll results are 14.1% to 19.9% at the 95% confidence level, but it’s only under ideal conditions that we can be 95% confident that the true numbers are within 2.9% of the poll’s results. The true error span is probably greater than 2.9% due to limitations that are inherent in this and every poll, but, unfortunately, this additional error amount cannot be calculated precisely. Warning: Five percent of the time (that’s one time out of 20) the error will be greater than 2.9%. We remind readers of the poll that things occurring “only” 5% of the time do, indeed, happen.

We suspect that the public would tire of hearing this.

Figure 12.22 shows the question andresults of a USA Today, CNN/Gallup poll

of 485 randomly selected American adults onphysician-assisted suicide.

a. Find the margin of error for this survey.Round to the nearest tenth of a percent.

b. Write a statement about the percentage ofAmerican adults in the population whosupport physician-assisted suicide.

c. Based on your answer to part (b), explainhow the title given with the circle graph inFigure 12.22 is misleading.

Other Kinds of DistributionsThe histogram in Figure 12.23 represents the frequencies of the ages of womeninterviewed by Kinsey and his colleagues in their study of female sexual behavior.This distribution is not symmetric. The greatest frequency of women interviewedwas in the 16–20 age range. The bars get shorter and shorter after this. The shorterbars fall on the right, indicating that relatively few older women were included inKinsey’s interviews.

CH

EC

K POINT

10

the sample percent +11n

= 17% + 2.9% = 19.9%.

Majority Supports Assisted Suicide
Should a doctor be allowed to help a terminally ill patient end his or her life with an overdose of medication if the patient is mentally competent and requests it?
Yes: 54%
No: 40%
No opinion: 6%
FIGURE 12.22
Source: USA Today, CNN/Gallup Poll

FIGURE 12.23 Histogram of the ages of females interviewed by Kinsey and his associates. (Horizontal axis: Age at Time of Interview (years), in five-year classes from 11-15 through 71-75; vertical axis: Number of Women (frequency), scaled from 200 to 2000.)

Although the normal distribution is the most important of all distributions in terms of analyzing data, not all data can be approximated by this symmetric distribution with its mean, median, and mode all having the same value.

In our discussion of measures of central tendency, we mentioned that the median, rather than the mean, is used to summarize income. Figure 12.24 illustrates the population distribution of weekly earnings in the United States. There is no upper limit on weekly earnings. The relatively few people with very high weekly incomes tend to pull the mean income to a value greater than the median. The most frequent income, the mode, occurs toward the low end of the data items. The mean, median, and mode do not have the same value, and a normal distribution is not an appropriate model for describing weekly earnings in the United States.

FIGURE 12.24 Skewed to the right. (Horizontal axis: Weekly Earnings, starting at $0; vertical axis: Number of People. The mode lies to the left of the median, which lies to the left of the mean; the tail is to the right.)

The distribution in Figure 12.24 is called a skewed distribution. A distribution of data is skewed if a large number of data items are piled up at one end or the other, with a “tail” at the opposite end. In the distribution of weekly earnings in Figure 12.24, the tail is to the right. Such a distribution is said to be skewed to the right.

By contrast to the distribution of weekly earnings, the distribution in Figure 12.25 has more data items at the high end of the scale than at the low end. The tail of this distribution is to the left. The distribution is said to be skewed to the left. In many colleges, an example of a distribution skewed to the left is based on the student ratings of faculty teaching performance. Most professors are given rather high ratings, while only a few are rated terrible. These low ratings pull the value of the mean lower than the median.
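The way a tail pulls the mean away from the median can be seen in a short Python sketch. Both data sets below are invented for illustration, and `skew_direction` is a hypothetical helper; comparing the mean with the median is only the rough indicator described in the text, not a formal measure of skewness.

```python
from statistics import mean, median

def skew_direction(data):
    """Rough indicator: the tail tends to pull the mean toward it, past the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "none"

# Hypothetical weekly earnings (dollars): one very high earner forms a right tail.
weekly_earnings = [300, 350, 400, 400, 450, 500, 550, 600, 700, 5000]

# Hypothetical faculty ratings on a 1-to-5 scale: a few low ratings form a left tail.
faculty_ratings = [1, 2, 4, 4, 4, 5, 5, 5, 5, 5]

for name, data in [("earnings", weekly_earnings), ("ratings", faculty_ratings)]:
    print(name, "mean:", mean(data), "median:", median(data),
          "skewed:", skew_direction(data))
```

In the earnings data the single large value raises the mean well above the median, while in the ratings data the two low ratings drag the mean below the median.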

FIGURE 12.25 Skewed to the left. (Vertical axis: Frequency. The mean lies to the left of the median, which lies to the left of the mode; the tail is to the left.)

EXERCISE SET 12.4

• Practice and Application Exercises

The scores on a test are normally distributed with a mean of 100 and a standard deviation of 20. In Exercises 1–10, find the score that is

1. 1 standard deviation above the mean.

2. 2 standard deviations above the mean.

3. 3 standard deviations above the mean.

4. 1½ standard deviations above the mean.

5. 2½ standard deviations above the mean.

6. 1 standard deviation below the mean.

7. 2 standard deviations below the mean.

8. 3 standard deviations below the mean.

9. one-half a standard deviation below the mean.

10. 2½ standard deviations below the mean.

Not everyone pays the same price for the same model of a car. The figure illustrates a normal distribution for the prices paid for a particular model of a new car. The mean is $17,000 and the standard deviation is $500.

(Figure: normal curve. Horizontal axis: Price of a Model of a New Car, marked at $15,500, $16,000, $16,500, $17,000, $17,500, $18,000, and $18,500; vertical axis: Number of Car Buyers. The regions within 1, 2, and 3 standard deviations of the mean are labeled 68%, 95%, and 99.7%.)

In Exercises 11–22, use the 68–95–99.7 Rule, illustrated in the figure above, to find the percentage of buyers who paid

11. between $16,500 and $17,500.

12. between $16,000 and $18,000.

13. between $17,000 and $17,500.

14. between $17,000 and $18,000.

15. between $16,000 and $17,000.

16. between $16,500 and $17,000.

17. between $15,500 and $17,000.

18. between $17,000 and $18,500.

19. more than $17,500.

20. more than $18,000.

21. less than $16,000.

22. less than $16,500.

Intelligence quotients (IQs) on the Stanford-Binet intelligence test are normally distributed with a mean of 100 and a standard deviation of 16. In Exercises 23–32, use the 68–95–99.7 Rule to find the percentage of people with IQs

23. between 68 and 132. 24. between 84 and 116.

25. between 68 and 100. 26. between 84 and 100.

27. above 116. 28. above 132.

29. below 68. 30. below 84.

31. above 148. 32. below 52.

A set of data items is normally distributed with a mean of 60 and a standard deviation of 8. In Exercises 33–48, convert each data item to a z-score.

33. 68 34. 76 35. 84 36. 92

37. 64 38. 72 39. 74 40. 78

41. 60 42. 100 43. 52 44. 44

45. 48 46. 40 47. 34 48. 30

Yearly returns on large-company stocks are normally distributed with mean yearly interest at 12.4% and a standard deviation of 20.4%. In Exercises 49–56, find the z-score for an investment that

49. earns 43% annually. 50. earns 48.1% annually.

51. earns 58.3% annually. 52. earns 78.7% annually.

53. loses 13.1% annually. 54. loses 2.9% annually.

55. loses 18.2% annually. 56. loses 38.6% annually.

Intelligence quotients on the Stanford-Binet intelligence test are normally distributed with a mean of 100 and a standard deviation of 16. Intelligence quotients on the Wechsler intelligence test are normally distributed with a mean of 100 and a standard deviation of 15. Use this information to solve Exercises 57–58.

57. Use z-scores to determine which person has the higher IQ: an individual who scores 128 on the Stanford-Binet or an individual who scores 127 on the Wechsler.

58. Use z-scores to determine which person has the higher IQ: an individual who scores 150 on the Stanford-Binet or an individual who scores 148 on the Wechsler.

A set of data items is normally distributed with a mean of 400 and a standard deviation of 50. In Exercises 59–66, find the data item in this distribution that corresponds to the given z-score.

59. z = 2 60. z = 3

61. z = 1.5 62. z = 2.5

63. z = −3 64. z = −2

65. z = −2.5 66. z = −1.5

Use Table 12.13 on page 720 to solve Exercises 67–82.

In Exercises 67–74, find the percentage of data items in a normal distribution that lie a. below and b. above the given z-score.

67. z = 0.6 68. z = 0.8

69. z = 1.2 70. z = 1.4

71. z = −0.7 72. z = −0.4

73. z = −1.2 74. z = −1.8

In Exercises 75–82, find the percentage of data items in a normal distribution that lie between

75. z = 0.2 and z = 1.4. 76. z = 0.3 and z = 2.1.

77. z = 1 and z = 3. 78. z = 2 and z = 3.

79. z = −1.5 and z = 1.5. 80. z = −1.2 and z = 1.2.

81. z = −2 and z = −0.5. 82. z = −2.2 and z = −0.3.

Systolic blood pressure readings are normally distributed with a mean of 121 and a standard deviation of 15. (A reading above 140 is considered to be high blood pressure.) In Exercises 83–92, begin by converting any given blood pressure reading or readings into z-scores. Then use Table 12.13 on page 720 to find the percentage of people with blood pressure readings

83. below 142. 84. below 148.

85. above 130. 86. above 133.

87. above 103. 88. above 100.

89. between 142 and 154. 90. between 145 and 157.

91. between 112 and 130. 92. between 109 and 133.

The weights for 12-month-old baby boys are normally distributed with a mean of 22.5 pounds and a standard deviation of 2.2 pounds. In Exercises 93–96, use Table 12.13 on page 720 to find the percentage of 12-month-old baby boys who weigh

93. more than 25.8 pounds.

94. more than 23.6 pounds.

95. between 19.2 and 21.4 pounds.

96. between 18.1 and 19.2 pounds.

97. Using a random sample of 2272 American adults with children, a Harris survey asked respondents to name their dream job for their child or children. The top five responses and the percentage of parents who named each of these jobs are shown in the bar graph.

What is Your Dream Job for Your Child/Children? Top Responses
(Bar graph; vertical axis: Percentage, scaled from 5 to 30.)
Professional athlete           9%
CEO of a large company         9%
Teacher/Professor             11%
Doctor                        18%
Founder of a company          25%

a. Find the margin of error, to the nearest tenth of a percent, for this survey.

b. Write a statement about the percentage of parents in the population who consider a doctor as the dream job for their child.

98. Using a random sample of 2297 American adults, an NBC Today Show poll asked respondents if they got enough sleep at night. The responses are shown in the circle graph.

Do You Get Enough Sleep at Night?
(Circle graph.)
Yes         47%
No          47%
Not Sure     6%

a. Find the margin of error, to the nearest tenth of a percent, for this survey.

b. Write a statement about the percentage of American adults in the population who do not get enough sleep at night.

99. Using a random sample of 4000 TV households, Nielsen Media Research found that 60.2% watched the final episode of M*A*S*H.

a. Find the margin of error in this percent.

b. Write a statement about the percentage of TV households in the population who tuned into the final episode of M*A*S*H.

100. Using a random sample of 4000 TV households, Nielsen Media Research found that 51.1% watched Roots, Part 8.

a. Find the margin of error in this percent.

b. Write a statement about the percentage of TV households in the population who tuned into Roots, Part 8.

101. In 1997, Nielsen Media Research increased its random sample to 5000 TV households. By how much, to the nearest tenth of a percent, did this improve the margin of error over that in Exercises 99 and 100?

102. If Nielsen Media Research were to increase its random sample from 5000 to 10,000 TV households, by how much, to the nearest tenth of a percent, would this improve the margin of error?

103. The histogram shows murder rates per 100,000 residents and the number of U.S. states that had these rates in 2003.

U.S. Murder Rates per 100,000 Residents, by State and Washington, D.C.
(Histogram; horizontal axis: Murder Rate (per 100,000 residents), Rounded to the Nearest Whole Number; vertical axis: Frequency, scaled from 1 to 10.)

Murder rate    Frequency
1              3    (Maine, New Hampshire, South Dakota)
2              8
3              9
4              2
5              8
6              9
7              5
8              2
9              2
10             1
13             1    (Louisiana)
44             1    (Washington, D.C.)

Source: FBI, Crime in the United States

a. Is the shape of this distribution best classified as normal, skewed to the right, or skewed to the left?

b. Calculate the mean murder rate per 100,000 residents for the 50 states and Washington, D.C.

c. Find the median murder rate per 100,000 residents for the 50 states and Washington, D.C.

d. Are the mean and median murder rates consistent with the shape of the distribution that you described in part (a)? Explain your answer.

e. The standard deviation for the data is approximately 6.1. If the distribution were roughly normal, what would be the z-score, rounded to one decimal place, for Washington, D.C.? Does this seem unusually high? Explain your answer.

• Writing in Mathematics

104. What is a symmetric histogram?

105. Describe the normal distribution and discuss some of its properties.

106. Describe the 68–95–99.7 Rule.

107. Describe how to determine the z-score for a data item in a normal distribution.

108. What does a z-score measure?

109. Give an example of both a commonly occurring and an infrequently occurring z-score. Explain how you arrived at these examples.

110. Describe when a z-score is negative.

111. If you score in the 83rd percentile, what does this mean?

112. If your weight is in the third quartile, what does this mean?

113. Explain how to find the percentage of data items between the first quartile and the third quartile.

114. Two students have scores with the same percentile, but for different administrations of the SAT. Does this mean that the students have the same score on the SAT? Explain your answer.

115. Give an example of a phenomenon that is normally distributed. Explain why. (Try to be creative and not use one of the distributions discussed in this section.) Estimate what the mean and the standard deviation might be and describe how you determined these estimates.

116. Give an example of a phenomenon that is not normally distributed and explain why.

• Critical Thinking Exercises

117. Find two z-scores so that 40% of the data in the distribution lies between them. (More than one answer is possible.)

118. A woman insists that she will never marry a man as short or shorter than she, knowing that only one man in 400 falls into this category. Assuming a mean height of 69 inches for men with a standard deviation of 2.5 inches (and a normal distribution), approximately how tall is the woman?

119. The placement test for a college has scores that are normally distributed with a mean of 500 and a standard deviation of 100. If the college accepts only the top 10% of examinees, what is the cutoff score on the test for admission?

• Group Exercise

120. For this activity, group members will conduct interviews with a random sample of students on campus. Each student is to be asked, “What is the worst thing about being a student?” One response should be recorded for each student.

a. Each member should interview enough students so that there are at least 50 randomly selected students in the sample.

b. After all responses have been recorded, the group should organize the four most common answers. For each answer, compute the percentage of students in the sample who felt that this is the worst thing about being a student.

c. Find the margin of error for your survey.

d. For each of the four most common answers, write a statement about the percentage of all students on your campus who feel that this is the worst thing about being a student.

SECTION 12.5 • SCATTER PLOTS, CORRELATION, AND REGRESSION LINES

OBJECTIVES

1. Make a scatter plot for a table of data items.

2. Interpret information given in a scatter plot.

3. Compute the correlation coefficient.

4. Write the equation of the regression line.

5. Use a sample’s correlation coefficient to determine whether there is a correlation in the population.

Surprised by the number of people smoking cigarettes in movies and television shows made in the 1940s and 1950s? At that time, there was little awareness of the relationship between tobacco use and numerous diseases. Cigarette smoking was seen as a healthy way to relax and help digest a hearty meal. Then, in 1964, an equation changed everything. To understand the mathematics behind this turning point in public health, we need to explore situations involving data collected on two variables.

Up to this point in the chapter, we have studied situations in which data sets involve a single variable, such as heights, weights, cholesterol levels, and lengths of pregnancies. By contrast, the 1964 study involved data collected on two variables from 11 countries: annual cigarette consumption for each adult male and deaths per million males from lung cancer. In this section, we consider situations in which there are two data items for each randomly selected person or thing. Our interest is in determining whether or not there is a relationship between the two variables and, if so, the strength of that relationship.

STUDY TIP
The numbered list under “Correlation and Causal Connections” represents three possibilities. Perhaps you can provide a better explanation about decreasing prejudice with increased education.

FIGURE 12.26 A scatter plot for education-prejudice data. (Horizontal axis: Years of Education, x, from 2 to 16; vertical axis: Score on a Test Measuring Prejudice, y, from 1 to 10; the ten points are labeled A through J, one per respondent.)

Scatter Plots and Correlation

Is there a relationship between education and prejudice? With increased education, does a person’s level of prejudice tend to decrease? Notice that we are interested in two quantities: years of education and level of prejudice. For each person in our sample, we will record the number of years of school completed and the score on a test measuring prejudice. Higher scores on this 1-to-10 test indicate greater prejudice. Using x to represent years of education and y to represent scores on a test measuring prejudice, Table 12.14 shows these two quantities for a random sample of ten people.

When two data items are collected for every person or object in a sample, the data items can be visually displayed using a scatter plot. A scatter plot is a collection of data points, one data point per person or object. We can make a scatter plot of the data in Table 12.14 by drawing a horizontal axis to represent years of education and a vertical axis to represent scores on a test measuring prejudice. We then represent each of the ten respondents with a single point on the graph. For example, the dot for respondent A is located to represent 12 years of education on the horizontal axis and 1 on the prejudice test on the vertical axis. Plotting each of the ten pieces of data in a rectangular coordinate system results in the scatter plot shown in Figure 12.26.

A scatter plot like the one in Figure 12.26 can be used to determine whether two quantities are related. If there is a clear relationship, the quantities are said to be correlated. The scatter plot shows a downward trend among the data points, although there are a few exceptions. People with increased education tend to have a lower score on the test measuring prejudice. Correlation is used to determine if there is a relationship between two variables and, if so, the strength and direction of that relationship.

Correlation and Causal Connections

Correlations can often be seen when data items are displayed on a scatter plot. Although the scatter plot in Figure 12.26 indicates a correlation between education and prejudice, we cannot conclude that increased education causes a person’s level of prejudice to decrease. There are at least three possible explanations:

1. The correlation between increased education and decreased prejudice is simply a coincidence.

2. Education usually involves classrooms with a variety of different kinds of people. Increased exposure to diversity in the classroom setting, which accompanies increased levels of education, might be an underlying cause for decreased prejudice.

3. Education, the process of acquiring knowledge, requires people to look at new ideas and see things in different ways. Thus, education causes one to be more tolerant and less prejudiced.

Establishing that one thing causes another is extremely difficult, even if there is a strong correlation between these things. For example, as the air temperature increases, there is an increase in the number of people stung by jellyfish at the beach. This does not mean that an increase in air temperature causes more people to be stung. It might mean that because it is hotter, more people go into the water. With an increased number of swimmers, more people are likely to be stung. In short, correlation is not necessarily causation.

TABLE 12.14 RECORDING TWO QUANTITIES IN A SAMPLE OF TEN PEOPLE

Respondent                     A    B    C    D    E    F    G    H    I    J
Years of education (x)         12   5    14   13   8    10   16   11   12   4
Score on prejudice test (y)    1    7    2    3    5    4    1    2    3    10

Regression Lines and Correlation Coefficients

Figure 12.27 shows the scatter plot for the education-prejudice data. Also shown is a straight line that seems to approximately “fit” the data points. Most of the data points lie either near or on this line. A line that best fits the data points in a scatter plot is called a regression line. The regression line is the particular line in which the spread of the data points around it is as small as possible.

FIGURE 12.27 A scatter plot with a regression line. (Horizontal axis: Years of Education, x, from 2 to 16; vertical axis: Score on a Test Measuring Prejudice, y, from 1 to 10.)

A measure that is used to describe the strength and direction of a relationship between variables whose data points lie on or near a line is called the correlation coefficient, designated by r. Figure 12.28 shows scatter plots and correlation coefficients. Variables are positively correlated if they tend to increase or decrease together, as in Figure 12.28 (a), (b), and (c). By contrast, variables are negatively correlated if one variable tends to decrease while the other increases, as in Figure 12.28 (e), (f), and (g). Figure 12.28 illustrates that a correlation coefficient, r, is a number between −1 and 1, inclusive. Figure 12.28(a) shows a value of 1. This indicates a perfect positive correlation in which all points in the scatter plot lie precisely on the regression line that rises from left to right. Figure 12.28(g) shows a value of −1. This indicates a perfect negative correlation in which all points in the scatter plot lie precisely on the regression line that falls from left to right.

FIGURE 12.28 Scatter plots and correlation coefficients
(a) r = 1 perfect positive correlation
(b) r ≈ 0.8 strong positive correlation
(c) r ≈ 0.3 moderate to weak positive correlation
(d) r = 0 no correlation
(e) r ≈ −0.3 moderate to weak negative correlation
(f) r ≈ −0.8 strong negative correlation
(g) r = −1 perfect negative correlation

BLITZER BONUS
Beneficial Uses of Correlation Coefficients

• A Florida study showed a high positive correlation between the number of powerboats and the number of manatee deaths. Many of these deaths were seen to be caused by boats’ propellers gashing into the manatees’ bodies. Based on this study, Florida set up coastal sanctuaries where powerboats are prohibited so that these large gentle mammals that float just below the water’s surface could thrive.

• In 1986, researchers studied how psychiatric patients readjusted to their community after their release from a mental hospital. A moderate positive correlation (r = 0.38) was found between patients’ attractiveness and their postdischarge social adjustment. The better-looking patients were better off. The researchers suggested that physical attractiveness plays a role in patients’ readjustment to community living because good-looking people tend to be treated better by others than homely people are.

Take another look at Figure 12.28. If r is between 0 and 1, as in (b) and (c), the two variables are positively correlated, but not perfectly. Although all the data points will not lie on the regression line, as in (a), an increase in one variable tends to be accompanied by an increase in the other.

Negative correlations are also illustrated in Figure 12.28. If r is between 0 and −1, as in (e) and (f), the two variables are negatively correlated, but not perfectly. Although all the data points will not lie on the regression line, as in (g), an increase in one variable tends to be accompanied by a decrease in the other.

EXAMPLE 1 Interpreting a Correlation Coefficient

In a 1971 study involving 232 subjects, researchers found a relationship between the subjects’ level of stress and how often they became ill. The correlation coefficient in this study was 0.32. Does this indicate a strong relationship between stress and illness?

SOLUTION The correlation coefficient r = 0.32 means that as stress increases, frequency of illness also tends to increase. However, 0.32 is only a moderate correlation, illustrated in Figure 12.28(c). There is not, based on this study, a strong relationship between stress and illness. In this study, the relationship is somewhat weak.

CHECK POINT 1 In a 1996 study involving obesity in mothers and daughters, researchers found a relationship between a high body-mass index for the girls and their mothers. (Body-mass index is a measure of weight relative to height. People with a high body-mass index are overweight or obese.) The correlation coefficient in this study was 0.51. Does this indicate a weak relationship between the body-mass index of daughters and the body-mass index of their mothers?

How to Obtain the Correlation Coefficient and the Equation of the Regression Line

The easiest way to find the correlation coefficient and the equation of the regression line is to use a graphing or statistical calculator. Graphing calculators have statistical menus that enable you to enter the x and y data items for the variables. Based on this information, you can instruct the calculator to display a scatter plot, the equation of the regression line, and the correlation coefficient.

We can also compute the correlation coefficient and the equation of the regression line by hand using formulas. First, we compute the correlation coefficient.

COMPUTING THE CORRELATION COEFFICIENT BY HAND
The following formula is used to calculate the correlation coefficient, r:

$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{n(\Sigma x^2) - (\Sigma x)^2}\,\sqrt{n(\Sigma y^2) - (\Sigma y)^2}}.$

In the formula,

$n$ = the number of data points, $(x, y)$
$\Sigma x$ = the sum of the x-values
$\Sigma y$ = the sum of the y-values
$\Sigma xy$ = the sum of the product of x and y in each pair
$\Sigma x^2$ = the sum of the squares of the x-values
$\Sigma y^2$ = the sum of the squares of the y-values
$(\Sigma x)^2$ = the square of the sum of the x-values
$(\Sigma y)^2$ = the square of the sum of the y-values

TECHNOLOGY
Graphing Calculators, Scatter Plots, and Regression Lines

You can use a graphing calculator to display a scatter plot and the regression line. After entering the x and y data items for years of education and scores on a prejudice test, the calculator shows the scatter plot of the data and the regression line.

The calculator also displays the regression line’s equation and the correlation coefficient, r. The slope is approximately −0.69. The negative slope reinforces the fact that there is a negative correlation between the variables in Example 2.

When computing the correlation coefficient by hand, organize your work in five columns:

x    y    xy    $x^2$    $y^2$

Find the sum of the numbers in each column. Then, substitute these values into the formula for r. Example 2 illustrates computing the correlation coefficient for the education-prejudice test data.

EXAMPLE 2 Computing the Correlation Coefficient

Shown below are the data involving the number of years of school, x, completed by ten randomly selected people and their scores on a test measuring prejudice, y. Recall that higher scores on the measure of prejudice (1 to 10) indicate greater levels of prejudice. Determine the correlation coefficient between years of education and scores on a prejudice test.

Respondent                     A    B    C    D    E    F    G    H    I    J
Years of education (x)         12   5    14   13   8    10   16   11   12   4
Score on prejudice test (y)    1    7    2    3    5    4    1    2    3    10

SOLUTION As suggested, organize the work in five columns.

x      y      xy     x^2     y^2
12     1      12     144     1
5      7      35     25      49
14     2      28     196     4
13     3      39     169     9
8      5      40     64      25
10     4      40     100     16
16     1      16     256     1
11     2      22     121     4
12     3      36     144     9
4      10     40     16      100

Adding all values in each column gives the five sums:

$\Sigma x = 105$, $\Sigma y = 38$, $\Sigma xy = 308$, $\Sigma x^2 = 1235$, $\Sigma y^2 = 218$.

We use these five sums to calculate the correlation coefficient. Another value in the formula for r that we have not yet determined is n, the number of data points (x, y). Because there are ten items in the x-column and ten items in the y-column, the number of data points (x, y) is ten. Thus, n = 10.

In order to calculate r, we also need to find the square of the sum of the x-values and the y-values:

$(\Sigma x)^2 = (105)^2 = 11{,}025$ and $(\Sigma y)^2 = (38)^2 = 1444$.

We are ready to determine the value for r.

$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{n(\Sigma x^2) - (\Sigma x)^2}\,\sqrt{n(\Sigma y^2) - (\Sigma y)^2}} = \frac{10(308) - 105(38)}{\sqrt{10(1235) - 11{,}025}\,\sqrt{10(218) - 1444}} = \frac{-910}{\sqrt{1325}\,\sqrt{736}} \approx -0.92$
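The hand computation in Example 2 can be checked with a short Python sketch. The function name `correlation_coefficient` is an illustrative choice, not something from the text; the function simply mirrors the five-column method, accumulating each sum and substituting into the formula for r.

```python
import math

# Education-prejudice data from Example 2.
x = [12, 5, 14, 13, 8, 10, 16, 11, 12, 4]   # years of education
y = [1, 7, 2, 3, 5, 4, 1, 2, 3, 10]          # score on prejudice test

def correlation_coefficient(xs, ys):
    """Compute r from the column sums, following the by-hand formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation_coefficient(x, y)
print(f"r is approximately {r:.2f}")
```

The sketch assumes the two lists have the same length and that neither variable is constant (otherwise the denominator would be zero).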

The value for r, approximately −0.92, is fairly close to −1 and indicates a strong negative correlation. This means that the more education a person has, the less prejudiced that person is (based on scores on the test measuring levels of prejudice).

“Common wisdom among statisticians is that at least 5% of all data points are corrupted, either when they are initially recorded or when they are entered into the computer.”
Jessica Utts, STATISTICIAN

Is there a relationship between alcohol from moderate wine consumptionand heart disease death rate? The table gives data from 19 developed

countries. Using a calculator, determine the correlation coefficient between thesevariables. Round to two decimal places.

CH

EC

K POINT

2

-1-0.92,

Country A B C D E F G H I J K L M N O P Q R S

Liters of alcohol from drinking wine,per person per year (x)

2.5 3.9 2.9 2.4 2.9 0.8 9.1 0.8 0.7 7.9 1.8 1.9 0.8 6.5 1.6 5.8 1.3 1.2 2.7

Deaths from heart disease, per 100,000 people per year (y)

211 167 131 191 220 297 71 211 300 107 167 266 227 86 207 115 285 199 172

Source: New York Times, December 28, 1994

U.S.France
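As a software counterpart to the calculator step in Check Point 2, the following sketch computes r directly from the 19 data pairs. The function name `correlation` and the list names are illustrative assumptions; the formula is the one used by hand in Example 2.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data, using the column-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Countries A through S from the Check Point 2 table.
wine_liters = [2.5, 3.9, 2.9, 2.4, 2.9, 0.8, 9.1, 0.8, 0.7, 7.9,
               1.8, 1.9, 0.8, 6.5, 1.6, 5.8, 1.3, 1.2, 2.7]
heart_deaths = [211, 167, 131, 191, 220, 297, 71, 211, 300, 107,
                167, 266, 227, 86, 207, 115, 285, 199, 172]

r = correlation(wine_liters, heart_deaths)
print(round(r, 2))  # -0.84
```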

Once we have determined that two variables are related, we can use the equation of the regression line to determine the exact relationship. Here is the formula for writing the equation of the line that best fits the data:

WRITING THE EQUATION OF THE REGRESSION LINE BY HAND
The equation of the regression line is

$$y = mx + b,$$

where

$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} \quad\text{and}\quad b = \frac{\sum y - m\left(\sum x\right)}{n}.$$

EXAMPLE 3 WRITING THE EQUATION OF THE REGRESSION LINE

a. Shown, again, in Figure 12.27 is the scatter plot and the regression line for the data in Example 2. Use the data to find the equation of the regression line that relates years of education and scores on a prejudice test.

b. Approximately what score on the test can be anticipated by a person with nine years of education?

[FIGURE 12.27 (repeated): scatter plot with regression line. x-axis: Years of Education (2 to 16); y-axis: Score on a Test Measuring Prejudice (1 to 10).]

SOLUTION

a. We use the sums obtained in Example 2. We begin by computing m.

$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{10(308) - 105(38)}{10(1235) - (105)^2} = \frac{-910}{1325} \approx -0.69$$

With a negative correlation coefficient, it makes sense that the slope of the regression line is negative. This line falls from left to right, indicating a negative correlation.

Now, we find the y-intercept, b.

$$b = \frac{\sum y - m\left(\sum x\right)}{n} = \frac{38 - (-0.69)(105)}{10} = \frac{110.45}{10} \approx 11.05$$
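The slope and intercept formulas from the box can be sketched in Python as follows. The function name `regression_line_from_sums` is an illustrative assumption, and the inputs are the sums from Example 2. One difference is worth flagging: the hand computation rounds m to -0.69 before finding b, which gives 11.05, whereas the sketch carries the unrounded slope forward and arrives at an intercept of about 11.01.

```python
def regression_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope m and y-intercept b of the regression line y = mx + b."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Sums from Example 2 (education vs. prejudice-test scores).
m, b = regression_line_from_sums(n=10, sum_x=105, sum_y=38,
                                 sum_xy=308, sum_x2=1235)
print(round(m, 2))          # -0.69
print(round(b, 2))          # 11.01 (the text's 11.05 uses the rounded slope)
print(round(m * 9 + b, 2))  # 4.83, predicted score for nine years of education
```

Either way, the predicted score for a person with nine years of education is close to 5.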


5 Use a sample’s correlation coefficient to determine whether there is a correlation in the population.


Using m ≈ -0.69 and b ≈ 11.05, the equation of the regression line, y = mx + b, is

$$y = -0.69x + 11.05,$$

where x represents the number of years of education and y represents the score on the prejudice test.

b. To anticipate the score on the prejudice test for a person with nine years of education, substitute 9 for x in the regression line’s equation.

$$y = -0.69x + 11.05$$
$$y = -0.69(9) + 11.05 = 4.84$$

A person with nine years of education is anticipated to have a score close to 5 on the prejudice test.

CHECK POINT 3
Use the data in Check Point 2 on page 735 to find the equation of the regression line. Use the equation to predict the heart disease death rate in a country where adults average 10 liters of alcohol per person per year.

The Level of Significance of r
In Example 2, we found a strong negative correlation between education and prejudice, computing the correlation coefficient, r, to be -0.92. However, the sample size (n = 10) was relatively small. With such a small sample, can we truly conclude that a correlation exists in the population? Or could it be that education and prejudice are not related? Perhaps the results we obtained were simply due to sampling error and chance.

Mathematicians have identified values to determine whether r, the correlation coefficient for a sample, can be attributed to a relationship between variables in the population. These values are shown in the second and third columns of Table 12.15. They depend on the sample size, n, listed in the left column. If the absolute value of the correlation coefficient computed for the sample, |r|, is greater than the value given in the table, a correlation exists between the variables in the population. The column headed α = 0.05 denotes a significance level of 5%, meaning that there is a 0.05 probability that, when the statistician says the variables are correlated, they are actually not related in the population. The column on the right, headed α = 0.01, denotes a significance level of 1%, meaning that there is a 0.01 probability that, when the statistician says the variables are correlated, they are actually not related in the population. Values in the α = 0.01 column are greater than those in the α = 0.05 column. Because of the possibility of sampling error, there is always a probability that when we say the variables are related, there is actually not a correlation in the population from which the sample was randomly selected.

EXAMPLE 4 DETERMINING A CORRELATION IN THE POPULATION

In Example 2, we computed r = -0.92 for n = 10. Can we conclude that there is a negative correlation between education and prejudice in the population?

SOLUTION Begin by taking the absolute value of the calculated correlation coefficient.

$$|r| = |-0.92| = 0.92$$

Now, look to the right of n = 10 in Table 12.15. Because 0.92 is greater than both of these values (0.632 and 0.765), we may conclude that a correlation does exist between education and prejudice in the population. (There is a probability of at most 0.01 that the variables are not really correlated in the population and our results could be attributed to chance.)

TABLE 12.15 VALUES FOR DETERMINING CORRELATIONS IN A POPULATION

n      α = 0.05   α = 0.01
4      0.950      0.990
5      0.878      0.959
6      0.811      0.917
7      0.754      0.875
8      0.707      0.834
9      0.666      0.798
10     0.632      0.765
11     0.602      0.735
12     0.576      0.708
13     0.553      0.684
14     0.532      0.661
15     0.514      0.641
16     0.497      0.623
17     0.482      0.606
18     0.468      0.590
19     0.456      0.575
20     0.444      0.561
22     0.423      0.537
27     0.381      0.487
32     0.349      0.449
37     0.325      0.418
42     0.304      0.393
47     0.288      0.372
52     0.273      0.354
62     0.250      0.325
72     0.232      0.302
82     0.217      0.283
92     0.205      0.267
102    0.195      0.254

The larger the sample size, n, the smaller is the value of r needed for a correlation in the population.
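The table lookup in Example 4 can be sketched as a small function. This is an illustration only: the dictionary `CRITICAL_VALUES` copies just a few rows of Table 12.15, and the function name `correlation_in_population` is an assumption rather than anything defined in the text.

```python
# A few rows of Table 12.15: n -> (value for alpha = 0.05, value for alpha = 0.01).
CRITICAL_VALUES = {
    4: (0.950, 0.990),
    10: (0.632, 0.765),
    19: (0.456, 0.575),
    20: (0.444, 0.561),
}

def correlation_in_population(r, n, alpha=0.05):
    """Return True if |r| exceeds the table value for sample size n."""
    if alpha not in (0.05, 0.01):
        raise ValueError("Table 12.15 lists only alpha = 0.05 and alpha = 0.01")
    column = 0 if alpha == 0.05 else 1
    return abs(r) > CRITICAL_VALUES[n][column]

# Example 4: r = -0.92 for n = 10 exceeds both 0.632 and 0.765.
print(correlation_in_population(-0.92, 10, alpha=0.05))  # True
print(correlation_in_population(-0.92, 10, alpha=0.01))  # True
```

Taking the absolute value first means the same lookup serves positive and negative correlations.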


CHECK POINT 4
If you worked Check Point 2 correctly, you should have found that r ≈ -0.84 for n = 19. Can you conclude that there is a negative correlation between moderate wine consumption and heart disease death rate?

BLITZER BONUS
Cigarettes and Lung Cancer

[Figure: scatter plot with regression line. x-axis: Annual Cigarette Consumption for Each Adult Male (250 to 1500); y-axis: Deaths per Million Males from Lung Cancer (0 to 500). Countries plotted: Great Britain, Finland, U.S., Switzerland, Holland, Denmark, Australia, Canada, Sweden, Norway, Iceland.]

This scatter plot shows a relationship between cigarette consumption among males and deaths due to lung cancer per million males. The data are from 11 countries and date back to a 1964 report by the U.S. Surgeon General. The scatter plot can be modeled by a line whose slope indicates an increasing death rate from lung cancer with increased cigarette consumption. At that time, the tobacco industry argued that in spite of this regression line, tobacco use is not the cause of cancer. Recent data do, indeed, show a causal effect between tobacco use and numerous diseases.

EXERCISE SET 12.5

• Practice and Application Exercises
In Exercises 1–8, make a scatter plot for the given data. Use the scatter plot to describe whether or not the variables appear to be related.

1. x 1 6 4 3 7 2

y 2 5 3 3 4 1

2. x 2 1 6 3 4

y 4 5 10 8 9

3. x 8 6 1 5 4 10 3

y 2 4 10 5 6 2 9

4. x 4 5 2 1

y 1 3 5 4

5. Respondent A B C D E F G

Years of education of parent (x) 13 9 7 12 12 10 11

Years of education of child (y) 13 11 7 16 17 8 17

6. Respondent A B C D E

IQ (x) 110 115 120 125 135

Annual income (y) (in thousands of dollars) 30 32 36 40 44

7. The data show the number of registered automatic weapons, in thousands, and the murder rate, in murders per 100,000, for eight randomly selected states.

Automatic weapons, x 11.6 8.3 6.9 3.6 2.6 2.5 2.4 0.6

Murder rate, y 13.1 10.6 11.5 10.1 5.3 6.6 3.6 4.4

Source: FBI and Bureau of Alcohol, Tobacco, and Firearms

Source: Smoking and Health, Washington, D.C., 1964


[Figure: Contraceptive Prevalence and Average Number of Births per Woman, Selected Countries. Scatter plot. x-axis: Percentage of Married Women of Child-Bearing Age Using Contraceptives (0 to 80); y-axis: Births per Woman (1 to 10). Labeled countries: Kenya, Syria, Jordan, Botswana, Zimbabwe, Honduras, Lesotho, Vietnam, Fiji, Chile, Spain, Costa Rica, Denmark.]

8. The data show the number of employed and unemployed male workers, 20 years and older, in thousands, for six selected years in the United States.

Year 1995 1996 1997 1998 1999 2000

Employed, x 64,085 64,897 66,524 67,134 67,761 68,580

Unemployed, y 3239 3147 2826 2580 2433 2350

Source: Bureau of Labor Statistics

The scatter plot in the figure shows the relationship between the percentage of married women of child-bearing age using contraceptives and births per woman in selected countries. Use the scatter plot to determine whether each of the statements in Exercises 9–18 is true or false.

Source: Population Reference Bureau

9. There is a strong positive correlation between contraceptive use and births per woman.

10. There is no correlation between contraceptive use and births per woman.

11. There is a strong negative correlation between contraceptive use and births per woman.

12. There is a causal relationship between contraceptive use and births per woman.

13. With approximately 43% of women of child-bearing age using contraceptives, there are 3 births per woman in Chile.

14. With 20% of women of child-bearing age using contraceptives, there are 6 births per woman in Vietnam.

15. No two countries have a different number of births per woman with the same percentage of married women using contraceptives.

16. The country with the greatest number of births per woman also has the smallest percentage of women using contraceptives.

17. Most of the data points do not lie on the regression line.

18. The number of selected countries shown in the scatter plot is approximately 20.

Just as money doesn’t buy happiness for individuals, the two don’t necessarily go together for countries either. However, the scatter plot does show a relationship between a country’s annual per capita income and the percentage of people in that country who call themselves “happy.” Use the scatter plot to determine whether each of the statements in Exercises 19–26 is true or false.

[Figure: Per Capita Income and National Happiness. Scatter plot. x-axis: Annual Per Capita Income (dollars), $5000 to $35,000; y-axis: Percentage of People Calling Themselves "Happy" (30 to 100). Countries plotted: Nigeria, Colombia, Venezuela, Algeria, Egypt, Slovakia, Poland, Estonia, Bulgaria, Latvia, Turkey, Jordan, India, Albania, Pakistan, Ukraine, Zimbabwe, Moldova, Tanzania, Russia, Iran, Romania, Belarus, South Africa, Hungary, Argentina, Uruguay, Croatia, Brazil, China, Philippines, Czech Republic, Mexico, New Zealand, Japan, Australia, Germany, Portugal, Spain, Israel, Slovenia, Greece, South Korea, Chile, Vietnam, Indonesia, Italy, France, Ireland, Netherlands, Switzerland, Norway, Canada, Finland, Sweden, Singapore, Britain, Denmark, Austria, Belgium, U.S.]

Source: Richard Layard, Happiness: Lessons from a New Science, Penguin, 2005

19. There is no correlation between per capita income and the percentage of people who call themselves “happy.”

20. There is an almost-perfect positive correlation between per capita income and the percentage of people who call themselves “happy.”

21. There is a positive correlation between per capita income and the percentage of people who call themselves “happy.”

22. As per capita income decreases, the percentage of people who call themselves “happy” also tends to decrease.

23. The country with the lowest per capita income has the least percentage of people who call themselves “happy.”

24. The country with the highest per capita income has the greatest percentage of people who call themselves “happy.”

25. A reasonable estimate of the correlation coefficient for the data is 0.8.

26. A reasonable estimate of the correlation coefficient for the data is -0.3.

Use the scatter plots shown, labeled (a)–(f), to solve Exercises 27–30.

[Figure: six scatter plots labeled (a) through (f).]

27. Which scatter plot indicates a perfect negative correlation?


28. Which scatter plot indicates a perfect positive correlation?

29. In which scatter plot is r = 0.9?

30. In which scatter plot is r = 0.01?

Compute r, the correlation coefficient, rounded to the nearest thousandth, for the data in

31. Exercise 1. 32. Exercise 2.

33. Exercise 3. 34. Exercise 4.

35. Use the data in Exercise 5 to solve this exercise.
a. Determine the correlation coefficient between years of education of parent and child.
b. Find the equation of the regression line for years of education of parent and child.
c. Approximately how many years of education can we predict for a child with a parent who has 16 years of education?

36. Use the data in Exercise 6 to solve this exercise.
a. Determine the correlation coefficient between IQ and income.
b. Find the equation of the regression line for IQ and income.
c. Approximately what annual income can be anticipated by a person whose IQ is 123?

37. Use the data in Exercise 7 to solve this exercise.
a. Determine the correlation coefficient between the number of automatic weapons and the murder rate.
b. Find the equation of the regression line.
c. Approximately what murder rate can we anticipate in a state that has 14 thousand registered weapons?

38. Use the data in Exercise 8 to solve this exercise.
a. Determine the correlation coefficient between the number of employed males and the number of unemployed males.
b. Find the equation of the regression line.
c. Approximately how many unemployed males can we anticipate for a year in which there are 70,000 thousand employed males?

In Exercises 39–45, the correlation coefficient, r, is given for a sample of n data points. Use the α = 0.05 column in Table 12.15 on page 736 to determine whether or not we may conclude that a correlation does exist in the population. (Using the α = 0.05 column, there is a probability of 0.05 that the variables are not really correlated in the population and our results could be attributed to chance. Ignore this possibility when concluding whether or not there is a correlation in the population.)

39. n = 20, r = 0.5 40. n = 27, r = 0.4

41. n = 12, r = 0.5 42. n = 22, r = 0.04

43. n = 72, r = -0.351 44. n = 37, r = -0.37

45. n = 20, r = -0.37

46. In the 1964 study on cigarette consumption and deaths due to lung cancer (see the Blitzer Bonus on page 737), n = 11 and r = 0.73. What can you conclude using the α = 0.05 column in Table 12.15?

• Writing in Mathematics
47. What is a scatter plot?
48. How does a scatter plot indicate that two variables are correlated?

49. Give an example of two variables with a strong positive correlation and explain why this is so.

50. Give an example of two variables with a strong negative correlation and explain why this is so.

51. What is meant by a regression line?

52. When all points in a scatter plot fall on the regression line, what is the value of the correlation coefficient? Describe what this means.

For the pairs of quantities in Exercises 53–56, describe whether a scatter plot will show a positive correlation, a negative correlation, or no correlation. If there is a correlation, is it strong, moderate, or weak? Explain your answers.

53. Height and weight

54. Number of days absent and grade in a course

55. Height and grade in a course

56. Hours of television watched and grade in a course

57. Explain how to use the correlation coefficient for a sample to determine if there is a correlation in the population.

• Critical Thinking Exercises
58. Which one of the following is true?

a. A scatter plot need not define y as a function of x.

b. The correlation coefficient and the slope of the regression line for the same set of data can have opposite signs.

c. When all points in a scatter plot fall on the regression line, the value of the correlation coefficient is 0.

d. If the same number is subtracted from each x-item, but the y-item stays the same, the correlation coefficient for these new data points decreases.

59. Give an example of two variables with a strong correlation, where each variable is not the cause of the other.

• Technology Exercise
60. Use the linear regression feature of a graphing calculator to verify your work in any two exercises from Exercises 35–38, parts (a) and (b).

• Group Exercises
61. The group should select two variables related to people on your campus that it believes have a strong positive or negative correlation. Once these variables have been determined,
a. Collect at least 30 ordered pairs of data (x, y) from a sample of people on your campus.
b. Draw a scatter plot for the data collected.
c. Does the scatter plot indicate a positive correlation, a negative correlation, or no relationship between the variables?
d. Calculate r. Does the value of r reinforce the impression conveyed by the scatter plot?
e. Find the equation of the regression line.
f. Use the regression line’s equation to make a prediction about a y-value given an x-value.
g. Are the results of this project consistent with the group’s original belief about the correlation between the variables, or are there some surprises in the data collected?


62. What is the opinion of students on your campus about ... ? Group members should begin by deciding on some aspect of college life around which student opinion can be polled. The poll should consist of the question, “What is your opinion of ... ?” Be sure to provide options such as excellent, good, average, poor, horrible, or a 1-to-10 scale, or possibly grades of A, B, C, D, F. Use a random sample of students on your campus and conduct the opinion survey. After collecting the data, present and interpret it using as many of the skills and techniques learned in this chapter as possible.

CHAPTER SUMMARY, REVIEW, AND TEST

SUMMARY: DEFINITIONS AND CONCEPTS (with EXAMPLES)

12.1 Sampling, Frequency Distributions, and Graphs

a. A population is the set containing all objects whose properties are to be described and analyzed. A sample is a subset of the population. (Ex. 1, p. 681)

b. Random samples are obtained in such a way that each member of the population has an equal chance of being selected. (Ex. 2, p. 682)

c. Data can be organized and presented in frequency distributions, grouped frequency distributions, histograms, frequency polygons, and stem-and-leaf plots. (Ex. 3, p. 683; Ex. 4, p. 684; Figures 12.2 and 12.3, p. 685; Ex. 5, p. 686)

d. The box on page 687 lists some things to watch for in visual displays of data. (Table 12.4, p. 688)

12.2 Measures of Central Tendency

a. The mean is the sum of the data items divided by the number of items. $\text{Mean} = \dfrac{\sum x}{n}$. (Ex. 1, p. 693)

b. The mean of a frequency distribution is computed using $\text{Mean} = \dfrac{\sum xf}{n}$, where x is each data value, f is its frequency, and n is the total frequency of the distribution. (Ex. 2, p. 694)

c. The median of ranked data is the item in the middle or the mean of the two middlemost items. The median is the value in the $\dfrac{n + 1}{2}$ position in the list of ranked data. (Ex. 3, p. 695; Ex. 4, p. 696; Ex. 5, p. 697; Ex. 6, p. 697)

d. When one or more data items are much greater than or much less than the other items, these extreme values greatly influence the mean, often making the median more representative of the data. (Ex. 7, p. 698)

e. The mode of a data set is the value that occurs most often. If there is no such value, there is no mode. If more than one data value has the highest frequency, then each of these data values is a mode. (Ex. 8, p. 700)

f. The midrange is computed using $\dfrac{\text{lowest data value} + \text{highest data value}}{2}$. (Ex. 9, p. 701; Ex. 10, p. 701)

12.3 Measures of Dispersion

a. Range = highest data value - lowest data value (Ex. 1, p. 705)


b. $\text{Standard deviation} = \sqrt{\dfrac{\sum(\text{data item} - \text{mean})^2}{n - 1}}$ (Ex. 2, p. 706; Ex. 3, p. 707; Ex. 4, p. 708)

c. As the spread of data items increases, the standard deviation gets larger. (Ex. 5, p. 709)

12.4 The Normal Distribution

a. The normal distribution is a theoretical distribution for the entire population. The distribution is bell shaped and symmetric about a vertical line through its center, where the mean, median, and mode are located.

b. The 68–95–99.7 Rule
Approximately 68% of the data items fall within 1 standard deviation of the mean,
95% of the data items fall within 2 standard deviations of the mean, and
99.7% of the data items fall within 3 standard deviations of the mean.
(Ex. 1, p. 714; Ex. 2, p. 715)

c. A z-score describes how many standard deviations a data item in a normal distribution lies above or below the mean. $z\text{-score} = \dfrac{\text{data item} - \text{mean}}{\text{standard deviation}}$ (Ex. 3, p. 716; Ex. 4, p. 718; Ex. 5, p. 718)

d. If n% of the items in a distribution are less than a particular data item, that data item is in the nth percentile of the distribution. The 25th percentile is the first quartile, the 50th percentile, or the median, is the second quartile, and the 75th percentile is the third quartile. (Ex. 6, p. 719; Figure 12.16, p. 719)

e. A table showing z-scores and their percentiles can be used to find the percentage of data items less than or greater than a given data item in a normal distribution, as well as the percentage of data items between two given items. See the boxed summary on computing percentage of data items on page 724. (Ex. 7, p. 720; Ex. 8, p. 721; Ex. 9, p. 723)

f. If a statistic is obtained from a random sample of size n, there is a 95% probability that it lies within $\dfrac{1}{\sqrt{n}}$ of the true population statistic. $\pm\dfrac{1}{\sqrt{n}}$ is called the margin of error. (Ex. 10, p. 725)

g. A distribution of data is skewed if a large number of data items are piled up at one end or the other, with a “tail” at the opposite end. (Figure 12.24, p. 727; Figure 12.25, p. 727)

12.5 Scatter Plots, Correlation, and Regression Lines

a. A plot of data points is called a scatter plot. If the points lie approximately along a line, the line that best fits the data is called a regression line.

b. A correlation coefficient, r, measures the strength and direction of a possible relationship between variables. If r = 1, there is a perfect positive correlation, and if r = -1, there is a perfect negative correlation. If r = 0, there is no relationship between the variables. Table 12.15 on page 736 indicates whether r denotes a correlation in the population. (Ex. 1, p. 733; Ex. 4, p. 736)

c. The formula for computing the correlation coefficient, r, is given in the box on page 733. The equation of the regression line is given in the box on page 735. (Ex. 2, p. 734; Ex. 3, p. 735)
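Several of the summary formulas from Sections 12.2 through 12.4 can be sketched in Python. All function names and sample values below are hypothetical and chosen only for illustration; none of the numbers come from the chapter's examples.

```python
import math

def mean_from_frequencies(freq):
    """Mean of a frequency distribution: sum of x*f divided by total frequency n."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

def midrange(data):
    """(lowest data value + highest data value) / 2."""
    return (min(data) + max(data)) / 2

def standard_deviation(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def z_score(item, mean, sd):
    """Number of standard deviations an item lies above or below the mean."""
    return (item - mean) / sd

def margin_of_error(n):
    """Margin of error 1 / sqrt(n) for a random sample of size n."""
    return 1 / math.sqrt(n)

scores = [2, 4, 4, 6]                         # hypothetical data items
print(mean_from_frequencies({10: 1, 20: 3}))  # 17.5
print(midrange(scores))                       # 4.0
print(round(standard_deviation(scores), 3))   # 1.633
print(z_score(85, 70, 10))                    # 1.5
print(round(margin_of_error(400), 3))         # 0.05
```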


REVIEW EXERCISES

12.1

A survey of 1511 randomly selected American women ages 25 through 60 was taken by CyberPulse Advisory Panel. The data below are from the poll. Use this information to solve Exercises 1–2.

[Figure: How Often Do You Entertain Guests for Dinner? Once a week: 6%; More than once a month: 12%; Once a month: 21%; A few times a year: 37%; Rarely or never: 24%.]

1. Describe the population and the sample of this poll.
2. For each woman polled, what variable is measured?
3. The government of a large city wants to know if its citizens will support a three-year tax increase to provide additional support to the city’s community college system. The government decides to conduct a survey of the city’s residents before placing a tax increase initiative on the ballot. Which one of the following is most appropriate for obtaining a sample of the city’s residents?
a. Survey a random sample of persons within each geographic region of the city.
b. Survey a random sample of community college professors living in the city.
c. Survey every tenth person who walks into the city’s government center on two randomly selected days of the week.
d. Survey a random sample of persons within each geographic region of the state in which the city is located.

A random sample of ten college students is selected and each student is asked how much time he or she spent on homework during the previous weekend. The following times, in hours, are obtained:

8, 10, 9, 7, 9, 8, 7, 6, 8, 7.

Use these data items to solve Exercises 4–6.

4. Construct a frequency distribution for the data.
5. Construct a histogram for the data.
6. Construct a frequency polygon for the data.

The 50 grades on a physiology test are shown. Use the data to solve Exercises 7–8.

44 24 54 81 18
34 39 63 67 60
72 36 91 47 75
57 74 87 49 86
59 14 26 41 90
13 29 13 31 68
63 35 29 70 22
95 17 50 42 27
73 11 42 31 69
56 40 31 45 51

7. Construct a grouped frequency distribution for the data. Use 0–39 for the first class, 40–49 for the second class, and make each subsequent class width the same as the second class.

8. Construct a stem-and-leaf plot for the data.

9. Describe what is misleading about the size of the barrels in the following visual display.

[Figure: barrels of different sizes labeled $15.56, $26.72, $21.84, $22.51, $27.54, $37.66, $43.26. Source: U.S. Department of Energy]

12.2
In Exercises 10–11, find the mean for each group of data items.

10. 84, 90, 95, 89, 98

11. 33, 27, 9, 10, 6, 7, 11, 23, 27

12. Find the mean for the data items in the given frequency distribution.

Score x   Frequency f
1         2
2         4
3         3
4         1

In Exercises 13–14, find the median for each group of data items.

13. 33, 27, 9, 10, 6, 7, 11, 23, 27

14. 28, 16, 22, 28, 34

15. Find the median for the data items in the frequency distribution in Exercise 12.

In Exercises 16–17, find the mode for each group of data items. If there is no mode, so state.

16. 33, 27, 9, 10, 6, 7, 11, 23, 27

17. 582, 585, 583, 585, 587, 587, 589

18. Find the mode for the data items in the frequency distribution in Exercise 12.

In Exercises 19–20, find the midrange for each group of data items.

19. 84, 90, 95, 88, 98

20. 33, 27, 9, 10, 6, 7, 11, 23, 27

21. Find the midrange for the data items in the frequency distribution in Exercise 12.


22. Researchers at Emory University studied national health-care data from 14,000 Americans in 1987 and 2002. Explain how the research team obtained the mean annual health-care costs shown in the bar graph.

[Figure: Mean Annual Health-Care Costs. Bar graph for 1987 and 2002 comparing a normal-weight person and an obese person. y-axis: Cost (thousands of dollars), $1 to $5. Bar labels: $1512, $1784, $2210, $3454. Source: Kenneth Thorpe, Emory University]

23. A student took seven tests in a course, scoring between 90% and 95% on three of the tests, between 80% and 89% on three of the tests, and below 40% on one of the tests. In this distribution, is the mean or the median more representative of the student’s overall performance in the course? Explain your answer.

24. The data items below are the ages of U.S. presidents at the time of their first inauguration.

57 61 57 57 58 57 61 54 68 51 49 64 50 48
65 52 56 46 54 49 51 47 55 55 54 42 51 56
55 51 54 51 60 62 43 55 56 61 52 69 64 46 54

a. Organize the data in a frequency distribution.
b. Use the frequency distribution to find the mean age, median age, modal age, and midrange age of the presidents when they were inaugurated.

12.3
In Exercises 25–26, find the range for each group of data items.

25. 28, 34, 16, 22, 28

26. 312, 783, 219, 312, 426, 219

27. The mean for the data items 29, 9, 8, 22, 46, 51, 48, 42, 53, 42 is 35. Find a. the deviation from the mean for each data item and b. the sum of the deviations in part (a).

28. Use the data items 36, 26, 24, 90, and 74 to find a. the mean, b. the deviation from the mean for each data item, and c. the sum of the deviations in part (b).

In Exercises 29–30, find the standard deviation for each group of data items.

29. 3, 3, 5, 8, 10, 13

30. 20, 27, 23, 26, 28, 32, 33, 35

31. A test measuring anxiety levels is administered to a sample of ten college students with the following results. (High scores indicate high anxiety.)

10, 30, 37, 40, 43, 44, 45, 69, 86, 86

Find the mean, range, and standard deviation for the data.

32. Compute the mean and the standard deviation for each of the following data sets. Then, write a brief description of similarities and differences between the two sets based on each of your computations.
Set A: 80, 80, 80, 80 Set B: 70, 70, 90, 90

33. Describe how you would determine

a. which of the two groups, men or women, at your college has a higher mean grade point average.

b. which of the groups is more consistently close to its mean grade point average.

12.4
The scores on a test are normally distributed with a mean of 70 and a standard deviation of 8. In Exercises 34–36, find the score that is

34. 2 standard deviations above the mean.

35. 3 1/2 standard deviations above the mean.

36. 1 1/4 standard deviations below the mean.

The ages of people living in a retirement community are normally distributed with a mean age of 68 years and a standard deviation of 4 years. In Exercises 37–43, use the 68–95–99.7 Rule to find the percentage of people in the community whose ages

37. are between 64 and 72. 38. are between 60 and 76.

39. are between 68 and 72. 40. are between 56 and 80.

41. exceed 72. 42. are less than 72.

43. exceed 76.

A set of data items is normally distributed with a mean of 50 and a standard deviation of 5. In Exercises 44–48, convert each data item to a z-score.

44. 50 45. 60 46. 58

47. 35 48. 44

49. A student scores 60 on a vocabulary test and 80 on a grammar test. The data items for both tests are normally distributed. The vocabulary test has a mean of 50 and a standard deviation of 5. The grammar test has a mean of 72 and a standard deviation of 6. On which test did the student have the better score? Explain why this is so.

The number of miles that a particular brand of car tires lasts is normally distributed with a mean of 32,000 miles and a standard deviation of 4000 miles. In Exercises 50–52, find the data item in this distribution that corresponds to the given z-score.

50. z = 1.5 51. z = 2.25 52. z = -2.5

The mean cholesterol level for all men in the United States is 200 and the standard deviation is 15. In Exercises 53–56, use Table 12.13 on page 720 to find the percentage of U.S. men whose cholesterol level

53. is less than 221. 54. is greater than 173.

55. is between 173 and 221. 56. is between 164 and 182.

Use the percentiles for the weights of adult men over 40 to solve Exercises 57–59.

Weight   Percentile
235      86
227      third quartile
180      second quartile
173      first quartile

Find the percentage of men over 40 who weigh

57. less than 227 pounds. 58. more than 235 pounds.

59. between 227 and 235 pounds.


y

x

Percentage of Adult Females Who Are Literate

Und

er-F

ive

Mor

talit

y(p

er th

ousa

nd)

Literacy and Child Mortality

1009080706050403020100

350

300

250

200

150

100

50

60. Using a random sample of 2041 executives of Americancompanies, a Korn/Ferry survey asked respondents if theircareer was related to the area of their college degree. Thepoll indicated that 85% of the executives responded “yes”and 15% said “no.”

a. Find the margin of error, to the nearest tenth of apercent, for this survey.

b. Write a statement about the percentage of Americanexecutives in the population whose career is not relatedto their college degree.

61. The histogram indicates the frequencies of the number of syllables per word for 100 randomly selected words in Japanese.

a. Is the shape of this distribution best classified as normal, skewed to the right, or skewed to the left?

b. Find the mean, median, and mode for the number of syllables in the sample of Japanese words.

c. Are the measures of central tendency from part (b) consistent with the shape of the distribution that you described in part (a)? Explain your answer.

12.5

In Exercises 62–63, make a scatter plot for the given data. Use the scatter plot to describe whether or not the variables appear to be related.

62. x 1 3 4 6 8 9

y 1 2 3 3 5 5

[Figure: Number of Syllables in Japanese Words. Histogram; x-axis: Number of Syllables (1 to 6); y-axis: Number of Words (0 to 50). Bar labels shown: 2, 1, 18, 9, 36, 34.]

The scatter plot in the figure shows the relationship between the percentage of adult females in a country who are literate and the mortality of children under five. Also shown is the regression line. Use this information to determine whether each of the statements in Exercises 64–70 is true or false.

64. There is a perfect negative correlation between the percentage of adult females who are literate and under-five mortality.

65. As the percentage of adult females who are literate increases, under-five mortality tends to decrease.

66. The country with the least percentage of adult females who are literate has the greatest under-five mortality.

67. No two countries have the same percentage of adult females who are literate but different under-five mortalities.

68. There are more than 20 countries in this sample.

69. There is no correlation between the percentage of adult females who are literate and under-five mortality.

70. The country with the greatest percentage of adult females who are literate has an under-five mortality rate that is less than 50 children per thousand.

71. Which one of the following scatter plots indicates a correlation coefficient of approximately -0.9?

[Figure: four scatter plots labeled (a), (b), (c), and (d).]

63.
Country       Life expectancy in years, x    Infant deaths per 1000 births, y
Canada        79                             5.6
U.S.          76                             6.4
Mexico        72                             25.9
Brazil        64                             40.0
Costa Rica    76                             13.1
Denmark       76                             6.8
China         70                             45.5
Egypt         62                             69.3
Pakistan      59                             93.5
Bangladesh    57                             97.7
Australia     80                             5.3
Japan         80                             4.1
Russia        65                             23.3

Source: U.S. Bureau of the Census International Database

Source: United Nations


72. Use the data in Exercise 62 to solve the exercise.

a. Compute r, the correlation coefficient, rounded to the nearest thousandth.

b. Find the equation of the regression line.
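Both parts can be worked from the same five sums over the Exercise 62 data. The sketch below assumes the computational formulas r = (nΣxy - ΣxΣy) / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)) for the correlation coefficient and, for the regression line y = mx + b, m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = (Σy - mΣx) / n. Variable names are illustrative.

```python
import math

# Data from Exercise 62
x = [1, 3, 4, 6, 8, 9]
y = [1, 2, 3, 3, 5, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# The numerator is shared by r and the slope
numerator = n * sum_xy - sum_x * sum_y
x_spread = n * sum_x2 - sum_x ** 2
y_spread = n * sum_y2 - sum_y ** 2

r = numerator / (math.sqrt(x_spread) * math.sqrt(y_spread))
slope = numerator / x_spread
intercept = (sum_y - slope * sum_x) / n

print(f"r = {r:.3f}")
print(f"regression line: y = {slope:.3f}x + {intercept:.3f}")
```

Because r and the slope share a numerator, they always have the same sign.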

73. The graph, based on Nielsen Media Research 2005 data taken from random samples of Americans at various ages, indicates that as we get older, we watch more television.

a. Let x represent one’s age and let y represent hours per week watching television. Calculate the correlation coefficient.

b. Using Table 12.15 on page 736 and the α = 0.05 column, determine whether there is a correlation between age and time spent watching television in the American population.
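Part (a) can be sketched with the deviation form of the correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), which is algebraically equivalent to the sums-based formula. The data pairs below are an assumption read from the labels of the Television Viewing, by Age graph: ages 22, 32, 42, 52, 62 paired in order with 26, 32, 34, 39, 44 hours per week.

```python
import math

# Assumed pairing of the graph's labels: (age, hours per week)
ages = [22, 32, 42, 52, 62]
hours = [26, 32, 34, 39, 44]

mean_age = sum(ages) / len(ages)
mean_hours = sum(hours) / len(hours)

# Sums of products and squares of deviations from the means
s_xy = sum((a - mean_age) * (h - mean_hours) for a, h in zip(ages, hours))
s_xx = sum((a - mean_age) ** 2 for a in ages)
s_yy = sum((h - mean_hours) ** 2 for h in hours)

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"n = {len(ages)}, r = {r:.3f}")
```

For part (b), the absolute value of r is compared with the table entry for this sample size; a larger |r| indicates a correlation in the population.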

CHAPTER 12 TEST

5. Construct a grouped frequency distribution for the data. Use 40–49 for the first class and use the same width for each subsequent class.

6. Construct a stem-and-leaf display for the data.

7. The graph shows the percentage of students in the United States through grade 12 who were home-schooled in 1999 and 2003. What impression does the roofline in the visual display imply about what occurred in 2000 through 2002? How might this be misleading?

[Figure: Television Viewing, by Age. Bar graph; vertical axis: Hours per Week Watching Television (10 to 50).]

Age                                   22   32   42   52   62
Hours per week watching television    26   32   34   39   44

Source: Nielsen Media Research

1. Politicians in the Florida Keys need to know if the residents of Key Largo think the amount of money charged for water is reasonable. The politicians decide to conduct a survey of a sample of Key Largo’s residents. Which procedure would be most appropriate for a sample of Key Largo’s residents?

a. Survey all water customers who pay their water bills at Key Largo City Hall on the third day of the month.

b. Survey a random sample of executives who work for the water company in Key Largo.

c. Survey 5000 individuals who are randomly selected from a list of all people living in Georgia and Florida.

d. Survey a random sample of persons within each neighborhood of Key Largo.

Use these scores on a ten-point quiz to solve Exercises 2–4.

8, 5, 3, 6, 5, 10, 6, 9, 4, 5, 7, 9, 7, 4, 8, 8

2. Construct a frequency distribution for the data.

3. Construct a histogram for the data.

4. Construct a frequency polygon for the data.
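A frequency distribution is a tally of how often each score occurs, and the histogram and frequency polygon are both drawn from that tally. The sketch below builds the tally with `collections.Counter` and prints a rough text histogram; the row of asterisks is only a stand-in for the bars of a drawn graph.

```python
from collections import Counter

scores = [8, 5, 3, 6, 5, 10, 6, 9, 4, 5, 7, 9, 7, 4, 8, 8]

# Tally how many times each score appears
frequency = Counter(scores)

print("Score  Frequency")
for score in sorted(frequency):
    count = frequency[score]
    print(f"{score:>5}  {count:>9}  {'*' * count}")
```

The frequencies should add up to 16, the number of quiz scores.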

Use the 30 test scores listed below to solve Exercises 5–6.

79 51 67 50 78

62 89 83 73 80

88 48 60 71 79

89 63 55 93 71

41 81 46 50 61

59 50 90 75 61
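Exercises 5 and 6 both sort the 30 scores by their tens digit. The sketch below groups the scores into classes of width 10 starting at 40–49, then prints a stem-and-leaf display with the tens digit as the stem and the units digit as the leaf. Integer division is used to find each class, which works here only because the classes start at multiples of 10.

```python
from collections import defaultdict

scores = [
    79, 51, 67, 50, 78,
    62, 89, 83, 73, 80,
    88, 48, 60, 71, 79,
    89, 63, 55, 93, 71,
    41, 81, 46, 50, 61,
    59, 50, 90, 75, 61,
]

# Grouped frequency distribution: classes 40-49, 50-59, and so on
CLASS_WIDTH = 10
grouped = defaultdict(int)
for score in scores:
    lower_limit = (score // CLASS_WIDTH) * CLASS_WIDTH
    grouped[lower_limit] += 1

print("Class   Frequency")
for lower_limit in sorted(grouped):
    print(f"{lower_limit}-{lower_limit + CLASS_WIDTH - 1}   {grouped[lower_limit]}")

# Stem-and-leaf display: stem = tens digit, leaf = units digit
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

print("Stem | Leaves")
for stem in sorted(stems):
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

The number of leaves on each stem matches the frequency of the corresponding class.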

[Figure: Percentage of Home-Schooled Students in the United States. Visual display with data labels 1.7% for 1999 and 2.2% for 2003.]

Source: National Center for Education Statistics

Use the six data items listed below to solve Exercises 8–11.

3, 6, 2, 1, 7, 3

8. Find the mean. 9. Find the median.

10. Find the midrange. 11. Find the standard deviation.
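The four measures can be checked with Python's `statistics` module. This sketch assumes the standard deviation is the sample form, which divides the sum of squared deviations by n - 1; `statistics.stdev` uses that form. The midrange has no library function, so it is computed directly as the average of the lowest and highest data items.

```python
import statistics

data = [3, 6, 2, 1, 7, 3]

mean = statistics.mean(data)
median = statistics.median(data)          # average of the two middle items
midrange = (min(data) + max(data)) / 2
std_dev = statistics.stdev(data)          # sample form, divides by n - 1

print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"midrange = {midrange}")
print(f"standard deviation = {std_dev:.2f}")
```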


Use the frequency distribution shown to solve Exercises 12–14.

Score, x    Frequency, f
1           3
2           5
3           2
4           2

12. Find the mean.

13. Find the median.

14. Find the mode.
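With a frequency distribution, the mean is Σxf / n, where n = Σf is the total number of scores. The sketch below computes that directly, then expands the table back into a sorted list of scores to find the median, and takes the score with the greatest frequency as the mode. Variable names are illustrative.

```python
import statistics

# score -> frequency, from the table
frequency = {1: 3, 2: 5, 3: 2, 4: 2}

n = sum(frequency.values())
mean = sum(score * f for score, f in frequency.items()) / n

# Expand the table into the sorted list of individual scores
expanded = [score for score, f in sorted(frequency.items()) for _ in range(f)]
median = statistics.median(expanded)

# The mode is the score with the greatest frequency
mode = max(frequency, key=frequency.get)

print(f"n = {n}, mean = {mean}, median = {median}, mode = {mode}")
```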

15. The annual salaries of four salespeople and the owner of a bookstore are

$17,500, $19,000, $22,000, $27,500, $98,500.

Is the mean or the median more representative of the five annual salaries? Briefly explain your answer.

According to the American Freshman, the number of hours that college freshmen spend studying each week is normally distributed with a mean of 7 hours and a standard deviation of 5.3 hours. In Exercises 16–17, use the 68–95–99.7 Rule to find the percentage of college freshmen who study

16. between 7 and 12.3 hours each week.

17. more than 17.6 hours each week.

18. IQ scores are normally distributed in the population. Who has a higher IQ: a student with a 120 IQ on a scale where 100 is the mean and 10 is the standard deviation, or a professor with a 128 IQ on a scale where 100 is the mean and 15 is the standard deviation? Briefly explain your answer.

19. Use the z-scores and the corresponding percentiles shown below to solve this exercise. Test scores are normally distributed with a mean of 74 and a standard deviation of 10. What percentage of the scores are above 88?

z-Score Percentile

1.1 86.43

1.2 88.49

1.3 90.32

1.4 91.92

1.5 93.32
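The exercise reduces to two steps: convert the score 88 to a z-score, then subtract its percentile from 100 to get the percentage above it. The sketch below stores the table above as a dictionary; the name `percentile_by_z` is illustrative.

```python
# z-score -> percentile, copied from the table
percentile_by_z = {1.1: 86.43, 1.2: 88.49, 1.3: 90.32, 1.4: 91.92, 1.5: 93.32}

mean, std_dev = 74, 10
score = 88

# Round to one decimal place so the result matches a table key
z = round((score - mean) / std_dev, 1)

percent_below = percentile_by_z[z]
percent_above = 100 - percent_below

print(f"z = {z}, below: {percent_below}%, above: {percent_above:.2f}%")
```

A percentile gives the percentage of scores below a data item, which is why the "above" answer is the complement.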

20. Use the percentiles in the table shown below to find the percentage of scores between 630 and 690.

Score Percentile

780 99

750 87

720 72

690 49

660 26

630 8

600 1


21. Using a random sample of 100 students from a campus of approximately 12,000 students, 60% of the students in the sample said they were very satisfied with their professors.

a. Find the margin of error in this percent.

b. Write a statement about the percentage of the entire population of students from this campus who are very satisfied with their professors.

22. Make a scatter plot for the given data. Use the scatter plot to describe whether or not the variables appear to be related.

x 1 4 3 5 2

y 5 2 2 1 4

The scatter plot shows the number of minutes each of 16 people exercise per week and the number of headaches per month each person experiences. Use the scatter plot to determine whether each of the statements in Exercises 23–25 is true or false.

23. An increase in the number of minutes devoted to exercise causes a decrease in headaches.

24. There is a perfect negative correlation between time spent exercising and number of headaches.

25. The person who exercised most per week had the least number of headaches per month.

26. Is the relationship between the price of gas and the number of people visiting our national parks a positive correlation, a negative correlation, or is there no correlation? Explain your answer.

[Figure: Scatter plot; x-axis: Minutes per Week Spent Exercising (20 to 120); y-axis: Number of Headaches per Month (0 to 9).]
