
Page 1: Distribution of Data and the Empirical Rule - Cengagecollege.cengage.com/.../college_algebra_applied/1e/... · 7/7/2003 · Distribution of Data and the Empirical Rule 1 ... Many

Distribution of Data and the Empirical Rule

Stem-and-Leaf Diagrams

Although the mean, the median, the mode, and the standard deviation provide some information about a set of data and the distribution of the data, it is often helpful to use graphical procedures that visually illustrate precisely how the values in a set of data are distributed.

Many small sets of data can be graphically displayed by using a stem-and-leaf diagram. For instance, consider the following history test scores:

65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73

In this form the data are called raw data because the data have not been organized. With raw data it is generally difficult to observe how the data are distributed. In the stem-and-leaf diagram of the history test scores, we have organized the test scores by placing all the scores that are in the 40s in the top row, the scores that are in the 50s in the second row, the scores that are in the 60s in the third row, and so on. The tens digits of the scores have been placed to the left of the vertical line. In this diagram they are referred to as stems. The ones digits of the test scores have been placed in the proper row to the right of the vertical line. In this diagram they are the leaves. It is now easy to make observations about the distribution of the scores. Only two of the scores are in the 90s, six of the scores are in the 70s, and none of the scores are in the 50s. The lowest score is 43 and the highest is 98.

Steps in the Construction of a Stem-and-Leaf Diagram

1. Determine the stems and list the stems in a column from smallest to largest.
2. List the remaining digits of each data value as a leaf to the right of its stem.
3. Include a legend that explains the meaning of the stem and the leaves. Include a title for the diagram.

The choice of how many leading digits to use as the stem will depend on the particular application and can be best explained with an example.
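The three construction steps can be sketched in Python. This is a minimal sketch, not the text's own procedure: it assumes non-negative whole-number data, and the function name `stem_and_leaf` and its `leaf_unit` parameter are illustrative choices. Digits below the leaf are truncated rather than rounded.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the rows of a stem-and-leaf diagram as strings like "7 | 2 3 4".

    leaf_unit is the place value of the leaf digit: 1 makes the ones digit the
    leaf, 100 makes the hundreds digit the leaf. Digits below the leaf are
    truncated, not rounded. Values are assumed to be non-negative integers.
    """
    leaves = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value // leaf_unit, 10)
        leaves[stem].append(leaf)

    rows = []
    # Step 1: list the stems from smallest to largest, including empty stems.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 2: write each leaf, in increasing order, to the right of its stem.
        leaf_text = " ".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


history_scores = [65, 72, 96, 86, 43, 61, 75, 86, 49,
                  68, 98, 74, 84, 78, 85, 75, 86, 73]

# Step 3: add a title and a legend.
print("A Stem-and-Leaf Diagram of a Set of History Test Scores")
for row in stem_and_leaf(history_scores):
    print(row)
print("Legend: 8 | 6 represents 86")
```

With `leaf_unit=100`, the same function uses thousands digits as stems and hundreds digits as leaves, which is the choice made in Example 1 for the cruise amounts.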

EXAMPLE 1 Construct a Stem-and-Leaf Diagram

A travel agent has recorded the amount spent by customers for a cruise. Construct a stem-and-leaf diagram for the data.


Copyright © Houghton Mifflin Company. All rights reserved.

Amount Spent for a Cruise, Summer of 2003

$3600 $4700 $7200 $2100 $5700 $4400 $9400
$6200 $5900 $2100 $4100 $5200 $7300 $6200
$3800 $4900 $5400 $5400 $3100 $3100 $4500
$4500 $2900 $3700 $3700 $4800 $4800 $2400

A Stem-and-Leaf Diagram of a Set of History Test Scores

Stems | Leaves
4 | 3 9
5 |
6 | 1 5 8
7 | 2 3 4 5 5 8
8 | 4 5 6 6 6
9 | 6 8

Legend: 8 | 6 represents 86

302360_File_B.qxd 7/7/03 7:18 AM Page 1


Solution One method of choosing the stems is to let each thousands digit be a stem and each hundreds digit be a leaf. If the stems and leaves are assigned in this manner, then the notation 2 | 1, which has a stem of 2 and a leaf of 1, represents a cost of $2100 and the notation 5 | 4 represents a cost of $5400. The diagram can now be constructed by writing all of the stems, from smallest to largest, in a column to the left of a vertical line and writing the corresponding leaves to the right of the vertical line.

Amount Spent for a Cruise

Stems | Leaves
2 | 1 1 4 9
3 | 1 1 6 7 7 8
4 | 1 4 5 5 7 8 8 9
5 | 2 4 4 7 9
6 | 2 2
7 | 2 3
8 |
9 | 4

Legend: 7 | 3 represents $7300

CHECK YOUR PROGRESS 1 The following table lists the ages of the customers who purchased a cruise. Construct a stem-and-leaf diagram for the data.

Ages of Customers Who Purchased a Cruise

32 45 66 21 62 68
61 55 23 38 44 77
46 50 33 35 42 45
51 28 40 41 52 52
72 64 51 33

Solution See page S1.

Sometimes two sets of data can be compared by using a back-to-back stem-and-leaf diagram, which has common stems with leaves from one data set displayed to the right of the stems and leaves from the other data set displayed to the left of the stems. For instance, the following back-to-back stem-and-leaf diagram shows the test scores for two biology classes that took the same test.


Biology Test Scores

8 A.M. class | Stems | 10 A.M. class
2 | 4 | 5 8
7 | 5 | 6 7 9 9
5 8 | 6 | 2 3 4 8
1 2 3 3 3 7 8 | 7 | 1 3 3 5 5 6 8
4 4 5 5 6 8 8 9 | 8 | 2 3 6 6 6
2 4 5 5 8 | 9 | 4 5

Legend (8 A.M. class): 3 | 7 represents 73
Legend (10 A.M. class): 8 | 2 represents 82

QUESTION Which biology class did better on the test?

ANSWER The 8 A.M. class did better on the test because it had more scores in the 80s and 90s and fewer scores in the 40s, 50s, and 60s. The scores in the 70s were similar for both classes.

Frequency Distributions and Histograms

Large sets of data are often displayed using a frequency distribution or a histogram. For example, consider the following situation. An Internet service provider (ISP) has installed new computers. To estimate the new download times its subscribers will experience, the ISP surveyed 1000 subscribers to determine the time each subscriber required to download a particular file from the Internet site music.net. The results of that survey are summarized in the following table.

A grouped frequency distribution

Download time (in seconds) | Number of subscribers
0–10 | 28
10–20 | 129
20–30 | 355
30–40 | 345
40–50 | 121
50–60 | 22

[Figure: A histogram of the frequency distribution above; horizontal axis: download time, in seconds (0 to 60); vertical axis: number of subscribers (0 to 400).]

The above table is called a grouped frequency distribution. It shows how often (frequently) certain events occurred. Each interval 0–10, 10–20, . . . is called a class. This distribution has six classes. For the 10–20 class, 10 is the lower class boundary and 20 is the upper class boundary. Any data value that lies on a common boundary is assigned to the higher class. The graph of a frequency distribution is called a histogram. A histogram provides a pictorial view of how the data are distributed. In the above histogram, the height of each bar indicates how many subscribers experienced the download times indicated by the class represented below it on the horizontal axis. The center point of a class is called a class mark. In the above histogram, the class marks 5, 15, 25, 35, 45, and 55 are marked on the horizontal axis.
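Counting values into classes can be sketched in Python. The survey's individual download times are not given, so the sample times below are hypothetical, as are the function names. Each class includes its lower boundary and excludes its upper boundary, which implements the rule that a value on a common boundary goes to the higher class.

```python
def grouped_frequency(values, lower, width, num_classes):
    """Count how many values fall in each class of a grouped frequency distribution.

    Class i covers lower + i*width <= value < lower + (i+1)*width, so a value on
    a common boundary is assigned to the higher class. Values outside the
    overall range are skipped in this sketch.
    """
    counts = [0] * num_classes
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts


def class_marks(lower, width, num_classes):
    """Return the center point (class mark) of each class."""
    return [lower + width * i + width / 2 for i in range(num_classes)]


# Hypothetical download times in seconds (the survey's raw data is not given).
times = [3.2, 10, 19.9, 20, 27.5, 35.1, 59.9]
print(grouped_frequency(times, lower=0, width=10, num_classes=6))
print(class_marks(lower=0, width=10, num_classes=6))
```

One consequence of this convention is that a time of exactly 60 seconds falls outside the last class and is skipped here; the text does not say how such a value would be handled.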

Instead of using classes with a width of 10 seconds, the ISP could have chosen a smaller class width. A smaller class width produces more classes. For instance, if each class width were 5 seconds, the frequency distribution and histogram for the music.net example would have the form shown below.

A frequency distribution with 12 classes

Download time (in seconds) | Number of subscribers
0–5 | 8
5–10 | 20
10–15 | 40
15–20 | 89
20–25 | 155
25–30 | 200
30–35 | 196
35–40 | 149
40–45 | 76
45–50 | 45
50–55 | 14
55–60 | 8

[Figure: A histogram of the 12-class frequency distribution; horizontal axis: download time, in seconds (0 to 60); vertical axis: number of subscribers (0 to 200).]

Examine the following distribution. It shows the percent of subscribers who are in each class, as opposed to the frequency distribution above, which shows the number of subscribers in each class. The type of frequency distribution that lists the percent of data in each class is called a relative frequency distribution. The relative frequency histogram that accompanies it was drawn by using the data in the relative frequency distribution. It shows the percent of subscribers along its vertical axis.


A relative frequency distribution

Download time (in seconds) | Percent of subscribers
0–5 | 0.8
5–10 | 2.0
10–15 | 4.0
15–20 | 8.9
20–25 | 15.5
25–30 | 20.0
30–35 | 19.6
35–40 | 14.9
40–45 | 7.6
45–50 | 4.5
50–55 | 1.4
55–60 | 0.8

[Figure: A relative frequency histogram; horizontal axis: download time, in seconds (0 to 60); vertical axis: percent of subscribers (0 to 20).]

One advantage of using a relative frequency distribution instead of a frequency distribution is that there is a direct correspondence between the percent of the data that lie in a particular portion of the relative frequency distribution and probability. For instance, in the relative frequency distribution above, the percent of the data that lie between 35 and 40 seconds is 14.9%. Thus, if a subscriber is chosen at random, the probability that the subscriber will require between 35 and 40 seconds to download the music file is 0.149.

EXAMPLE 2 Use a Relative Frequency Distribution

Use the music.net relative frequency distribution above to determine

a. the percent of subscribers who required at least 25 seconds to download the file.
b. the probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file.

Solution
a. The percent of data in all classes with a lower bound of 25 seconds or more is the sum of the percents for the classes 25–30 through 55–60:

20.0% + 19.6% + 14.9% + 7.6% + 4.5% + 1.4% + 0.8% = 68.8%

The percent of subscribers who required at least 25 seconds to download the file is 68.8%.

b. The percent of data in all classes with a lower bound of at least 5 seconds and an upper bound of 20 seconds or less is the sum of the percents for the classes 5–10, 10–15, and 15–20:

2.0% + 4.0% + 8.9% = 14.9%

Thus the percent of subscribers who required from 5 to 20 seconds to download the file is 14.9%. The probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file is 0.149.
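The calculations in Example 2 can be checked with a short Python sketch. It assumes the 12-class counts from the music.net frequency distribution; the helper names are illustrative. Dividing each count by the total of 1000 reproduces the relative frequency distribution, and adding the percents of the classes that lie entirely inside a range gives the sums used in the solution.

```python
# 12-class music.net frequency distribution (1000 subscribers in total)
class_bounds = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
                (30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
counts = [8, 20, 40, 89, 155, 200, 196, 149, 76, 45, 14, 8]


def relative_frequencies(counts):
    """Percent of the data in each class: 100 * count / total."""
    total = sum(counts)
    return [100 * count / total for count in counts]


def percent_in_range(bounds, percents, low, high):
    """Add the percents of the classes that lie entirely within [low, high]."""
    return sum(p for (lo, hi), p in zip(bounds, percents) if lo >= low and hi <= high)


percents = relative_frequencies(counts)
# Merging adjacent 5-second classes recovers the 6-class distribution.
six_class_counts = [counts[i] + counts[i + 1] for i in range(0, 12, 2)]

at_least_25 = percent_in_range(class_bounds, percents, 25, 60)   # part a
from_5_to_20 = percent_in_range(class_bounds, percents, 5, 20)   # part b
print(f"a. {at_least_25:.1f}% of subscribers required at least 25 seconds")
print(f"b. P(5 to 20 seconds) = {from_5_to_20 / 100:.3f}")
```

`percent_in_range` only gives exact answers when the range endpoints fall on class boundaries, as they do in Example 2; a range such as 22 to 37 seconds cannot be answered exactly from grouped data.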

CHECK YOUR PROGRESS 2 Use the relative frequency distribution below to determine

a. the percent of the states that pay an average teacher salary of at least $45,000.
b. the probability that a state selected at random pays an average teacher salary of at least $30,000 but less than $39,000.

Average Salaries of Public School Teachers, 1998–1999

Average salary, s | Number of states | Relative frequency
$27,000 ≤ s < $30,000 | 3 | 6%
$30,000 ≤ s < $33,000 | 7 | 14%
$33,000 ≤ s < $36,000 | 12 | 24%
$36,000 ≤ s < $39,000 | 9 | 18%
$39,000 ≤ s < $42,000 | 6 | 12%
$42,000 ≤ s < $45,000 | 3 | 6%
$45,000 ≤ s < $48,000 | 5 | 10%
$48,000 ≤ s < $51,000 | 3 | 6%
$51,000 ≤ s < $54,000 | 2 | 4%

Source: www.nea.org.

Solution See page S1.

There is a geometric analogy between the percents of data and probabilities we calculated in Example 2 and the relative frequency histogram for the data. Because every class has the same width (5 seconds), the area of each bar is proportional to its height, so the total area of a group of bars is proportional to the percent of data in those classes. For instance, the percent of data described in part a. of Example 2 corresponds to the area of the bars for 25 seconds or more, and the percent of data described in part b. corresponds to the area of the bars for at least 5 but less than 20 seconds.

[Figures: Two relative frequency histograms of the music.net download times (percent of subscribers vs. download time, in seconds), one highlighting the bars for 25 seconds or more and the other highlighting the bars for at least 5 but less than 20 seconds.]

Normal Distributions and the Empirical Rule

A histogram for a set of data provides us with a tool that can indicate patterns or trends in the distribution of data. The terms uniform, skewed, symmetrical, and normal are used to describe the distributions of some sets of data.

A uniform distribution, shown in the figure below, is generated when all of the observed events occur with the same frequency. The graph of a uniform distribution remains at the same height over the range of the data. Some random processes produce distributions that are uniform or nearly uniform. For example, if the spinner below is used to generate numbers, then in the long run each of the numbers 1, 2, 3, . . . , 8 will be generated with approximately the same frequency.

A symmetrical distribution, shown below, is symmetrical about a vertical center line. If you fold a symmetrical distribution along the center line, the right side of the distribution will match the left side. The following data sets are examples of distributions that are nearly symmetrical: the weights of all male students, the heights of all teenage females, the prices of a gallon of regular gasoline in a large city, the mileages for a particular type of automobile tire, and the amounts of soda dispensed by a vending machine. In a symmetrical distribution with a single peak, the mean, the median, and the mode are all equal and they are located at the center of the distribution.

Skewed distributions, shown in the figures below, have a longer tail on one side of the distribution and a shorter tail on the other side. A distribution is skewed to the left if it has a longer tail on the left and is skewed to the right if it has a longer tail on the right. In a distribution that is skewed to the left, the mean is less than the median, which is less than the mode. In a distribution that is skewed to the right, the mode is less than the median, which is less than the mean.

Many examinations yield test scores that have skewed distributions. For instance, if a test designed for students in the sixth grade is given to students in a ninth grade class, most of the scores will be high, and the distribution of the test scores will be skewed to the left.
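A small numerical illustration of the mean, median, and mode ordering in skewed data, using Python's standard `statistics` module. The scores are hypothetical, and the function name is illustrative. The ordering described above is the typical pattern rather than a guarantee; some skewed data sets do not follow it exactly.

```python
import statistics


def center_summary(data):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


# Hypothetical test scores: most are high, a few low scores form a long left tail.
skewed_left = [2, 6, 7, 8, 9, 9, 9]
# Mirror image of the same data: a long tail on the right.
skewed_right = [11 - x for x in skewed_left]

for name, data in (("skewed left", skewed_left), ("skewed right", skewed_right)):
    mean, median, mode = center_summary(data)
    print(f"{name}: mean = {mean:.2f}, median = {median}, mode = {mode}")
```

The few low scores pull the mean of `skewed_left` below its median, while the mode stays at the most common high score.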

[Figures: Uniform distribution (frequency of x vs. x) for the numbers 1 through 8, with a spinner numbered 1 to 8 used as a random number generator; Symmetrical distribution, with mean = median = mode on the center line; Skewed distributions: skewed left (mean, median, mode from left to right) and skewed right (mode, median, mean from left to right).]

Discrete values are separated from each other by an increment, or "space." For example, only whole numbers are used to record the number of points a basketball player scores in a game. The possible numbers of points s that the player can score are restricted to the discrete values 0, 1, 2, 3, 4, . . . . The variable s is a discrete variable. Different scores are separated from each other by at least 1 point. Any variable that is based on counting procedures is a discrete variable. Histograms are generally used to show the distribution of discrete variables.

Continuous values are values that can take on all real numbers in some interval. For example, the possible times that it takes to drive to the grocery store represent continuous values. The time is not restricted to natural numbers such as 4 minutes or 5 minutes. In fact, the time may be any part of a minute, or of a second if we care to measure that precisely. A variable such as time that is based on measuring with smaller and smaller units is a continuous variable. Continuous curves, rather than histograms, are used to show the distributions of continuous variables.

In some cases a continuous curve is used to display the distribution of a set of discrete data. For instance, when we have a large set of data and the class intervals are very small, the shape of the top of the histogram approaches a smooth curve. See the two figures below. Thus, when graphing the distribution of very large sets of data with very small class intervals, it is common practice to replace the histogram with a smooth continuous curve.

[Figures: A histogram for discrete data and a continuous distribution curve (f(x) vs. x); Distributions of continuous variables: (a) bimodal, f(t); (b) skewed right, f(x); (c) symmetrical, f(w).]

Take Note: If x is a continuous variable with mean μ (the Greek letter mu) and standard deviation σ, then its normal distribution is given by

$$f(x) = \frac{e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}$$

One of the most important statistical distributions is known as a normal distribution. The precise mathematical definition of a normal distribution is given by the equation in the Take Note above; however, for many problems it is sufficient to know that all normal distributions have the following properties.


Properties of a Normal Distribution

A normal distribution has a bell shape that is symmetric about a vertical line through its center. The mean, the median, and the mode of a normal distribution are all equal and they are located at the center of the distribution.

The Empirical Rule: In a normal distribution, about
68.2% of the data lies within 1 standard deviation of the mean.
95.4% of the data lies within 2 standard deviations of the mean.
99.7% of the data lies within 3 standard deviations of the mean.
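The Empirical Rule percents can be checked numerically. A minimal Python sketch, assuming only the standard `math` module: `normal_pdf` evaluates the normal distribution formula from the Take Note, and `fraction_within(k)` uses the standard identity P(μ − kσ < X < μ + kσ) = erf(k/√2). Both function names are illustrative. The exact values are about 68.27%, 95.45%, and 99.73%; the text's 68.2% is twice the rounded value 34.1%.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fraction_within(k):
    """Fraction of any normal distribution within k standard deviations of the mean.

    Uses the identity P(mu - k*sigma < X < mu + k*sigma) = erf(k / sqrt(2)),
    which does not depend on mu or sigma.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.2f}%")
print(f"curve height at the mean when sigma = 1: {normal_pdf(0, 0, 1):.4f}")
```

Because the answer does not depend on μ or σ, the same three percents apply to every normal distribution, which is why the Empirical Rule can be stated without reference to particular values.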

The Empirical Rule can be used to solve many problems that involve a normal distribution.

EXAMPLE 3 Use the Empirical Rule

A survey of 1000 U.S. gas stations found that the price charged for a gallon of regular gas can be closely approximated by a normal distribution with a mean of $1.90 and a standard deviation of $0.20. How many of the stations charge

a. between $1.50 and $2.30 for a gallon of regular gas?
b. less than $2.10 for a gallon of regular gas?
c. more than $2.30 for a gallon of regular gas?

Solution
a. The $1.50 per gallon price is 2 standard deviations below the mean. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data lies within 2 standard deviations of the mean. Therefore, approximately

(95.4%)(1000) = (0.954)(1000) = 954

of the stations charge between $1.50 and $2.30 for a gallon of regular gas.

[Figure: A normal distribution, f(x) vs. x. About 34.1% of the data lies between μ and μ + σ (and between μ − σ and μ), 13.6% between μ + σ and μ + 2σ (and between μ − 2σ and μ − σ), and 2.15% between μ + 2σ and μ + 3σ (and between μ − 3σ and μ − 2σ); thus 68.2%, 95.4%, and 99.7% of the data lie within 1, 2, and 3 standard deviations of the mean.]

b. The $2.10 price is 1 standard deviation above the mean. In a normal distribution, 34.1% of all data lies between the mean and 1 standard deviation above the mean. Thus, approximately

(34.1%)(1000) = (0.341)(1000) = 341

of the stations charge between $1.90 and $2.10 for a gallon of regular gasoline. Half of the stations charge less than the mean. Therefore, about

341 + 500 = 841

of the stations charge less than $2.10 for a gallon of regular gas. This problem can also be solved by computing 34.1% + 50% = 84.1% of 1000.

c. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data is within 2 standard deviations of the mean. This means that the other 4.6% of the data will lie either more than 2 standard deviations above the mean or more than 2 standard deviations below the mean. We are only interested in the data that lie more than 2 standard deviations above the mean, which is 1/2 of 4.6%, or 2.3%, of the data. Thus about (2.3%)(1000) = (0.023)(1000) = 23 of the stations charge more than $2.30 for a gallon of regular gas.

[Figures: Normal curves showing the 84.1% of the data that lies less than 1σ above μ (50% below μ plus 34.1% between μ and μ + σ), and the 2.3% of the data that lies more than 2σ above μ (with 95.4% within 2σ of μ).]

CHECK YOUR PROGRESS 3 A vegetable distributor knows that during the month of August, the weights of its tomatoes were normally distributed with a mean of 0.61 pound and a standard deviation of 0.15 pound.

a. What percent of the tomatoes weighed less than 0.76 pound?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 pound?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh between 0.31 and 0.91 pound?

Solution See page S1.

z-Scores

When you take a test, it is natural to wonder how you will do compared to the other students in the class. Will you finish in the top 10%, or will you be closer to the middle? One statistic that is used to measure the position of a data value with respect to other data values is known as the z-score.

z-Score
The z-score for a given data value x is the number of standard deviations between x and the mean of the data. The following formulas are used to calculate the z-score for a data value x.

Population: $z_x = \dfrac{x - \mu}{\sigma}$    Sample: $z_x = \dfrac{x - \bar{x}}{s}$

In the next example, we use a student's z-scores for two tests to determine how well the student did on each test in comparison to the other students.
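Example 3 reasoned about how many standard deviations each gas price lies from the mean, which is exactly that price's z-score. A minimal Python sketch, assuming the population formula and the Example 3 values; the function names `z_score` and `value_from_z` are illustrative, and the Empirical Rule percents are hard-coded from the text rather than computed from the normal curve.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations between x and the mean (negative below it)."""
    return (x - mean) / std_dev


def value_from_z(z, mean, std_dev):
    """Solve the z-score formula for x: x = mean + z * std_dev."""
    return mean + z * std_dev


# Example 3: gas prices with mean $1.90 and standard deviation $0.20.
MEAN, SD, STATIONS = 1.90, 0.20, 1000
for price in (1.50, 2.10, 2.30):
    # round() hides floating-point noise such as -1.9999999999999996
    print(f"${price:.2f} has z-score {round(z_score(price, MEAN, SD), 6)}")

# Empirical Rule percents from the text, applied to 1000 stations.
between_150_and_230 = round(0.954 * STATIONS)   # within 2 standard deviations
below_210 = round((0.50 + 0.341) * STATIONS)    # below the mean + 1 standard deviation
above_230 = round((1 - 0.954) / 2 * STATIONS)   # more than 2 above the mean
print(between_150_and_230, below_210, above_230)
```

The same helpers reproduce the arithmetic in Example 4: `z_score(72, 65, 8)` is 0.875, `z_score(60, 45, 12)` is 1.25, and `value_from_z(-0.75, 65, 8)` is 59.0.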


EXAMPLE 4 Use z-Scores

a. Ruben has taken two tests in his math class. He scored 72 on the first test, for which the mean was 65 and the standard deviation was 8. He received a 60 on the second test, for which the mean was 45 and the standard deviation was 12. In comparison to the other students, did Ruben do better on the first or the second test?

b. Stacy is in the same math class as Ruben. Stacy's z-score for the first test was −0.75. What was Stacy's score on the first test?

Solution
a. The z-score formula yields

$$z_{72} = \frac{72 - 65}{8} = 0.875 \qquad \text{and} \qquad z_{60} = \frac{60 - 45}{12} = 1.25$$

Thus Ruben scored 0.875 standard deviations above the mean on his first test and 1.25 standard deviations above the mean on the second test. In comparison to his classmates, Ruben scored better on the second test than on the first test.

b. Substitute −0.75 for the z-score, 65 for the mean, and 8 for the standard deviation in the z-score formula, and solve for x.

$$-0.75 = \frac{x - 65}{8}, \qquad -6 = x - 65, \qquad x = 59$$

Stacy's score on the first test was 59.

CHECK YOUR PROGRESS 4
a. Cheryl took two quizzes in her history class. She scored 15 on the first quiz, for which the mean was 12 and the standard deviation was 2.4. Her score on the second quiz, for which the mean was 11.5 and the standard deviation was 2.2, was 14. In comparison to her classmates, did Cheryl do better on the first or the second quiz?

b. Greg is in the same history class as Cheryl. Greg's z-score for the first quiz was −2.5. What was Greg's score on the first quiz?

Solution See page S1.

Take Note: In any application, the quantity $x - \mu$ and the standard deviation $\sigma$ are both measured in the same units. Thus a z-score, which is the quotient of $x - \mu$ and $\sigma$, is a dimensionless measure.

Topics for Discussion
1. Is it possible, in a normal distribution of data, for the mean to be much larger than the median? Explain.
2. Must all large data sets have a normal distribution? Explain.
3. A professor gave a final examination to 110 students. Eighteen students had examination scores that were more than one standard deviation above the mean. Does this indicate that 18 of the students had examination scores that were less than one standard deviation below the mean? Explain.
4. A set of data consists of the 525 monthly salaries, listed in dollars, of the employees of a large company. What units should be used for the z-scores associated with the salaries? Explain.

EXERCISES

In Exercises 1 to 8, determine whether the given statement is true or false.

1. If a distribution is symmetric about a vertical line, then it is a normal distribution.
2. Every normal distribution has a bell-shaped graph.
3. In a normal distribution, the mean, the median, and the mode of the distribution all are located at the center of the distribution.
4. In a distribution that is skewed to the left, the median of the data is greater than the mean.
5. If a z-score for a data value x is negative, then x must also be negative.
6. In every data set, 68.2% of the data lies within 1 standard deviation of the mean.
7. Let x be the number of people who attend a baseball game. The variable x is a discrete variable.
8. The time of day d in the lobby of a bank is measured with a digital clock. The variable d is a continuous variable.

In Exercises 9 and 10, use the Empirical Rule to answer each question.

9. In a normal distribution, what percent of the data lies
a. within 2 standard deviations of the mean?
b. more than 1 standard deviation above the mean?
c. between 1 standard deviation below the mean and 2 standard deviations above the mean?

10. In a normal distribution, what percent of the data lies
a. within 3 standard deviations of the mean?
b. less than 2 standard deviations below the mean?
c. between 2 standard deviations below the mean and 3 standard deviations above the mean?

Business and Economics

11. State Sales Tax Rates Use the following frequency distribution to determine
a. the percent of states in the U.S. that had a 2001 sales tax rate of at least 5%.
b. the probability that a state selected at random had a 2001 sales tax rate of at least 3% but less than 5%.

2001 State Sales Tax Rate

Tax rate, r | Number of states | Relative frequency
0% ≤ r < 1% | 5 | 10%
1% ≤ r < 2% | 0 | 0%
2% ≤ r < 3% | 1 | 2%
3% ≤ r < 4% | 0 | 0%
4% ≤ r < 5% | 13 | 26%
5% ≤ r < 6% | 15 | 30%
6% ≤ r < 7% | 13 | 26%
7% ≤ r < 8% | 3 | 6%

Source: Time Almanac 2002

12. Waiting Time The amount of time customers spend waiting in line at a bank is normally distributed, with a mean of 3.5 minutes and a standard deviation of 0.75 minute. Find the probability that the time a customer will spend waiting is
a. at most 2.75 minutes.
b. less than 2 minutes.

13. Weights of Parcels During a particular week, an overnight delivery company found that the weights of its parcels were normally distributed, with a mean of 24 ounces and a standard deviation of 6 ounces.
a. What percent of the parcels weighed between 12 ounces and 30 ounces?
b. What percent of the parcels weighed more than 42 ounces?


14. Weights of Boxes of Corn Flakes The weights of the boxes of corn flakes filled by a machine are normally distributed, with an average weight of 14.5 ounces and a standard deviation of 0.5 ounce. What percent of the boxes

a. weigh less than 14.0 ounces?

b. weigh between 13.5 and 15.0 ounces?

15. Duration of Long Distance Telephone Calls A telephone company has found that the lengths of its long distance telephone calls are normally distributed, with a mean of 225 seconds and a standard deviation of 55 seconds. What percent of its long distance calls last

a. more than 335 seconds?

b. between 170 and 390 seconds?

Life and Health Sciences

16. Median Income for Physicians The 1995 median income for physicians was $160,000. (Source: AMA Center for Health Policy Research) The distribution of these incomes is skewed to the right. Is the mean of these incomes greater than or less than $160,000?
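Exercises 16 and 22 depend on how skewness separates the measures of center: a long tail typically pulls the mean toward it, while the median depends only on the middle value. The Python sketch below uses a small, made-up list of incomes (in thousands of dollars, hypothetical and not AMA data) with one large value forming a right tail.

```python
import statistics

# Hypothetical incomes in thousands of dollars, for illustration only:
# most values cluster together and one large value forms a long right tail.
incomes = [120, 140, 150, 160, 170, 180, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"median = {median_income}, mean = {mean_income:.2f}")
# The single large value raises the mean above the median;
# the median is still the middle value of the sorted list.
```

For data skewed to the left, the same reasoning runs in the opposite direction: the long left tail pulls the mean below the center of the data.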

17. Heights of Women A survey of 1000 women aged 20 to 30 found that their heights are normally distributed, with a mean of 65 inches and a standard deviation of 2.5 inches.

a. How many of the women have a height that is within 1 standard deviation of the mean?

b. How many of the women have a height that is between 60 inches and 70 inches?

18. Distribution of Data Consider the set of the heights of all babies born in the United States during a particular year. Do you think this data set can be closely approximated by a normal distribution? Explain.

Social Sciences

19. Presidential Inauguration Ages and Ages at Death The table in Exercise 26 of Section 8.4 lists the U.S. presidents and their ages at inauguration. The table in Exercise 27 of Section 8.4 lists the deceased U.S. presidents as of December 2002 and their ages at death.

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables.

b. What patterns, if any, are evident from the diagram?

20. Average Salaries of Teachers Use the following frequency distribution to determine

a. the percent of states in the U.S. that paid a 1998–1999 average teacher salary of at least $39,000.

b. the probability that a state selected at random paid a 1998–1999 average teacher salary of at least $36,000 but less than $45,000.

Average Salaries of Public School Teachers, 1998–1999

Source: www.nea.org.

Average salary, s | Number of states | Relative frequency

$27,000 ≤ s < $30,000 | 3 | 6%

$30,000 ≤ s < $33,000 | 7 | 14%

$33,000 ≤ s < $36,000 | 12 | 24%

$36,000 ≤ s < $39,000 | 9 | 18%

$39,000 ≤ s < $42,000 | 6 | 12%

$42,000 ≤ s < $45,000 | 3 | 6%

$45,000 ≤ s < $48,000 | 5 | 10%

$48,000 ≤ s < $51,000 | 3 | 6%

$51,000 ≤ s < $54,000 | 2 | 4%


21. Test Scores The following relative frequency histogram shows the distribution of test scores for 50 students who took a history test.

a. What percent of the students scored at least 76 on the test?

b. How many of the students received a score of at least 60 but less than 84?

22. Examination Duration Times At a university, 500 law students took an examination. One student completed the exam in 24 minutes. The mode for the completion time is 50 minutes. The distribution of the times the students took to complete the exam is skewed to the left. Is the mean of these times greater than or less than 50 minutes?

23. Intelligence Quotients A psychologist finds that the intelligence quotients of a group of patients are normally distributed, with a mean of 104 and a standard deviation of 26. Find the percent of the patients with IQs

a. above 130.

b. between 130 and 182.

24. Distribution of Data The population of a resort city consists mostly of wealthy families and families with low incomes. Do you think the set of family incomes for this city can be closely approximated by a normal distribution? Explain.

25. Comparison of Quiz Scores Ryan took two quizzes in his art class. He scored 45 on the first quiz, for which the mean was 51.4 and the standard deviation was 9.5. His score on the second quiz, for which the mean was 53.6 and the standard deviation was 7.2, was 49. In comparison to his classmates, did Ryan do better on the first or the second quiz?

26. Comparison of Test Scores Tanya took two tests in her chemistry class. She scored 85 on the first test, for which the mean was 79.4 and the standard deviation was 6.4. Her score on the second test, for which the mean was 70.5 and the standard deviation was 5.3, was 78. In comparison to her classmates, did Tanya do better on the first or the second test?
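Exercises 25 and 26 compare scores from tests that have different means and spreads. Converting each score to a z-score, z = (x − mean)/(standard deviation), puts the scores on a common scale, and the larger z-score marks the relatively better performance. The Python sketch below uses a made-up student rather than the exercise data, and its function names are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def relatively_better(results):
    """results maps a label to (score, mean, sd); return the label with the largest z-score."""
    return max(results, key=lambda label: z_score(*results[label]))


# Hypothetical student (not from the exercises): 72 on test A, 80 on test B.
student = {"test A": (72, 65, 10), "test B": (80, 76, 4)}
for label, (score, mean, sd) in student.items():
    print(f"{label}: z = {z_score(score, mean, sd):.2f}")
print("Relatively better on:", relatively_better(student))
```

Test B shows the relatively better performance even though its raw score exceeds the mean by fewer points, because that test's scores are much less spread out.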

Sports and Recreation

27. Super Bowl Scores The following table lists the winning and losing scores for all of the Super Bowl games up to the year 2001.

Super Bowl Results, 1967–2001

a. Construct a back-to-back stem-and-leaf diagram for the winning scores and the losing scores.

b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

28. Ironman Triathlon The following table lists the winning times for the men’s and women’s Ironman Triathlon World Championships, held in Kailua-Kona, Hawaii. (Source: http://www.3athlon.org/races/ironman/hawaii2001/statistik/index.php)

[Figure: Relative frequency histogram for Exercise 21. Horizontal axis: Test scores, from 28 to 100 in intervals of 8. Vertical axis: Relative frequency, from 0% to 25%.]

35–10 24–7 27–10 42–10 49–26

33–14 16–6 26–21 20–16 27–17

16–7 21–17 27–17 55–10 35–21

23–7 32–14 38–9 20–19 31–24

16–13 27–10 38–16 37–24 34–19

24–3 35–31 46–10 52–17 23–16

14–7 31–19 39–20 30–13 34–7


Ironman Triathlon World Championships (Winning times rounded to the nearest minute)

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables. Hint: Use the two-digit “minutes” as your leaves, and insert a comma between the leaves in each row so that they can be easily distinguished from each other.

b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

29. Home Run Leaders The following tables list the numbers of home runs hit by the home run leaders in the National League and the American League from 1971 to 2001.

Home Run Leaders, 1971–2001

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables.

b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

30. Race Times The following relative frequency histogram shows the distribution of times for the 1200 contestants who finished a race.

a. What percent of the contestants finished the race in less than 80 seconds?

b. How many contestants had a time of at least 60 seconds but less than 80 seconds?

31. Baseball Attendance A baseball franchise finds that the attendance at its home games is normally distributed, with a mean of 16,000 and a standard deviation of 4000.

a. What percent of the home games have an attendance between 8000 and 16,000?

b. What percent of the home games have an attendance of less than 12,000?

Physical Sciences and Engineering

32. Breaking Points of Ropes The breaking points of a particular type of rope are normally distributed, with a mean of 350 pounds and a standard deviation of 24 pounds. What is the probability that a piece of this rope chosen at random will have a breaking point of

a. less than 326 pounds?

b. between 302 and 398 pounds?

33. Tire Mileage The mileages of WearEver tires are normally distributed, with a mean of 48,000 miles and a standard deviation of 6000 miles. What is the probability that the WearEver tire you purchase will provide a mileage of

a. more than 60,000 miles?

b. between 42,000 and 54,000 miles?

[Figure: Relative frequency histogram for Exercise 30. Horizontal axis: Time, in seconds, from 50 to 120 in intervals of 10. Vertical axis: Relative frequency, from 0% to 24%.]

Men, 1978–2000 | Women, 1979–2000

11:47 8:29 8:20 | 12:55 9:35 9:17

11:16 8:34 8:21 | 11:21 9:01 9:07

9:25 8:31 8:04 | 12:01 9:01 9:32

9:38 8:09 8:33 | 10:54 9:14 9:24

9:08 8:28 8:24 | 10:44 9:08 9:13

9:06 8:19 8:17 | 10:25 8:55 9:26

8:54 8:09 8:21 | 10:25 8:58

8:51 8:08 | 9:49 9:20
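Following the hint in Exercise 28, a back-to-back diagram for the Ironman times can use the hours as stems and the two-digit minutes, separated by commas, as leaves. The Python sketch below works under those assumptions. Its function names are hypothetical, and the times are the men's and women's columns of the table.

```python
from collections import defaultdict

# Winning times (h:mm) from the Ironman table, read row by row.
men = """11:47 8:29 8:20 11:16 8:34 8:21 9:25 8:31 8:04 9:38 8:09 8:33
         9:08 8:28 8:24 9:06 8:19 8:17 8:54 8:09 8:21 8:51 8:08""".split()
women = """12:55 9:35 9:17 11:21 9:01 9:07 12:01 9:01 9:32 10:54 9:14 9:24
           10:44 9:08 9:13 10:25 8:55 9:26 10:25 8:58 9:49 9:20""".split()


def stems_and_leaves(times):
    """Stem = hours; leaf = the two-digit minutes, kept as text so '04' keeps both digits."""
    groups = defaultdict(list)
    for t in times:
        hours, minutes = t.split(":")
        groups[int(hours)].append(minutes)
    for leaves in groups.values():
        leaves.sort()  # zero-padded two-digit strings sort in numeric order
    return groups


def back_to_back(left, right):
    """Rows 'left leaves | stem | right leaves'; left leaves increase toward the stem."""
    stems = range(min(*left, *right), max(*left, *right) + 1)
    left_text = {s: ",".join(reversed(left.get(s, []))) for s in stems}
    width = max(len(text) for text in left_text.values())
    return [f"{left_text[s]:>{width}} | {s:>2} | {','.join(right.get(s, []))}"
            for s in stems]


men_groups = stems_and_leaves(men)
women_groups = stems_and_leaves(women)
print("Men | hours | Women")
print("\n".join(back_to_back(men_groups, women_groups)))
print("Legend: 21 | 8 | 55 represents 8:21 (men) and 8:55 (women)")
```

Stems with no leaves on one side (such as 10 hours for the men) still get a row, so the two distributions line up.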

National League

48 40 44 36 38 38 52 40 48 48

31 37 40 36 37 37 49 39 47 40

38 35 46 43 40 47 49 70 65 50 73

American League

33 37 32 32 36 32 39 46 45 41

22 39 39 43 40 40 49 42 36 51

44 43 46 40 50 52 56 56 48 47 52
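A back-to-back stem-and-leaf diagram like the one Exercise 29 asks for can also be generated mechanically. The Python sketch below follows the construction steps for a stem-and-leaf diagram: the tens digit is the stem, the ones digit is the leaf, and a legend is printed. National League leaves sit on the left and increase toward the stem. The function names are hypothetical, and the data are copied from the two home run tables.

```python
from collections import defaultdict

# Home run leaders, 1971-2001, copied from the tables.
national = [48, 40, 44, 36, 38, 38, 52, 40, 48, 48,
            31, 37, 40, 36, 37, 37, 49, 39, 47, 40,
            38, 35, 46, 43, 40, 47, 49, 70, 65, 50, 73]
american = [33, 37, 32, 32, 36, 32, 39, 46, 45, 41,
            22, 39, 39, 43, 40, 40, 49, 42, 36, 51,
            44, 43, 46, 40, 50, 52, 56, 56, 48, 47, 52]


def stem_leaf(values):
    """Map each stem (tens digit) to its leaves (ones digits) in increasing order."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups


def back_to_back(left, right):
    """One row per stem: left leaves (increasing toward the stem) | stem | right leaves."""
    left_groups, right_groups = stem_leaf(left), stem_leaf(right)
    stems = range(min(left + right) // 10, max(left + right) // 10 + 1)
    left_text = {s: " ".join(str(d) for d in reversed(left_groups.get(s, [])))
                 for s in stems}
    width = max(len(text) for text in left_text.values())
    rows = []
    for s in stems:
        right_text = " ".join(str(d) for d in right_groups.get(s, []))
        rows.append(f"{left_text[s]:>{width}} | {s} | {right_text}")
    return rows


rows = back_to_back(national, american)
print("National League | stem | American League")
print("\n".join(rows))
print("Legend: 3 | 4 | 8 represents 43 (National League) and 48 (American League)")
```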


34. Highway Speed of Vehicles A study of 8000 vehicles that passed by a highway checkpoint found that their speeds were normally distributed, with a mean of 61 miles per hour and a standard deviation of 7 miles per hour.

a. How many of the vehicles had a speed of more than 68 miles per hour?

b. How many of the vehicles had a speed of less than 40 miles per hour?

Explorations

Chebyshev’s Theorem

The following well-known theorem is called Chebyshev’s theorem. It is named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821–1894). Chebyshev’s theorem states that a mathematical relationship exists between the spread of data and the standard deviation of the data. A remarkable property of Chebyshev’s theorem is that it is valid for any set of data. This is unlike the Empirical Rule, which applies only to sets of data that have normal distributions.

Chebyshev’s Theorem

The proportion or percentage of any data set that lies within z standard deviations of the mean, where z is any positive number greater than 1, is at least

$$1 - \frac{1}{z^2}$$

Applying Chebyshev’s theorem with $z = 2$ yields

$$1 - \frac{1}{z^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$$

This result of $\frac{3}{4}$ means that at least 75% of the data in any data set must lie within 2 standard deviations of the mean of the data set.

1. Use Chebyshev’s theorem to determine the minimum percentage of data (to the nearest percent) in any data set that must lie within

a. 1.2 standard deviations of the mean.

b. 2.5 standard deviations of the mean.

c. 3.1 standard deviations of the mean.

2. A new automobile dealership found that during the month of March, the mean selling price of its cars was $29,200, with a standard deviation of $5100. Use Chebyshev’s theorem to determine the minimum percentage (to the nearest percent) of the dealership’s cars that have a selling price within

a. 1.5 standard deviations of the mean, that is, between $21,550 and $36,850.

b. 2.8 standard deviations of the mean, that is, between $14,920 and $43,480.
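Chebyshev’s bound is simple enough to evaluate directly. The minimal Python sketch below computes the guaranteed minimum percentage and the interval that the percentage refers to. The function names are hypothetical.

```python
def chebyshev_min_percent(z):
    """Smallest percent of ANY data set guaranteed to lie within z standard
    deviations of the mean, from Chebyshev's bound 1 - 1/z**2 (requires z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 100 * (1 - 1 / z**2)


def interval(mean, sd, z):
    """Endpoints mean - z*sd and mean + z*sd of the interval the bound refers to."""
    return mean - z * sd, mean + z * sd


# z = 2 reproduces the worked example: at least 75%.
print(f"z = 2: at least {chebyshev_min_percent(2):.0f}%")
print("interval for a hypothetical mean of 50 and sd of 10:", interval(50, 10, 2))
```

For data known to be normally distributed, the Empirical Rule gives 95% within 2 standard deviations. Chebyshev’s 75% is therefore a weaker guarantee, but it holds for every data set.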
