
© 2011 THE CARNEGIE FOUNDATION FOR THE ADVANCEMENT OF TEACHING A PATHWAY THROUGH STATISTICS, VERSION 1.5, STATWAY™ - STUDENT HANDOUT

STATWAY™ STUDENT HANDOUT

Lesson 2.4.1 Quantifying Variability Relative to the Mean

STUDENT NAME DATE

INTRODUCTION

Recall the monthly normal temperature data from St. Louis and San Francisco. The data values are contained in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco

Month Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.

St. Louis 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9

San Francisco 51.1 54.4 54.9 56.0 56.6 58.4 59.1 60.1 62.3 62.0 57.2 51.7

Here are side-by-side dotplots of the monthly normal temperatures in the two cities.

[Figure: Dotplots of Temperatures for St. Louis and San Francisco. One row of dots per city on a shared Temperatures axis running from 30 to 80.]

The measures of variability that you have considered are the range and the interquartile range (IQR). The range is the difference between the maximum value and minimum value. The range does not use all of the specific data values, and it is sensitive to outliers and extreme observations. The IQR is related to the median, because it is based on the values of the first and third quartiles.

Remember that the sample mean is the average of the data values. It uses all of the specific values. Your goal is to develop a measure of variability to partner with the sample mean, using all of the specific data values.
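The contrast between the range (two data values) and the sample mean (all data values) can be illustrated with a minimal Python sketch. The data come from the table above; the names `st_louis`, `san_francisco`, `value_range`, and `sample_mean` are illustrative choices, not part of the handout.

```python
# Monthly normal temperatures (°F), Jan. through Dec., copied from the table above.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def value_range(values):
    """Range: maximum minus minimum, so only two data values are used."""
    return max(values) - min(values)


def sample_mean(values):
    """Sample mean: the sum of every data value divided by the number of values."""
    return sum(values) / len(values)


for city, temps in [("St. Louis", st_louis), ("San Francisco", san_francisco)]:
    print(f"{city}: range = {value_range(temps):.1f} °F, mean = {sample_mean(temps):.2f} °F")
```

Changing any month other than the hottest or coldest leaves `value_range` unchanged but shifts `sample_mean`, which is one way to see why the mean needs a partner that also uses every value.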


Recall the sample means for the two cities:

St. Louis: about 56.1°F

San Francisco: about 57.0°F

One measure of variability we look at for individual data values is the deviation from the mean. This is calculated by the formula:

deviation from the mean = (data value – mean), or $x - \bar{x}$

1 Determine the deviation from the mean for January in St. Louis. What does this number tell you?

2 Here are the deviations from the mean for every month in St. Louis.

Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.

-26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2

Give a method that you might use to determine a number which represents a typical distance from the mean. Think about this for 2 or 3 minutes, then discuss it with your group. Make sure to come up with an answer.

The range and IQR both take an outside-in approach to representing variability. In other words, values in the extremes of the distribution (or at least away from the center of the distribution) are subtracted to provide a number that represents the variability in a distribution. Rather than work from the outside in, this lesson adopts an inside-out approach starting with the sample mean.

Each data point has a distance from the center, which in this case is the sample mean. This distance is known as the deviation from the mean or just deviation. Data values with large deviations contribute more to the variability in the data set. Values with small deviations do not contribute much to the total variability. These distances from the sample mean to the data values represent the inside-out (from the center to the data values) approach to quantifying variability.
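The inside-out idea can be sketched in a few lines of Python: compute the sample mean, then subtract it from each data value. The names `months`, `st_louis`, and `deviations` are illustrative. This sketch keeps the unrounded mean (56.05), whereas the handout's table appears to use the mean rounded to 56.1, so individual deviations can differ from the table by about 0.05.

```python
# St. Louis monthly normal temperatures (°F), taken from the handout's table.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

# Center of the distribution: the sample mean.
mean = sum(st_louis) / len(st_louis)

# Deviation from the mean = data value - mean (negative means below the mean).
deviations = [x - mean for x in st_louis]

for month, temp, dev in zip(months, st_louis, deviations):
    side = "below" if dev < 0 else "above"
    print(f"{month}: {temp:5.1f} °F is {abs(dev):5.2f} °F {side} the mean")
```

The sign of each deviation records direction (below or above the mean), and its size records distance from the center.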


TRY THESE

3 One approach to measuring variability is to combine the information contained in the deviations. A simple way to combine the deviations is to add them up. Compute the sum of the deviations in the table:

St. Louis

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Temp. 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9

Deviation -26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2

Notice that the sum of the deviations is very close to 0. In fact, if you kept all of the decimal places in the calculations and did not round the values, the sum would be exactly 0. This is true for any data set: the sum of the deviations is 0 because of the way the mean is calculated.
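A short derivation shows why the deviations always sum to 0. It is a sketch in the notation used for the deviation formula, with $x$ for a data value, $\bar{x}$ for the sample mean, and $n$ for the number of data values; the sums run over all $n$ data values.

```latex
\begin{align*}
\sum (x - \bar{x}) &= \sum x - n\bar{x} \\
                   &= n\bar{x} - n\bar{x}
                      && \text{because } \bar{x} = \frac{\sum x}{n} \text{ means } \sum x = n\bar{x} \\
                   &= 0
\end{align*}
```

The negative deviations below the mean exactly balance the positive deviations above it, so the plain sum carries no information about variability.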

4 What could you do to keep the deviations from cancelling each other out and adding to 0?

5 Here is the same table as before with a new row added. In the new row, the deviations are squared. Note that the first new entry is (-26.8)² ≈ 718.2. Fill in the rest of the table by squaring the remaining deviations.

St. Louis

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Temp. 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9

Deviation -26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2

(Deviation)² 718.2

6 Now compute the sum of the squared deviations by totaling the values in the bottom row. Give the answer below. What are the units of the answer?


7 The next table contains the monthly normal temperature values for San Francisco.

A Complete the table by computing the deviations and squared deviations. Below the table, give the sum of the squared deviations.

San Francisco

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Temp. 51.1 54.4 54.9 56.0 56.6 58.4 59.1 60.1 62.3 62.0 57.2 51.7

Deviation

(Deviation)²

B Compare the sum of the squared deviations for St. Louis to that for San Francisco. Explain how these values show the difference in the variability of monthly normal temperatures between the two cities.

Although the sum of the squared deviations represents the variability in a distribution, it is not a commonly used statistic. Normally, the sum of the squared deviations is part of the calculation of the sample variance, which itself is used to determine the sample standard deviation.

Here is the formula for the sample variance, which is frequently denoted by $s^2$:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Note that n represents the sample size, which in this example is equal to 12 for each city.

For St. Louis, the sample variance is $s^2 \approx 329.4$.


For San Francisco, the sample variance is $s^2 \approx 13.1$.
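The sample variance calculation can be sketched in Python as follows. This is a minimal illustration that assumes the usual sample-variance definition with divisor n − 1; the function name `sample_variance` is a hypothetical helper, and the result is cross-checked against `statistics.variance` from Python's standard library, which uses the same divisor.

```python
import statistics

# Monthly normal temperatures (°F) from the handout's tables.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


for city, temps in [("St. Louis", st_louis), ("San Francisco", san_francisco)]:
    by_hand = sample_variance(temps)
    library = statistics.variance(temps)  # standard-library cross-check
    print(f"{city}: s^2 = {by_hand:.1f} (library: {library:.1f})")
```

Working from the unrounded mean avoids the small rounding differences that appear when the table's rounded deviations are squared by hand.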

8 We are still one step short of where we want to be. The variance is an average of the squared deviations. What can we do to turn it into an average of the deviations?

9 When we take the square root of the variance it is called the standard deviation and is denoted s. Calculate the standard deviation for the temperatures in St. Louis and San Francisco.

10 Write a sentence comparing the variability of monthly average temperatures in St. Louis and San Francisco using the standard deviation for each city.

NEXT STEPS

Let’s think about what the standard deviation tells us about a data set. Recall that the standard deviation is a kind of average of the deviations from the mean. In some sense, the standard deviation gives the average distance of the data values from the sample mean.

Consider the standard deviation of St. Louis, s ≈ 18.15°F. An interpretation of this value is that the temperature of a typical month in St. Louis differs from the mean temperature by about 18 degrees.
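The St. Louis value quoted above can be reproduced with a short Python sketch. It assumes the sample standard deviation is the square root of the sample variance with divisor n − 1; `sample_std_dev` is an illustrative helper name, not something defined in the handout.

```python
import math

# St. Louis monthly normal temperatures (°F) from the handout's table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]


def sample_std_dev(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


s = sample_std_dev(st_louis)
print(f"St. Louis: s = {s:.2f} °F")
```

Taking the square root returns the measure to the original units (°F), which is what allows it to be read as a typical distance from the mean.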

Here are some questions to think about in your small groups.

11 Could the standard deviation of a data set ever be negative?


12 Could the standard deviation ever be 0?

13 In the Lesson 2.3.1 homework you examined what happened to the IQR when we changed a single data value. We did this for the temperatures in St. Louis. Here are the temperatures again with the error in the July temperature.

Month Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.

Temperature 29.3 33.9 45.1 56.7 66.1 75.4 97.8 77.6 70.2 58.4 46.2 33.9

Use technology to calculate the sample standard deviation for these monthly temperatures in St. Louis, which include the erroneous July value. How does this compare to the true standard deviation we calculated in class?

Note: Like the sample mean, the sample standard deviation is not resistant to the effects of outliers and skewing. Very large or very small values can have a large impact on the standard deviation.
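One way to see this lack of resistance is to compute the mean and standard deviation with and without the erroneous July value. The sketch below uses `statistics.mean` and `statistics.stdev` from Python's standard library as one possible choice of "technology"; the variable names are illustrative.

```python
import statistics

# St. Louis temperatures (°F) as originally recorded, Jan. through Dec.
true_temps = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

# Same data with the July value (index 6) mistyped as 97.8, as in the table above.
error_temps = list(true_temps)
error_temps[6] = 97.8

s_true = statistics.stdev(true_temps)    # sample standard deviation (divisor n - 1)
s_error = statistics.stdev(error_temps)

print(f"Mean: {statistics.mean(true_temps):.2f} -> {statistics.mean(error_temps):.2f} °F")
print(f"Standard deviation: {s_true:.2f} -> {s_error:.2f} °F")
```

Only one of the twelve values changed, yet both summaries move, because each uses every data value.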

SUMMARY

We have examined two measures of center (mean and median) and three measures of variability or spread (range, interquartile range, and standard deviation). In deciding which to use, remember two things.

Mean and standard deviation go together, and median and IQR go together.

Both the mean and standard deviation are strongly impacted by outliers and skewing.

When the data are skewed or contain outliers, we usually use the median and IQR to give a description of the data. When the data are reasonably symmetric we can use the mean and standard deviation. In addition, the numbers are never enough; we always look at a graph as well, which can be a dotplot, histogram, or boxplot.


TAKE IT HOME

1 Here are the names and ages for the 30 women on the 15th season (2010) of “The Bachelor.”

Bachelorette Age Bachelorette Age

Alli 24 Lindsay 25

Ashley H 26 Lisa M 24

Ashley S 26 Lisa P 27

Britnee 25 Madison 25

Britt 25 Marissa 26

Chantal 28 Meghan 30

Cristy 30 Melissa 32

Emily 24 Michelle 30

J 26 Raichel 29

Jackie 27 Rebecca 30

Jill 28 Renee 28

Keltie 28 Sarah L 25

Kimberly 27 Sarah P 27

Lacey 27 Shawntel 25

Lauren 26 Stacey 26

Report the sample standard deviation for the ages of the women on the show. Include units.

Explain what the sample standard deviation means in context.


2 Recall the example in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats that were given a high daily dose of a stimulant drug.

Here are the weight gains for the two groups:

Control 169 154 179 202 197 175

Stimulant Group 137 158 153 147 168 147

A Use technology to compute the sample standard deviations for the weight gains in each group. Be sure to include the units.

B Write a brief comparison of the sample standard deviations for the weight gains in the two groups. Do the sample standard deviations indicate that the variability in the distributions is substantially different or not? Explain your reasoning.

This lesson is part of STATWAY™, A Pathway Through College Statistics, which is a product of a Carnegie Networked Improvement Community that seeks to advance student success. Version 1.0, A Pathway Through Statistics, Statway™ was created by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This version 1.5 and all subsequent versions result from the continuous improvement efforts of the Carnegie Networked Improvement Community. The network brings together community college faculty and staff, designers, researchers and developers. It is an open-resource research and development community that seeks to harvest the wisdom of its diverse participants in systematic and disciplined inquiries to improve developmental mathematics instruction. For more information on the Statway Networked Improvement Community, please visit carnegiefoundation.org. For the most recent version of instructional materials, visit Statway.org/kernel.


STATWAY™ and the Carnegie Foundation logo are trademarks of the Carnegie Foundation for the Advancement of Teaching. A Pathway Through College Statistics may be used as provided in the CC BY license, but neither the Statway trademark nor the Carnegie Foundation logo may be used without the prior written consent of the Carnegie Foundation.