© 2011 THE CARNEGIE FOUNDATION FOR THE ADVANCEMENT OF TEACHING A PATHWAY THROUGH STATISTICS, VERSION 1.5, STATWAY™ - STUDENT HANDOUT
STATWAY™ STUDENT HANDOUT
Lesson 2.4.1 Quantifying Variability Relative to the Mean
STUDENT NAME DATE
INTRODUCTION
Recall the monthly normal temperature data from St. Louis and San Francisco. The data values are contained
in the following table:
Monthly Normal Temperatures (°F) for St. Louis and San Francisco
Month Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
St. Louis 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9
San Francisco 51.1 54.4 54.9 56.0 56.6 58.4 59.1 60.1 62.3 62.0 57.2 51.7
Here are side-by-side dotplots of the monthly normal temperatures in the two cities.
[Figure: Dotplots of Temperatures for St. Louis and San Francisco. Horizontal axis: Temperatures, marked from 30 to 80 in steps of 5; one row of dots for each city.]
The measures of variability that you have considered are the range and the interquartile range (IQR). The
range is the difference between the maximum value and minimum value. The range does not use all of the
specific data values, and it is sensitive to outliers and extreme observations. The IQR is related to the
median, because it is based on the values of the first and third quartiles.
Remember that the sample mean is the average of the data values. It uses all of the specific values. Your
goal is to develop a measure of variability to partner with the sample mean, using all of the specific data
values.
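As a quick reminder of the outside-in idea, here is a minimal Python sketch that computes the range for each city from the table above. The function name data_range is a hypothetical helper chosen for this illustration; it is not part of the lesson.

```python
# Monthly normal temperatures (°F), copied from the table in this lesson.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def data_range(values):
    """Range = maximum value - minimum value (hypothetical helper name)."""
    return max(values) - min(values)


range_st_louis = data_range(st_louis)
range_san_francisco = data_range(san_francisco)

# Only two of the twelve values in each list affect the result.
print("St. Louis range:", round(range_st_louis, 1))
print("San Francisco range:", round(range_san_francisco, 1))
```

Because the range depends only on the largest and smallest values, changing any of the other ten temperatures would leave it unchanged. That is the limitation the rest of this lesson tries to address.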
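Recall the sample means for the two cities (computed from the table above and rounded to one decimal place):
St. Louis: $\bar{x} \approx 56.1$°F
San Francisco: $\bar{x} \approx 57.0$°F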
One measure of variability we look at for individual data values is the deviation from the mean. This is calculated by the formula:
deviation from the mean = (data value – mean), or $x - \bar{x}$
1 Determine the deviation from the mean for January in St. Louis. What does this number tell you?
2 Here are the deviations from the mean for every month in St. Louis.
Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
-26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2
Give a method that you might use to determine a number which represents a typical distance from the
mean. Think about this for 2 or 3 minutes, then discuss it with your group. Make sure to come up with
an answer.
The range and IQR both take an outside-in approach to representing variability. In other words, values in the
extremes of the distribution (or at least away from the center of the distribution) are subtracted to provide a
number that represents the variability in a distribution. Rather than work from the outside in, this lesson
adopts an inside-out approach starting with the sample mean.
Each datapoint has a distance from the center, which in this case is the sample mean. This distance is known
as the deviation from the mean or just deviation. Data values with large deviations contribute more to the
variability in the data set. Values with small deviations do not contribute much to the total variability. These
distances from the sample mean to the data values represent the inside-out (from the center to the data
values) approach to quantifying variability.
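The inside-out idea can be sketched in a few lines of Python: compute the sample mean, then subtract it from every data value. This is only an illustration, and the variable names are assumptions made for the sketch. It keeps all decimal places in the mean, so its deviations may differ slightly in the last digit from the rounded values printed in the handout's tables.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# St. Louis monthly normal temperatures (°F) from the lesson's table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

# Sample mean: add up all the values and divide by how many there are.
mean = sum(st_louis) / len(st_louis)

# Deviation from the mean: data value minus mean, one per month.
deviations = [x - mean for x in st_louis]

for month, temp, dev in zip(months, st_louis, deviations):
    # Negative deviations are months below the mean, positive are above.
    print(f"{month}: temperature {temp:5.1f}, deviation {dev:+6.2f}")
```

Months far from the mean (such as January or July) show large deviations in size, while months near the mean (such as April) show small ones.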
TRY THESE
3 One approach to measuring variability is to combine the information contained in the deviations.
A simple way to combine the deviations is to add them up in a sum. Compute the sum of the
deviations in the table:
St. Louis
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Temp. 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9
Deviation -26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2
Notice that the sum of the deviations is very close to 0. In fact, if you had kept all of the decimal places in the
calculations instead of rounding the values, the sum would be exactly 0. This is true for any data set: the sum of the
deviations is 0 because of the way the mean is calculated.
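The reason is short: adding up (data value – mean) over all n values gives (sum of the data values) – n × mean, and the mean is defined as the sum of the data values divided by n, so the two parts cancel. The following Python sketch checks this numerically for the St. Louis data. It is an illustration only; math.fsum is used to limit floating-point round-off, and the tiny leftover is computer arithmetic, not a property of the data.

```python
import math

# St. Louis monthly normal temperatures (°F) from the lesson's table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

mean = sum(st_louis) / len(st_louis)

# Deviations computed with the unrounded mean.
deviations = [x - mean for x in st_louis]
sum_exact = math.fsum(deviations)

# Deviations as printed in the handout (rounded to one decimal place).
rounded_deviations = [-26.8, -22.2, -11, 0.6, 10, 19.3, 23.7, 21.5, 14.1, 2.3, -9.9, -22.2]
sum_rounded = math.fsum(rounded_deviations)

print("Sum of unrounded deviations:", sum_exact)            # essentially 0
print("Sum of rounded deviations:", round(sum_rounded, 1))  # close to 0, but not exactly
```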
4 What could you do to keep the deviations from cancelling each other out and adding to 0?
5 Here is the same table as before with a new row added. In the new row, the deviations are
squared. Note that the first new entry is $(-26.8)^2 \approx 718.2$. Fill in the rest of the table by squaring
the remaining deviations.
St. Louis
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Temp. 29.3 33.9 45.1 56.7 66.1 75.4 79.8 77.6 70.2 58.4 46.2 33.9
Deviation -26.8 -22.2 -11 0.6 10 19.3 23.7 21.5 14.1 2.3 -9.9 -22.2
(Deviation)² 718.2
6 Now compute the sum of the squared deviations by totaling the values in the bottom row. Give
the answer below. What are the units of the answer?
7 The next table contains the monthly normal temperature values for San Francisco.
A Complete the table by computing the deviations and squared deviations. Below the table give
the sum of the squared deviations.
San Francisco
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Temp. 51.1 54.4 54.9 56.0 56.6 58.4 59.1 60.1 62.3 62.0 57.2 51.7
Deviation
(Deviation)²
B Compare the sum of the squared deviations for St. Louis to the sum for San Francisco. Explain
what these values tell you about how the variability in monthly normal temperatures differs between the two cities.
Although the sum of the squared deviations represents the variability in a distribution, it is not a commonly
used statistic. Normally, the sum of the squared deviations is part of the calculation of the sample variance,
which itself is used to determine the sample standard deviation.
Here is the formula for the sample variance, which is frequently denoted by $s^2$:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Note that n represents the sample size, which in this example is equal to 12 for each city.
For St. Louis, the sample variance is $s^2 \approx 329.4$.
For San Francisco, the sample variance is $s^2 \approx 13.1$.
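The variance calculation can be sketched step by step in Python: find the mean, square each deviation, add the squares, and divide by n – 1. The function name sample_variance is a hypothetical helper written for this illustration. Python's standard statistics.variance function, which also divides by n – 1, is used as a cross-check.

```python
import statistics

# Monthly normal temperatures (°F) from the lesson's tables.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def sample_variance(values):
    """Sum of squared deviations divided by (n - 1). Hypothetical helper."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1)


var_st_louis = sample_variance(st_louis)
var_san_francisco = sample_variance(san_francisco)

print(f"St. Louis sample variance: {var_st_louis:.1f}")
print(f"San Francisco sample variance: {var_san_francisco:.1f}")

# Cross-check against the standard library's sample variance.
print(f"statistics.variance (St. Louis): {statistics.variance(st_louis):.1f}")
```

Because the sketch uses the unrounded mean, its results may differ in the last digit from a hand calculation that starts from rounded deviations.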
8 We are still one step short of where we want to be. The variance is an average of the squared
deviations. What can we do to turn it into an average of the deviations?
9 When we take the square root of the variance it is called the standard deviation and is denoted s.
Calculate the standard deviation for the temperatures in St. Louis and San Francisco.
10 Write a sentence comparing the variability of monthly average temperatures in St. Louis and San
Francisco using the standard deviation for each city.
NEXT STEPS
Let’s think about what the standard deviation tells us about a data set. Recall that the standard deviation is the
square root of the variance, which is nearly an average of the squared deviations from the mean. In some sense
the standard deviation gives a typical distance of the data values from the sample mean.
Consider the standard deviation of St. Louis, s ≈ 18.15°F. An interpretation of this value is that the
temperature of a typical month in St. Louis differs from the mean temperature by about 18 degrees.
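As an illustration, the value quoted for St. Louis can be reproduced in Python in two ways: by taking the square root of the variance by hand, and with the standard library's statistics.stdev function, which computes the sample standard deviation. The variable names are assumptions made for this sketch.

```python
import math
import statistics

# Monthly normal temperatures (°F) from the lesson's tables.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]

# By hand: square root of (sum of squared deviations / (n - 1)).
n = len(st_louis)
mean = sum(st_louis) / n
variance = sum((x - mean) ** 2 for x in st_louis) / (n - 1)
sd_by_hand = math.sqrt(variance)

# With the standard library (sample standard deviation, divides by n - 1).
sd_st_louis = statistics.stdev(st_louis)
sd_san_francisco = statistics.stdev(san_francisco)

print(f"St. Louis: s = {sd_st_louis:.2f} °F (by hand: {sd_by_hand:.2f} °F)")
print(f"San Francisco: s = {sd_san_francisco:.2f} °F")
```

Taking the square root returns the measure to the original units (°F), which is why the standard deviation is easier to interpret than the variance.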
Here are some questions to think about in your small groups.
11 Could the standard deviation of a data set ever be negative?
12 Could the standard deviation ever be 0?
13 In the Lesson 2.3.1 homework you examined what happened to the IQR when we changed a single
data value. We did this for the temperatures in St. Louis. Here are the temperatures again with
the error in the July temperature.
Month Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
Temperature 29.3 33.9 45.1 56.7 66.1 75.4 97.8 77.6 70.2 58.4 46.2 33.9
Use technology to calculate the sample standard deviation for these monthly temperatures in St. Louis, including the
erroneous July value. How does this compare to the true standard deviation we calculated in class?
Note: Like the sample mean, the sample standard deviation is not resistant to the effects of outliers and
skewing. Very large or small values can have a large impact on the standard deviation.
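One possible way to see this effect with technology is sketched below in Python. It is an illustration under the assumption that Python's standard statistics module is the technology being used; any calculator or spreadsheet that reports the sample standard deviation would serve equally well.

```python
import statistics

# Correct St. Louis monthly normal temperatures (°F).
correct = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

# Same data with the July entry mistyped as 97.8, as in the table above.
with_error = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 97.8, 77.6, 70.2, 58.4, 46.2, 33.9]

sd_correct = statistics.stdev(correct)
sd_with_error = statistics.stdev(with_error)

print(f"Standard deviation, correct data: {sd_correct:.2f} °F")
print(f"Standard deviation, with the July error: {sd_with_error:.2f} °F")
print(f"Change caused by one value: {sd_with_error - sd_correct:+.2f} °F")
```

Only one of the twelve values differs between the two lists, yet the standard deviation shifts noticeably, because the squared deviation of the erroneous value is large.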
SUMMARY
We have examined two measures of center (mean and median) and three measures of variability or spread
(range, interquartile range, and standard deviation). In deciding which to use, remember two things.
Mean and standard deviation go together, and median and IQR go together.
Both the mean and standard deviation are strongly impacted by outliers and skewing.
When the data are skewed or contain outliers, we usually use the median and IQR to give a description of
the data. When the data are reasonably symmetric we can use the mean and standard deviation. In
addition, the numbers are never enough; we always look at a graph as well, which can be a dotplot,
histogram, or boxplot.
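The pairing described in this summary can be illustrated with a short Python sketch that compares both pairs of statistics on the St. Louis data with and without the erroneous July value of 97.8 from this lesson. Note the assumptions: quartiles are computed with statistics.quantiles and its default "exclusive" method, and other quartile conventions (including the one used in your class or calculator) may give slightly different IQR values. The function name iqr is a hypothetical helper.

```python
import statistics

correct = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
with_error = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 97.8, 77.6, 70.2, 58.4, 46.2, 33.9]


def iqr(values):
    """Interquartile range using the default 'exclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


for label, data in [("correct", correct), ("with error", with_error)]:
    print(
        f"{label:>10}: mean = {statistics.mean(data):.2f}, "
        f"s = {statistics.stdev(data):.2f}, "
        f"median = {statistics.median(data):.2f}, "
        f"IQR = {iqr(data):.2f}"
    )
```

In this particular example the error changes the mean and standard deviation but leaves the median and IQR as they were, since only the largest value moved. A graph of the data would show the unusual July value immediately.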
TAKE IT HOME
1 Here are the names and ages for the 30 women on the 15th season (2010) of “The Bachelor.”
Bachelorette Age Bachelorette Age
Alli 24 Lindsay 25
Ashley H 26 Lisa M 24
Ashley S 26 Lisa P 27
Britnee 25 Madison 25
Britt 25 Marissa 26
Chantal 28 Meghan 30
Cristy 30 Melissa 32
Emily 24 Michelle 30
J 26 Raichel 29
Jackie 27 Rebecca 30
Jill 28 Renee 28
Keltie 28 Sarah L 25
Kimberly 27 Sarah P 27
Lacey 27 Shawntel 25
Lauren 26 Stacey 26
Report the sample standard deviation for the ages of the women on the show. Include units.
Explain what the sample standard deviation means in context.
2 Recall the example in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal
adolescent laboratory rats over a one-month period, accompanied by the weight gains for a
sample of six adolescent rats that were given a high daily dose of a stimulant drug.
Here are the weight gains for the two groups:
Control 169 154 179 202 197 175
Stimulant Group 137 158 153 147 168 147
A Use technology to compute the sample standard deviations for the weight gains in each
group. Be sure to include the units.
B Write a brief comparison of the sample standard deviations for the weight gains in the two
groups. Do the sample standard deviations indicate that the variability in the distributions is substantially different or not? Explain your reasoning.
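If the technology available to you is Python, one possible approach to part A is sketched below. This is an illustration only: it uses the standard library's statistics.stdev function (the sample standard deviation, dividing by n – 1), and the variable names are assumptions made for the sketch. The comparison and explanation in part B are still yours to write.

```python
import statistics

# Weight gains in grams, copied from the table above.
control = [169, 154, 179, 202, 197, 175]
stimulant = [137, 158, 153, 147, 168, 147]

sd_control = statistics.stdev(control)
sd_stimulant = statistics.stdev(stimulant)

# Report each result with its units.
print(f"Control group: s = {sd_control:.1f} grams")
print(f"Stimulant group: s = {sd_stimulant:.1f} grams")
```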
This lesson is part of STATWAY™, A Pathway Through College Statistics, which is a product of a Carnegie Networked Improvement Community that seeks to advance student success. Version 1.0, A Pathway Through Statistics, Statway™ was created by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This version 1.5 and all subsequent versions result from the continuous improvement efforts of the Carnegie Networked Improvement Community. The network brings together community college faculty and staff, designers, researchers, and developers. It is an open-resource research and development community that seeks to harvest the wisdom of its diverse participants in systematic and disciplined inquiries to improve developmental mathematics instruction. For more information on the Statway Networked Improvement Community, please visit carnegiefoundation.org. For the most recent version of instructional materials, visit Statway.org/kernel.
STATWAY™ and the Carnegie Foundation logo are trademarks of the Carnegie Foundation for the Advancement of Teaching. A Pathway Through College Statistics may be used as provided in the CC BY license, but neither the Statway trademark nor the Carnegie Foundation logo may be used without the prior written consent of the Carnegie Foundation.