overview 3.1 measures of center 3.2 measures of variability 3.4 measures of position 3.6 robust...
TRANSCRIPT
![Page 1: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/1.jpg)
![Page 2: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/2.jpg)
Overview
3.1 Measures of Center
3.2 Measures of Variability
3.4 Measures of Position
3.6 Robust Measures
![Page 3: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/3.jpg)
3.1 Measures of Center
Objectives: By the end of this section, I will be able to…
1) Calculate the mean for a given data set.
2) Find the median, and describe why the median is sometimes preferable to the mean.
3) Find the mode of a data set.
4) Describe how skewness affects these measures of center.
![Page 4: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/4.jpg)
The Mean
Most well known and widely used measure of center
Simply add up all the numbers and divide by how many numbers you have.
![Page 5: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/5.jpg)
Notation
Statisticians like to use specialized notation.
The sample size, that is, how many observations you have in your sample data set, is always denoted by $n$.
The $i$th data value is denoted by $x_i$, where $i$ is simply an index or counter indicating a data point.
“Add them together” is denoted by $\Sigma$ (capital sigma).
The sample mean is denoted $\bar{x}$ (pronounced “x-bar”).
![Page 6: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/6.jpg)
The sample mean
Written as
$$\bar{x} = \frac{\sum x}{n}$$
In plain English, this just means that, in order to find the mean $\bar{x}$, we
1. Add up all the data values, giving us $\sum x$.
2. Divide by how many observations are in the data set, giving us $\sum x / n$.
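As a rough illustration of the two steps for the sample mean, the following Python sketch adds up the data values and divides by how many there are. The function name `sample_mean` and the three data values are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    n = len(values)        # sample size n
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)    # step 1: add up all the data values
    return total / n       # step 2: divide by the number of observations


# Hypothetical data values, for illustration only.
data = [2, 4, 9]
print(sample_mean(data))  # 5.0
```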
![Page 7: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/7.jpg)
The Population Mean
The mean value of a population is usually unknown.
Use $\bar{x}$ to estimate it.
Denote the population mean with $\mu$ (mu).
Population size is denoted by $N$.
$$\mu = \frac{\sum x}{N}$$
The mean is sensitive to the presence of extreme values.
![Page 8: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/8.jpg)
The Median
The middle data value when the data are put into ascending order
Half of the data values lie below the median and half lie above
If the sample size $n$ is odd, then the median is a unique middle value.
That is, the median is the $\left(\frac{n+1}{2}\right)$th observation when the data are put in ascending order.
If the sample size $n$ is even, then the median is the mean of the two data values in the middle.
That is, the median is the mean of the two data values that lie on either side of the $\left(\frac{n+1}{2}\right)$th position.
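The odd and even cases for the median can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure; the function name `median` and the small data sets are hypothetical.

```python
def median(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # put the data into ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: (n + 1) / 2 is a whole number, the unique middle position (1-based).
        position = (n + 1) // 2
        return ordered[position - 1]
    # n even: (n + 1) / 2 ends in .5, so average the values on either side of it.
    below = ordered[n // 2 - 1]       # value in position n / 2
    above = ordered[n // 2]           # value in position n / 2 + 1
    return (below + above) / 2


# Hypothetical data sets, for illustration only.
print(median([7, 1, 4]))       # 4
print(median([7, 1, 4, 10]))   # 5.5
```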
![Page 9: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/9.jpg)
The Mode
French speakers will recognize that the term mode in French refers to fashion
The popularity of clothing often depends on just which style is in fashion
In a data set, the value that is most “in fashion” is the value that occurs the most
The mode of a data set is the data value that occurs with the greatest frequency
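Finding the value that occurs with the greatest frequency amounts to counting. The sketch below uses Python's `collections.Counter`; the function name `modes` and the list of colors are hypothetical. Returning every value tied for the greatest frequency is a design choice of this sketch, not something the slides specify.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)                 # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())           # the greatest frequency
    return [value for value, count in counts.items() if count == highest]


# Hypothetical categorical data, for illustration only.
colors = ["red", "blue", "red", "green", "blue", "red"]
print(modes(colors))  # ['red']
```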
![Page 10: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/10.jpg)
Example 3.5 - Cost of mathematical journals
The rising cost of research journals has been taking an increasing bite out of library and research budgets. Table 3.3 contains the annual subscription cost of ten research journals in mathematics and statistics for 2006. Find the following.
a. The mean journal subscription cost
b. The median journal subscription cost
c. The mode journal subscription cost
![Page 11: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/11.jpg)
Example 3.5 continued
Table 3.3 Annual subscription cost for ten research journals
![Page 12: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/12.jpg)
Example 3.5 continued
Solution
a. The sample mean journal cost is
$$\bar{x} = \frac{\sum x}{n} = \frac{250 + 250 + 402 + 467 + 850 + 1022 + 1582 + 1653 + 1744 + 3631}{10} = \frac{11{,}851}{10} = \$1185.10$$
![Page 13: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/13.jpg)
Example 3.5 continued
b. Since we have n = 10 journals, the median is the mean of the two data values that lie on either side of the
$$\left(\frac{n+1}{2}\right)\text{th} = \left(\frac{10+1}{2}\right)\text{th} = 5.5\text{th position}$$
The median is the mean of the 5th and 6th data values, $850 and $1022.
$$\text{median journal cost} = \frac{850 + 1022}{2} = \$936$$
![Page 14: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/14.jpg)
Example 3.5 continued
c. The mode is the data value that occurs with the greatest frequency.
Two journals cost $250 each.
No other cost occurs more than once.
Mode = $250.
The mode is not a very good measure of center for this data set because it is the minimum value.
This illustrates a weakness of using the mode as a measure of center.
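The three answers in Example 3.5 can be checked with Python's standard `statistics` module. This is only a verification sketch; the ten costs are the ones listed in the solution to part (a), and the variable names are assumptions.

```python
import statistics

# Annual subscription costs (in dollars) from the solution to part (a).
costs = [250, 250, 402, 467, 850, 1022, 1582, 1653, 1744, 3631]

mean_cost = statistics.mean(costs)      # sum of the costs divided by n = 10
median_cost = statistics.median(costs)  # mean of the 5th and 6th ordered values
mode_cost = statistics.mode(costs)      # the most frequent cost

print(f"mean   = ${mean_cost:.2f}")     # mean   = $1185.10
print(f"median = ${median_cost:.2f}")   # median = $936.00
print(f"mode   = ${mode_cost}")         # mode   = $250
```

Note that the mean ($1185.10) is larger than the median ($936), pulled upward by the single $3631 journal.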
![Page 15: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/15.jpg)
How Skewness Affects the Mean and Median
For a right-skewed distribution, the mean is larger than the median.
For a left-skewed distribution, the median is larger than the mean.
For a symmetric unimodal distribution, the mean, median, and mode are fairly close to one another.
FIGURE 3.5 How skewness affects the mean and median.
![Page 16: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/16.jpg)
Exploratory Data Analysis
Using graphical methods to compare numerical statistics
FIGURE 3.6 Dotplots of the percentage net price change for the Dow Jones Industrial Average, the randomly selected darts portfolio, and the professionally selected portfolio.
![Page 17: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/17.jpg)
Summary
The sample mean $\bar{x}$ represents the sum of the data values in the sample divided by the sample size (n).
The population mean represents the sum of the data values in the population divided by the population size (N).
The mean is sensitive to the presence of extreme values.
![Page 18: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/18.jpg)
Summary
The median occupies the middle position when the data are put in ascending order and is not sensitive to extreme values.
The mode is the data value that occurs with the greatest frequency.
Modes can be applied to categorical data as well as numerical data but are not always reliable as measures of center.
![Page 19: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/19.jpg)
Summary
The skewness of a distribution can often tell us something about the relative values of the mean and the median.
![Page 20: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/20.jpg)
3.2 Measures of Variability
Objectives: By the end of this section, I will be able to…
1) Understand and calculate the range of a data set.
2) Explain in my own words what a deviation is.
3) Calculate the variance and the standard deviation for a population or a sample.
![Page 21: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/21.jpg)
The Range
The difference between the largest value and the smallest value in the data set:
range = largest value – smallest value
Simplest measure of variability
Larger range is an indication of greater variability
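A minimal Python sketch of the range calculation follows. It reuses the ten journal costs from Example 3.5 as sample data; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


# Journal subscription costs (in dollars) from Example 3.5.
costs = [250, 250, 402, 467, 850, 1022, 1582, 1653, 1744, 3631]
print(data_range(costs))  # 3381
```

Because the range depends only on the two most extreme values, the single $3631 journal accounts for most of this result.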
![Page 22: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/22.jpg)
Example 3.8 - Range of the volleyball teams’ heights
Calculate the range of player heights for each of the WMU and NCU teams.
FIGURE 3.11 Comparative dotplots of the heights of two volleyball teams.
![Page 23: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/23.jpg)
Example 3.8 continued
Solution
Figure 3.11 shows that the WMU heights are more spread out than the NCU heights.
The range of the WMU team should be larger than the range of the NCU team, reflecting greater variability.
$\text{range}_{\text{WMU}} = 75 - 60 = 15$ inches
$\text{range}_{\text{NCU}} = 72 - 66 = 6$ inches
![Page 24: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/24.jpg)
What Is a Deviation?
A deviation for a given data value x is the difference between the data value and the mean of the data set.
For a sample, the deviation equals $x - \bar{x}$.
For a population, the deviation equals $x - \mu$.
If the data value is larger than the mean, the deviation will be positive.
![Page 25: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/25.jpg)
Deviation
If the data value is smaller than the mean, the deviation will be negative.
If the data value equals the mean, the deviation will be zero.
A deviation can roughly be thought of as the distance between a data value and the mean.
However, a deviation can be negative, while a distance is never negative.
The sum (and hence the mean) of the deviations is not a useful measure of spread because it is always zero.
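To see why the deviations cannot measure spread on their own, the sketch below computes each deviation for the journal costs of Example 3.5 and adds them up. The variable names are assumptions; the comparison uses a small tolerance because floating-point arithmetic may leave a tiny rounding residue instead of an exact zero.

```python
import math

# Journal subscription costs (in dollars) from Example 3.5.
costs = [250, 250, 402, 467, 850, 1022, 1582, 1653, 1744, 3631]

mean_cost = sum(costs) / len(costs)             # sample mean, 1185.1
deviations = [x - mean_cost for x in costs]     # x minus the mean, for each x

negative = sum(1 for d in deviations if d < 0)  # values below the mean
positive = sum(1 for d in deviations if d > 0)  # values above the mean
total = sum(deviations)                         # positives and negatives cancel

print(negative, positive)                           # 6 4
print(math.isclose(total, 0.0, abs_tol=1e-9))       # True
```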
![Page 26: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/26.jpg)
Population Variance
Symbolized by the lowercase Greek letter sigma squared, $\sigma^2$
Is the mean of the squared deviations in the population and is found by
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
![Page 27: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/27.jpg)
The Population Standard Deviation
The square root of the variance
Represents a typical distance from the mean for that data set
Not the mean deviation, which is always zero
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
![Page 28: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/28.jpg)
Sample Variance $s^2$
Based on the idea of finding the sum of the squared deviations $(x - \bar{x})^2$ and then dividing by the sample size to get the mean squared deviation
Statisticians found a better estimate by dividing by $n - 1$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
![Page 29: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/29.jpg)
Sample Standard Deviation $s$
The square root of the sample variance $s^2$
Second most important statistic
The value of $s$ may be interpreted as the typical difference between a data value and the sample mean
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
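The only difference between the population and sample variance is the divisor: N versus n - 1. The sketch below implements both from their definitions and cross-checks them against Python's `statistics` module. The function names are hypothetical, and the journal costs from Example 3.5 serve only as convenient sample data.

```python
import math
import statistics


def population_variance(values):
    """Mean of the squared deviations: divide by N."""
    size = len(values)
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size


def sample_variance(values):
    """Sum of the squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Journal subscription costs (in dollars) from Example 3.5.
costs = [250, 250, 402, 467, 850, 1022, 1582, 1653, 1744, 3631]

sigma = math.sqrt(population_variance(costs))  # standard deviation, divisor N
s = math.sqrt(sample_variance(costs))          # standard deviation, divisor n - 1

# Cross-check against the standard library implementations.
print(math.isclose(population_variance(costs), statistics.pvariance(costs)))  # True
print(math.isclose(sample_variance(costs), statistics.variance(costs)))       # True
print(s > sigma)  # True: dividing by n - 1 gives the larger value
```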
![Page 30: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/30.jpg)
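Computational Formulas
Population variance:
$$\sigma^2 = \frac{\sum x^2 - \left(\sum x\right)^2 / N}{N}$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / N}{N}}$$
Sample variance:
$$s^2 = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}}$$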
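The computational formulas need only two running totals, the sum of the values and the sum of their squares, which is why they suit hand or calculator work. The following sketch checks that they agree with the definitional results from Python's `statistics` module. Function names are hypothetical, and the data are the journal costs from Example 3.5.

```python
import math
import statistics


def sample_variance_computational(values):
    """Sample variance from the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)                      # sum of the data values
    sum_x_sq = sum(x * x for x in values)    # sum of the squared data values
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


def population_variance_computational(values):
    """Population variance from the sum of x and the sum of x squared."""
    size = len(values)
    sum_x = sum(values)
    sum_x_sq = sum(x * x for x in values)
    return (sum_x_sq - sum_x ** 2 / size) / size


# Journal subscription costs (in dollars) from Example 3.5.
costs = [250, 250, 402, 467, 850, 1022, 1582, 1653, 1744, 3631]

print(math.isclose(sample_variance_computational(costs),
                   statistics.variance(costs)))       # True
print(math.isclose(population_variance_computational(costs),
                   statistics.pvariance(costs)))      # True
```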
![Page 31: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/31.jpg)
Example 3.15 - Calculating the population variance and population standard deviation using the calculator.
Table 3.13 lists the amount of farmland (in 1000s of acres) in each county in the state of Connecticut. Since the data set contains all N = 8 counties in Connecticut, it can be considered a population. Calculate the population variance and population standard deviation using the calculator.
![Page 32: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/32.jpg)
Example 3.15 continued
Table 3.13 Farmland in Connecticut
![Page 33: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/33.jpg)
Example 3.15 continued
The population standard deviation is therefore:
$$\sigma = \sqrt{\sigma^2} = \sqrt{629.9998438} \approx 25.1$$
The standard deviation of farmland for all counties in Connecticut is almost 25,100 acres.
![Page 34: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/34.jpg)
Summary
The simplest measure of variability, or measure of spread, is the range.
The range is simply the difference between the maximum and minimum values in a data set
The range has drawbacks because it relies on the two most extreme data values.
A deviation is the difference between a data value and the mean of the data values.
![Page 35: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/35.jpg)
Summary
The variance and standard deviation are measures of spread that utilize all available data values.
The population variance can be thought of as the mean squared deviation.
The standard deviation is the square root of the variance.
Standard deviation is a typical deviation, that is, the typical difference between a data value and the mean.
![Page 36: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/36.jpg)
3.4 Measures of Position
Objectives: By the end of this section, I will be able to…
1) Find percentiles for both small and large data sets.
![Page 37: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/37.jpg)
Percentile
Let p be any integer between 0 and 100.
The pth percentile of a data set is the data value at which p percent of the values in the data set are less than or equal to this value.
![Page 38: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/38.jpg)
Example 3.24 - Finding percentiles of a small data set
Yolanda would like to go to a prestigious graduate school of the arts. She knows that this school accepts only those students who score at the 75th percentile or higher in a grueling dance audition. The following data represent the dance audition scores of Yolanda’s group. Yolanda scored 85. Find the 75th percentile of the data set. Will Yolanda be accepted at the prestigious graduate school of the arts?
78 56 89 44 65 94 81 62 75 85 30 68
![Page 39: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/39.jpg)
Example 3.24 continued
Solution
Step 1: Sort the data into ascending order:
30 44 56 62 65 68 75 78 81 85 89 94
Step 2: Since we want the 75th percentile, p = 75. There are 12 scores, so n = 12. Calculate
$$i = \left(\frac{p}{100}\right) n = \left(\frac{75}{100}\right) 12 = 9$$
So, i = 9.
![Page 40: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/40.jpg)
Example 3.24 continued
Step 3: Here, since i is an integer, the 75th percentile is the mean of the data values in positions 9 and 10.
The data value in the ninth position is 81. The data value in the tenth position is 85. The mean of these values is 83. Thus, the 75th percentile is 83.
Yolanda’s dance score of 85 is therefore above the 75th percentile. She will be accepted to the prestigious graduate school.
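The three steps of Example 3.24 can be sketched in Python as below. The function name `percentile` is hypothetical. The slides only show the case where i is an integer; the other branch (round i up and take the value in that position) is an assumption of this sketch, not something stated in the transcript. Other software may use different percentile conventions and give slightly different answers.

```python
def percentile(values, p):
    """Return the pth percentile (p an integer with 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("this sketch handles 0 < p < 100 only")
    ordered = sorted(values)              # Step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100) # Step 2: i = (p / 100) * n
    if remainder == 0:
        # Step 3, i is an integer: mean of the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # ASSUMPTION (not shown in the slides): i is not an integer, so round up
    # and take the value in that position.
    return ordered[whole]


# Dance audition scores from Example 3.24.
scores = [78, 56, 89, 44, 65, 94, 81, 62, 75, 85, 30, 68]

p75 = percentile(scores, 75)
print(p75)         # 83.0
print(85 >= p75)   # True: Yolanda's score is at or above the 75th percentile
```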
![Page 41: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/41.jpg)
Outliers
An extremely large or extremely small data value relative to the rest of the data set
May represent a data entry error, or it may be genuine data
One criterion: a data value farther than three standard deviations from the mean
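The three-standard-deviation criterion can be applied to the dance audition scores of Example 3.24 as a quick check. This is only a sketch: using the sample standard deviation (divisor n - 1) is an assumption, since the slide does not say which standard deviation is meant.

```python
import statistics

# Dance audition scores from Example 3.24.
scores = [78, 56, 89, 44, 65, 94, 81, 62, 75, 85, 30, 68]

mean_score = statistics.mean(scores)
sd = statistics.stdev(scores)   # ASSUMPTION: sample standard deviation

# Flag any value farther than three standard deviations from the mean.
outliers = [x for x in scores if abs(x - mean_score) > 3 * sd]
print(outliers)  # []
```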
![Page 42: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/42.jpg)
Summary
Measures of position tell us the position that a particular data value holds relative to the rest of the data set.
The pth percentile of a dataset is the value at which p percent of the values in the data set are less than or equal to this value.
![Page 43: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/43.jpg)
3.6 Robust Measures
Objectives: By the end of this section, I will be able to…
1) Find quartiles and the interquartile range.
2) Calculate the five-number summary of a data set.
3) Construct a boxplot for a given data set.
4) Apply robust detection of outliers.
![Page 44: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/44.jpg)
Quartiles
Divide the data set into quarters
FIGURE 3.31
![Page 45: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/45.jpg)
Quartiles
Each part contains 25% of the data.
The first quartile (Q1) is the 25th percentile.
The second quartile (Q2) is the 50th percentile, that is, the median.
The third quartile (Q3) is the 75th percentile.
For small data sets, the division may be into four parts of only approximately equal size.
![Page 46: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/46.jpg)
Example 3.35 - Finding the quartiles for a small data set: the dance audition scores
In Example 3.24 (page 126) we examined the dance scores of 12 students auditioning for admission into a prestigious graduate school of the arts. Recall that we found the 75th percentile of the dance audition scores to be 83. By definition, the 75th percentile is the third quartile Q3. Therefore, this score of 83 is also the third quartile (Q3) of the audition scores. Now we will find the first quartile and the median (second quartile).
![Page 47: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/47.jpg)
Example 3.35 continued
FIGURE 3.34 The quartiles for the dance audition data.
![Page 48: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/48.jpg)
Interquartile Range
Interquartile range (IQR) is a robust measure of variability.
IQR = Q3 - Q1
The interquartile range is interpreted to be the spread of the middle 50% of the data.
![Page 49: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/49.jpg)
Example 3.37 - Finding the interquartile range for the dance audition scores
In Example 3.35, we found that, for the dance audition score data, Q1 = 59 and Q3 = 83. Find the IQR for the dance score data and explain what it means.
![Page 50: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/50.jpg)
Example 3.37 continued
Solution
IQR = Q3 - Q1 = 83 - 59 = 24
We would say that the middle 50%, or middle half, of the dance audition scores ranged over 24 points (see Figure 3.38).
FIGURE 3.38 The IQR for the dance audition data.
![Page 51: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/51.jpg)
Example 3.37 continued
What would happen if we introduced an outlier into this data set?
Suppose we change the lowest score from 30 to 3.
The IQR would be completely unaffected, even if we also changed the 44 to a 4.
If we changed the 56, then the IQR would be affected, since Q1 would then change.
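The robustness claim can be tested directly. The sketch below recomputes the IQR after changing 30 to 3 and 44 to 4, and again after changing the 56. The helper `percentile` follows the integer-position rule used in Example 3.24 (its non-integer branch is an assumption), and the replacement value 5 for the score of 56 is hypothetical, since the slides do not say what it would be changed to.

```python
def percentile(values, p):
    """pth percentile, using the position rule i = (p / 100) * n."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is an integer: mean of the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]   # ASSUMPTION: otherwise round i up


def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    return percentile(values, 75) - percentile(values, 25)


scores = [30, 44, 56, 62, 65, 68, 75, 78, 81, 85, 89, 94]

with_outliers = [3, 4] + scores[2:]                  # 30 -> 3 and 44 -> 4
changed_56 = [5 if x == 56 else x for x in scores]   # 56 -> 5 (hypothetical value)

print(iqr(scores))         # 24.0
print(iqr(with_outliers))  # 24.0, unaffected by the two low outliers
print(iqr(changed_56))     # 30.0, Q1 drops from 59 to 53
```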
![Page 52: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/52.jpg)
The Five-Number Summary
Consists of the following set of statistics, which together constitute a robust summarization of a data set:
1. Smallest value in the data set (minimum)
2. First quartile, Q1
3. Median, Q2
4. Third quartile, Q3
5. Largest value in the data set (maximum)
![Page 53: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/53.jpg)
Example 3.38 - Find the five-number summary for the dance audition data.
Solution:
Figure 3.39 shows the five-number summary
FIGURE 3.39 The quartiles for the dance audition data.
![Page 54: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/54.jpg)
Example 3.38 continued
1. Minimum = 30
2. First quartile, Q1 = 59
3. Median Q2 = 71.5
4. Third quartile, Q3 = 83
5. Maximum = 94
![Page 55: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/55.jpg)
Example 3.38 continued
The five-number summary is often reported as Min = 30, Q1 = 59, Med = 71.5, Q3 = 83, Max = 94.
Which parts of the five-number summary are less robust than others?
Since the minimum and maximum are the most extreme values, these are clearly very sensitive to outliers.
However, Q1, the median, and Q3 are very resistant to the influence of outliers.
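A short Python sketch can reproduce the five-number summary of Example 3.38. The function names are hypothetical, and the quartiles use the same position rule as Example 3.24 (the non-integer branch is an assumption of this sketch).

```python
def percentile(values, p):
    """pth percentile, using the position rule i = (p / 100) * n."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is an integer: mean of the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]   # ASSUMPTION: otherwise round i up


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (
        min(values),
        percentile(values, 25),
        percentile(values, 50),
        percentile(values, 75),
        max(values),
    )


# Dance audition scores from Example 3.24.
scores = [78, 56, 89, 44, 65, 94, 81, 62, 75, 85, 30, 68]
print(five_number_summary(scores))  # (30, 59.0, 71.5, 83.0, 94)
```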
![Page 56: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/56.jpg)
The Boxplot
Convenient graphical display of the five-number summary
Allows the data analyst to evaluate the symmetry or skewness of a data set.
![Page 57: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/57.jpg)
Example 3.41 - Constructing a boxplot by hand
Construct a boxplot for the dance score data.
Use the steps for constructing a boxplot on page 148.
![Page 58: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/58.jpg)
Example 3.41 continued
Solution
From Example 3.38, the five-number summary for the dance score data is
Min = 30, Q1 = 59, Med = 71.5, Q3 = 83, Max = 94.
From Example 3.37, the interquartile range for the dance score data is
IQR = Q3 - Q1 = 83 - 59 = 24.
Step 1: Determine the lower and upper fences:
a. Lower fence = Q1 - 1.5(IQR) = 59 - 1.5(24) = 59 - 36 = 23
b. Upper fence = Q3 + 1.5(IQR) = 83 + 1.5(24) = 83 + 36 = 119
![Page 59: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/59.jpg)
Example 3.41 continued
Step 2: Draw a horizontal number line that encompasses the range of your data, including the fences. Above the number line, draw vertical lines at Q1 = 59, median = 71.5, and Q3 = 83. Connect the lines for Q1 and Q3 to each other so as to form a box, as shown in Figure 3.41a below.
![Page 60: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/60.jpg)
Example 3.41 continued
Step 3: Temporarily indicate the fences (lower fence 23 and upper fence 119) as brackets above the number line. (See Figure 3.41b below.)
![Page 61: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/61.jpg)
Example 3.41 continued
Step 4: Draw a horizontal line from Q1 = 59 to the smallest data value greater than the lower fence. The lowest data value is Min = 30.
This is greater than the lower fence = 23.
So draw the line from 59 to 30.
Draw a horizontal line from Q3 = 83 to the largest data value smaller than the upper fence.
![Page 62: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/62.jpg)
Example 3.41 continued
The largest data value is Max = 94, which is smaller than the upper fence.
So draw the line from 83 to 94. (See Figure 3.41c below.)
![Page 63: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/63.jpg)
Example 3.41 continued
Step 5: There are no data values lower than the lower fence or greater than the upper fence.
There are no outliers in this data set.
Remove the temporary brackets. See Figure 3.41d below.
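The numbers needed to draw the boxplot (fences, whisker endpoints, and any points beyond the fences) can be computed as in the sketch below. It takes Q1 = 59 and Q3 = 83 from Example 3.38 rather than recomputing them; the variable names are assumptions, and no plotting is attempted.

```python
# Dance audition scores from Example 3.24.
scores = [78, 56, 89, 44, 65, 94, 81, 62, 75, 85, 30, 68]

q1, q3 = 59, 83                       # quartiles from Example 3.38
iqr = q3 - q1                         # 24

# Step 1: the fences.
lower_fence = q1 - 1.5 * iqr          # 23.0
upper_fence = q3 + 1.5 * iqr          # 119.0

# Step 4: each whisker ends at the most extreme data value inside the fences.
inside = [x for x in scores if lower_fence < x < upper_fence]
lower_whisker_end = min(inside)       # 30
upper_whisker_end = max(inside)       # 94

# Step 5: data values beyond the fences would be plotted individually.
beyond_fences = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)               # 23.0 119.0
print(lower_whisker_end, upper_whisker_end)   # 30 94
print(beyond_fences)                          # []
```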
![Page 64: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/64.jpg)
Robust Detection of Outliers
Use a five-number summary or a boxplot to detect outliers, as follows:
A data value is an outlier if
a) It is located 1.5(IQR) or more below Q1, or
b) It is located 1.5(IQR) or more above Q3.
![Page 65: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/65.jpg)
Example 3.45 - Robust detection of outliers for the dance audition data
Determine if there are any outliers in the dance score data.
![Page 66: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/66.jpg)
Example 3.45 continued
Solution
Recall for the dance score data set that IQR = 24, Q1 = 59, and Q3 = 83.
So we have 1.5(IQR) = 1.5(24) = 36.
The first step is to find the two quantities Q1 – 1.5(IQR) and Q3 + 1.5(IQR):
Q1 – 1.5(IQR) = Q1 – 36 = 59 – 36 = 23
Q3 + 1.5(IQR) = Q3 + 36 = 83 + 36 = 119
![Page 67: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/67.jpg)
Example 3.45 continued
There are no data values less than or equal to 23 or greater than or equal to 119.
No outliers are identified by the robust method.
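The robust rule translates into a short filter. In the sketch below, the function name `robust_outliers` is hypothetical, and Q1 and Q3 are passed in using the values from Example 3.45. The comparisons are inclusive because the rule says "1.5(IQR) or more".

```python
def robust_outliers(values, q1, q3):
    """Return values located 1.5(IQR) or more below Q1 or above Q3."""
    step = 1.5 * (q3 - q1)        # 1.5(IQR)
    low_cutoff = q1 - step        # Q1 - 1.5(IQR)
    high_cutoff = q3 + step       # Q3 + 1.5(IQR)
    return [x for x in values if x <= low_cutoff or x >= high_cutoff]


# Dance audition scores from Example 3.24, with Q1 = 59 and Q3 = 83.
scores = [78, 56, 89, 44, 65, 94, 81, 62, 75, 85, 30, 68]
print(robust_outliers(scores, q1=59, q3=83))  # []
```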
![Page 68: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/68.jpg)
Summary
Section 3.6 presents robust measures and methods, which are not sensitive to the presence of outliers.
Quartiles divide the data set into approximately equal quarters.
The interquartile range is a measure of variability found by taking the difference between the third and first quartiles.
![Page 69: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/69.jpg)
Summary
The five-number summary is a robust alternative to the usual mean-and-standard-deviation method of summarizing a data set.
It consists of simply reporting the minimum, first quartile, median, third quartile, and maximum of the data set.
![Page 70: Overview 3.1 Measures of Center 3.2 Measures of Variability 3.4 Measures of Position 3.6 Robust Measures](https://reader038.vdocuments.net/reader038/viewer/2022103003/56649eef5503460f94bfee90/html5/thumbnails/70.jpg)
Summary
A boxplot is a graphical representation of the five-number summary and is useful for investigating skewness and the presence of outliers.
The robust method of detecting outliers is to consider a data value an outlier if it is located 1.5(IQR) or more below Q1, or it is located 1.5(IQR) or more above Q3.