chapter 2 · rules of thumb •box plots –measurements between inner and outer fences are suspect...
![Page 1: Chapter 2 · Rules of thumb •Box Plots –measurements between inner and outer fences are suspect –measurements beyond outer fences are highly suspect •Z-scores –Scores of](https://reader033.vdocuments.net/reader033/viewer/2022060907/60a20a66cb8eb756c9688985/html5/thumbnails/1.jpg)
Chapter 2
Methods for Describing Sets of Data
Objectives
Describe Data using Graphs
Describe Data using Charts
Describing Qualitative Data
•Qualitative data are nonnumeric in nature
•Best described by using Classes
•2 descriptive measures
class frequency – number of data points in a class
class relative frequency = (class frequency) / (total number of data points in data set)
class percentage – class relative frequency x 100
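As a minimal sketch of the three measures above, the following Python snippet counts class frequencies and derives the relative frequency and percentage for each class. The function name `summarize_classes` and the `grades` data are hypothetical illustrations, not taken from the slides.

```python
from collections import Counter


def summarize_classes(observations):
    """Map each class to (frequency, relative frequency, percentage)."""
    n = len(observations)
    counts = Counter(observations)
    return {
        label: (freq, freq / n, 100 * freq / n)
        for label, freq in counts.items()
    }


# Hypothetical qualitative data set (not from the slides).
grades = ["A", "B", "B", "C", "B", "A", "C", "B"]
summary = summarize_classes(grades)

for label in sorted(summary):
    freq, rel, pct = summary[label]
    print(f"{label}: frequency={freq}, relative frequency={rel:.2f}, percentage={pct:.0f}%")
```

The relative frequencies across all classes sum to 1, and the percentages to 100, which is a quick check on any summary table.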
Describing Qualitative Data –
Displaying Descriptive Measures
Summary Table
Class | Frequency | Class percentage (class relative frequency x 100)
Describing Qualitative Data –
Qualitative Data Displays
Bar Graph
Describing Qualitative Data –
Qualitative Data Displays
Pie chart
Describing Qualitative Data –
Qualitative Data Displays
Pareto Diagram
Graphical Methods for Describing
Quantitative Data
The Data
Company Percentage Company Percentage Company Percentage Company Percentage
1 13.5 14 9.5 27 8.2 39 6.5
2 8.4 15 8.1 28 6.9 40 7.5
3 10.5 16 13.5 29 7.2 41 7.1
4 9.0 17 9.9 30 8.2 42 13.2
5 9.2 18 6.9 31 9.6 43 7.7
6 9.7 19 7.5 32 7.2 44 5.9
7 6.6 20 11.1 33 8.8 45 5.2
8 10.6 21 8.2 34 11.3 46 5.6
9 10.1 22 8.0 35 8.5 47 11.7
10 7.1 23 7.7 36 9.4 48 6.0
11 8.0 24 7.4 37 10.5 49 7.8
12 7.9 25 6.5 38 6.9 50 6.5
13 6.8 26 9.5
Percentage of Revenues Spent on Research and Development
Graphical Methods for Describing
Quantitative Data
Dot Plot
Graphical Methods for Describing
Quantitative Data
Stem-and-Leaf Display
Graphical Methods for Describing
Quantitative Data
Histogram
Graphical Methods for Describing
Quantitative Data
More on Histograms
Number of Observations in Data Set Number of Classes
Less than 25 5-6
25-50 7-14
More than 50 15-20
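The rule-of-thumb table above can be sketched as a small lookup. The function name `suggested_class_range` is a hypothetical helper; the thresholds are copied directly from the table.

```python
def suggested_class_range(n_observations):
    """Return (min_classes, max_classes) per the rule-of-thumb table."""
    if n_observations < 25:
        return (5, 6)
    if n_observations <= 50:
        return (7, 14)
    return (15, 20)


# The R&D data set shown earlier has 50 observations.
print(suggested_class_range(50))
```

These ranges are only a starting point; the final number of classes is usually adjusted until the histogram shows the shape of the distribution clearly.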
Summation Notation
Used to simplify summation instructions
Each observation in a data set is identified
by a subscript
x1, x2, x3, x4, x5, …, xn
Notation used to sum the above numbers
together is
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + x_4 + \cdots + x_n$$
Summation Notation
Data set of 1, 2, 3, 4
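Are these the same? $\sum_{i=1}^{4} x_i^2$ and $\left(\sum_{i=1}^{4} x_i\right)^2$
$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1 + 4 + 9 + 16 = 30$$
$$\left(\sum_{i=1}^{4} x_i\right)^2 = (x_1 + x_2 + x_3 + x_4)^2 = (1 + 2 + 3 + 4)^2 = 10^2 = 100$$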
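The difference between the sum of squares and the square of the sum can be checked with a short Python sketch using the slide's data set 1, 2, 3, 4. The variable names are illustrative.

```python
data = [1, 2, 3, 4]

# Square each observation first, then add.
sum_of_squares = sum(x ** 2 for x in data)

# Add the observations first, then square the total.
square_of_sum = sum(data) ** 2

print(sum_of_squares)  # 30
print(square_of_sum)   # 100
```

The order of operations matters: the two expressions agree only in special cases, such as a data set with a single observation.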
Numerical Measures of Central
Tendency
•Central Tendency – tendency of data to
center about certain numerical values
•3 commonly used measures of Central
Tendency
Mean
Median
Mode
Numerical Measures of Central
Tendency
The Mean
•Arithmetic average of the elements of the data set
•Sample mean denoted by $\bar{x}$
•Population mean denoted by $\mu$
•Calculated as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Numerical Measures of Central
Tendency
The Median
•Middle number when observations are
arranged in order
•Median denoted by m
•Identified as the $\left(\frac{n}{2} + 0.5\right)$th observation if n is
odd, and the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th
observations if n is even
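The position rule for the median can be sketched in Python as follows. The function name `median_by_position` is hypothetical; note that the slide counts observations from 1, while Python lists are indexed from 0.

```python
def median_by_position(values):
    """Median via the odd/even position rule on the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # (n/2 + 0.5)th observation, counted from 1 -> index (n - 1) // 2.
        return ordered[(n - 1) // 2]
    # Mean of the (n/2)th and (n/2 + 1)th observations, counted from 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_by_position([1, 3, 5, 6, 8, 8, 9, 11, 12]))  # 8
print(median_by_position([1, 2, 3, 4]))                   # 2.5
```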
Numerical Measures of Central
Tendency
The Mode
•The most frequently occurring value in the
data set
•Data set can be multi-modal – have more
than one mode
•Data displayed in a histogram will have a
modal class – the class with the largest
frequency
Numerical Measures of Central
Tendency
The Data set 1 3 5 6 8 8 9 11 12
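Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 3 + 5 + 6 + 8 + 8 + 9 + 11 + 12}{9} = \frac{63}{9} = 7$
Median is the $\left(\frac{n}{2} + 0.5\right)$th or 5th observation, 8
Mode is 8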
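The three results for this data set can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` returns a list because a data set may have more than one mode.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

mean_value = statistics.mean(data)      # 63 / 9
median_value = statistics.median(data)  # 5th of the 9 ordered observations
modes = statistics.multimode(data)      # all most-frequent values

print(mean_value, median_value, modes)
```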
Numerical Measures of Variability
•Variability – the spread of the data across
possible values
•3 commonly used measures of Variability
Range
Variance
Standard Deviation
Numerical Measures of Variability
The Range
•Largest measurement minus the smallest
measurement
•Loses sensitivity when data sets are large
These 2 distributions
have the same range.
How much does the
range tell you about
the data variability?
Numerical Measures of Variability
The Sample Variance ($s^2$)
•The sum of the squared deviations from the
mean divided by (n-1). Expressed as units
squared
•Why square the deviations? The sum of the
deviations from the mean is zero
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Numerical Measures of Variability
The Sample Standard Deviation (s)
•The positive square root of the sample
variance
•Expressed in the original units of
measurement
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
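A minimal sketch of the sample variance and sample standard deviation, applied to the data set from the central tendency example (1 3 5 6 8 8 9 11 12). The function names are hypothetical. The squared deviations from the mean of 7 sum to 104, so the variance is 104 / 8 = 13.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
print(sample_variance(data))           # 13.0
print(round(sample_std_dev(data), 4))  # 3.6056
```

The variance is in squared units, while the standard deviation is back in the original units of measurement, which is why it is easier to interpret.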
Numerical Measures of Variability
Samples and Populations - Notation
| | Sample | Population |
|---|---|---|
| Variance | $s^2$ | $\sigma^2$ |
| Standard Deviation | $s$ | $\sigma$ |
Interpreting the Standard Deviation
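How many observations fit within ± k standard deviations of
the mean?
| Interval | Chebyshev’s Rule | Empirical Rule |
|---|---|---|
| $\bar{x} \pm 1s$ or $\mu \pm 1\sigma$ | No useful info | Approximately 68% |
| $\bar{x} \pm 2s$ or $\mu \pm 2\sigma$ | At least 75% | Approximately 95% |
| $\bar{x} \pm 3s$ or $\mu \pm 3\sigma$ | At least 8/9 | Approximately 99.7% |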
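The Chebyshev column follows from the general bound that at least 1 - 1/k² of the observations lie within k standard deviations of the mean, for any k > 1. The slide lists only the resulting values, so the formula is stated here as background. The sketch below uses a hypothetical function name and exact fractions so that 8/9 is reproduced without rounding.

```python
from fractions import Fraction


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations."""
    if k <= 1:
        # For k <= 1 the rule gives no useful information.
        return Fraction(0)
    return 1 - 1 / Fraction(k) ** 2


for k in (1, 2, 3):
    print(k, chebyshev_lower_bound(k))
```

Chebyshev's Rule holds for any data set regardless of shape, which is why its guarantees are weaker than the Empirical Rule's approximations for mound-shaped distributions.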
Interpreting the Standard Deviation
You have purchased compact fluorescent light bulbs for your home.
Average life length is 500 hours, standard deviation is 24, and
frequency distribution for the life length is mound shaped. One of your
bulbs burns out at 450 hours. Would you send the bulb back for a
refund?
| Interval | Range | % of observations included | % of observations excluded |
|---|---|---|---|
| $\bar{x} \pm 1s$ | 476 - 524 | Approximately 68% | Approximately 32% |
| $\bar{x} \pm 2s$ | 452 - 548 | Approximately 95% | Approximately 5% |
| $\bar{x} \pm 3s$ | 428 - 572 | Approximately 99.7% | Approximately 0.3% |
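The interval ranges in this example can be reproduced with a short sketch, which also shows where the 450-hour bulb falls. Variable names are illustrative; the mean of 500 and standard deviation of 24 come from the slide.

```python
mean_life = 500  # hours
std_dev = 24     # hours
observed = 450   # hours, the bulb that burned out

# Interval of mean +/- k standard deviations for k = 1, 2, 3.
intervals = {
    k: (mean_life - k * std_dev, mean_life + k * std_dev)
    for k in (1, 2, 3)
}

for k, (low, high) in intervals.items():
    inside = low <= observed <= high
    print(f"{k}s: {low} - {high}, contains {observed}: {inside}")
```

The bulb falls just outside the 2s interval but inside the 3s interval, so under the Empirical Rule it belongs to the roughly 5% of bulbs that lie more than two standard deviations from the mean.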
Numerical Measures of Relative
Standing
Descriptive measures of relationship of a
measurement to the rest of the data
Common measures:
• percentile ranking or percentile score
• z-score
Numerical Measures of Relative
Standing
Percentile rankings make use of the pth percentile
The median is an example of a percentile.
The median is the 50th percentile – 50% of observations lie above it, and 50% lie below it
For any p, the pth percentile has p% of the measurements lying below it, and (100-p)% above it
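One simple way to read the definition above is to compute the percentage of measurements that lie strictly below a given value. The function name `percent_below` and the data are hypothetical, and statistical software uses several slightly different percentile conventions, so this is only a sketch of the idea.

```python
def percent_below(values, x):
    """Percentage of measurements in `values` strictly below x."""
    if not values:
        raise ValueError("data set is empty")
    return 100 * sum(1 for v in values if v < x) / len(values)


# Hypothetical data: the scores 1, 2, ..., 100.
scores = list(range(1, 101))
print(percent_below(scores, 51))  # 50.0
```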
Numerical Measures of Relative
Standing
z-score – the distance between a
measurement x and the mean, expressed in
standard units
Use of standard units allows comparison
across data sets
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
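A minimal sketch of the z-score calculation. The function name `z_score` is hypothetical; the first call reuses the light bulb figures from the earlier slide (mean 500, standard deviation 24).

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in standard deviation units."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Light bulb that burned out at 450 hours.
print(round(z_score(450, 500, 24), 2))  # -2.08

# A bulb lasting 524 hours is exactly one standard deviation above the mean.
print(z_score(524, 500, 24))            # 1.0
```

A negative z-score means the measurement lies below the mean, and a positive one means it lies above. Because the result is unit-free, z-scores from data sets measured in different units can be compared directly.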
Numerical Measures of Relative
Standing
More on z-scores
Z-scores follow the empirical rule for
mounded distributions
Methods for Detecting Outliers
Outlier – an observation that is unusually large or small relative to the data values being described
Causes
• Invalid measurement
• Misclassified measurement
• A rare (chance) event
2 detection methods
• Box Plots
• z-scores
Methods for Detecting Outliers
Box Plots
• based on quartiles, values that divide
the dataset into 4 groups
• Lower Quartile QL – 25th percentile
• Middle Quartile - median
• Upper Quartile QU – 75th percentile
• Interquartile Range (IQR) = QU - QL
Methods for Detecting Outliers
Box Plots
Not on plot – inner and outer fences, which determine potential outliers
[Figure: box plot labeled with QU (upper hinge), Median, QL (lower hinge), Whiskers, and a Potential Outlier]
Methods for Detecting Outliers
Rules of thumb
•Box Plots
–measurements between inner and outer fences are suspect
–measurements beyond outer fences are highly suspect
•Z-scores
–Scores greater than 3 in absolute value in mounded distributions (greater than 2 in highly skewed distributions) are considered outliers
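The box plot rule of thumb can be sketched as follows. The slides do not state where the fences sit, so this sketch assumes the common convention of inner fences 1.5 × IQR beyond the hinges and outer fences 3 × IQR beyond the hinges. The function names and the data are hypothetical, and `statistics.quantiles` uses one of several quartile conventions, so its quartiles may differ slightly from a hand calculation.

```python
import statistics


def fences(q_l, q_u):
    """Return (inner, outer) fence pairs, assuming 1.5*IQR and 3*IQR."""
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    return inner, outer


def classify(x, q_l, q_u):
    """Label a measurement using the box plot rule of thumb."""
    inner, outer = fences(q_l, q_u)
    if x < outer[0] or x > outer[1]:
        return "highly suspect"
    if x < inner[0] or x > inner[1]:
        return "suspect"
    return "not flagged"


# Hypothetical data set with one unusually large value.
data = [5, 7, 8, 9, 10, 11, 12, 13, 14, 40]
q_l, _median, q_u = statistics.quantiles(data, n=4)

for x in data:
    label = classify(x, q_l, q_u)
    if label != "not flagged":
        print(x, label)
```

A flagged value is only a candidate outlier; whether it is an invalid measurement, a misclassified one, or a rare event still has to be judged from the context of the data.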
Graphing Bivariate Relationships
Bivariate relationship – the relationship between
two quantitative variables
Graphically represented with the scattergram
The Time Series Plot
Time Series Data – data produced and monitored
over time
Graphically represented with the time series plot
[Figure: two time series plots, one with time on the x axis and one with order on the x axis]
Distorting the Truth with Descriptive
Techniques
•Graphical techniques
–Scale manipulation
Same data, different scales
Distorting the Truth with Descriptive
Techniques
•Graphical techniques
–More Scale manipulation
Distorting the Truth with Descriptive
Techniques
•Graphical techniques
–More Scale manipulation
Distorting the Truth with Descriptive
Techniques
•Numerical techniques
–Mismatch of measure of central tendency and
distribution shape
[Figure: two distributions, one where use of the mean overstates the average and one where use of the mean understates the average]
Distorting the Truth with Descriptive
Techniques
•Numerical techniques
–Discussion of central tendency with no information on
variability
Which model would you
purchase if you knew only
the average MPG?
Would knowing the standard
deviation affect your choice?
Why?
Distorting the Truth with Descriptive
Techniques
•Graphical techniques
–Look past the pictures to the data they represent
•Numerical techniques
–Is measure being used most appropriate for underlying
distribution?
–Are you provided with information on central tendency
and variability?
Summary
•Graphical methods for Qualitative Data
–Pie chart
–Bar graph
–Pareto diagram
•Graphical methods for Quantitative Data
–Dot plot
–Stem-and-leaf display
–Histogram
Summary
•Numerical measures of central tendency
–Mean
–Median
–Mode
•Numerical measures of variation
–Range
–Variance
–Standard Deviation
Summary
•Distribution Rules
–Chebyshev’s Rule
–Empirical Rule
•Measures of relative standing
–Percentile scores
–z-scores
•Methods for detecting Outliers
–Box plots
–z-scores
Summary
•Method for graphing the relationship
between two quantitative variables
–Scatterplot