Chapter 4: Displaying and Summarizing Quantitative Data
TRANSCRIPT
![Page 1: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/1.jpg)
Chapter 4
Displaying and Summarizing Quantitative Data
![Page 2: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/2.jpg)
Objectives
• Histogram
• Stem-and-leaf plot
• Dotplot
• Shape
• Center
• Spread
• Outliers
• Mean
• Median
• Range
• Interquartile range (IQR)
• Percentile
• 5-Number summary
• Resistant
• Variance
• Standard Deviation
![Page 3: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/3.jpg)
Dealing With a Lot of Numbers…
• Summarizing the data will help us when we look at large sets of quantitative data.
• Without summaries of the data, it’s hard to grasp what the data tell us.
• The best thing to do is to make a picture…
• We can’t use bar charts or pie charts for quantitative data, since those displays are for categorical variables.
![Page 4: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/4.jpg)
Reasons for Constructing Quantitative Frequency Tables
1. Large data sets can be summarized.
2. Can gain some insight into the nature of data.
3. Have a basis for constructing a histogram.
![Page 5: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/5.jpg)
Ways to chart quantitative data
• Histograms and stemplots
These are summary graphs for a single variable. They
are very useful to understand the pattern of variability in
the data.
• Line graphs: time plots
Use when there is a meaningful sequence, like time. The
line connecting the points helps emphasize any change
over time.
• Other graphs to reflect numerical summaries are
Dotplots and Cumulative Frequency Curves (Ogive).
![Page 6: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/6.jpg)
HISTOGRAM
Quantitative Data
![Page 7: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/7.jpg)
Histogram
• To make a histogram we first need to organize the data using a quantitative frequency table.
• Two types of quantitative data
1. Discrete – use ungrouped frequency table to organize.
2. Continuous – use grouped frequency table to organize.
![Page 8: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/8.jpg)
Quantitative Frequency Tables – Ungrouped
• What is an ungrouped frequency table? An ungrouped frequency table simply lists each data value together with its frequency count, that is, the number of times the value occurs.
• Commonly used with discrete quantitative data.
![Page 9: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/9.jpg)
Quantitative Frequency Tables – Ungrouped
• Example: The at-rest pulse rates for 16 athletes at a meet were 57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, and 58. Summarize the information with an ungrouped frequency distribution.
![Page 10: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/10.jpg)
Quantitative Frequency Tables – Ungrouped
• Example Continued
Note: The (ungrouped) classes are the observed values themselves.
![Page 11: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/11.jpg)
Quantitative Relative Frequency Tables – Ungrouped
Note: The relative frequency for a class is obtained by computing f/n, where f is the frequency of the class and n is the total number of observations.
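The ungrouped frequency and relative frequency tables for the pulse-rate example can be sketched in a few lines of Python. This is only an illustration: the variable names (`pulse_rates`, `ungrouped_table`) are chosen here and do not come from the slides.

```python
from collections import Counter

# At-rest pulse rates for the 16 athletes in the example.
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]

n = len(pulse_rates)              # total number of observations
frequency = Counter(pulse_rates)  # observed value -> frequency count f

# One row per observed value: (value, f, f/n).
ungrouped_table = [(value, f, f / n) for value, f in sorted(frequency.items())]

print("value   f   f/n")
for value, f, rel in ungrouped_table:
    print(f"{value:>5} {f:>3}   {rel:.4f}")
```

Each row is one (ungrouped) class, so the classes are exactly the observed values, and the relative frequencies sum to 1.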
![Page 12: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/12.jpg)
Quantitative Frequency Tables – Grouped
• What is a grouped frequency table? A grouped frequency table is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency counts) in each interval.
• Commonly used with continuous quantitative data.
![Page 13: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/13.jpg)
Quantitative Frequency Tables – Grouped
• Later, we will encounter a graphical display called the histogram. We will see that grouped frequency tables are used to construct these displays.
![Page 14: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/14.jpg)
Quantitative Frequency Tables – Grouped
• There are several procedures that one can use to construct a grouped frequency table.
• However, because of the many statistical software packages (MINITAB, SPSS etc.) and graphing calculators (TI-83 etc.) available today, it is not necessary to try to construct such distributions using pencil and paper.
![Page 15: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/15.jpg)
Quantitative Frequency Tables – Grouped
• A frequency table should have a minimum of 5 classes and a maximum of 20 classes.
• For small data sets, one can use between 5 and 10 classes.
• For large data sets, one can use up to 20 classes.
![Page 16: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/16.jpg)
Quantitative Frequency Tables – Grouped
• Example: The weights of 30 female students majoring in Physical Education on a college campus are as follows: 143, 113, 107, 151, 90, 139, 136, 126, 122, 127, 123, 137, 132, 121, 112, 132, 133, 121, 126, 104, 140, 138, 99, 134, 119, 112, 133, 104, 129, and 123. Summarize the data with a frequency distribution using seven classes.
![Page 17: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/17.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• NOTE: We will introduce the histogram here to help us explain a grouped frequency distribution.
![Page 18: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/18.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• What is a histogram? A histogram is a graphical display of a frequency or a relative frequency table that uses classes and vertical (or horizontal) bars (rectangles) of various heights to represent the frequencies.
![Page 19: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/19.jpg)
Histogram
• The most common graph used to display one-variable quantitative data.
![Page 20: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/20.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• The MINITAB statistical software was used to generate the histogram in the next slide.
• The histogram has seven classes.
• Classes for the weights are along the x-axis and frequencies are along the y-axis.
• The number at the top of each rectangular box represents the frequency for the class.
![Page 21: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/21.jpg)
Quantitative Frequency Tables – Grouped Example Continued
Histogram with 7 classes for the weights.
![Page 22: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/22.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• Observations
• From the histogram, the classes (intervals) are 85 – 95, 95 – 105, 105 – 115, etc., with corresponding frequencies of 1, 3, 4, etc.
• We will use this information to construct the grouped frequency distribution.
![Page 23: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/23.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• Observations (continued)
• Observe that the upper class limit of 95 for the class 85 – 95 is listed as the lower class limit for the class 95 – 105.
• Since the value of 95 cannot be included in both classes, we will use the convention that the upper class limit is not included in the class.
![Page 24: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/24.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• Observations (continued)
• That is, the class 85 – 95 should be interpreted as having the values 85 and up to 95 but not including the value of 95.
• Using these observations, the grouped frequency distribution is constructed from the histogram and is given on the next slide.
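The convention that the upper class limit is not included can be sketched as a half-open interval test. The code below is an illustration only: the function name `grouped_frequencies` is made up here, and the class starting point (85) and width (10) are read off the class intervals quoted above.

```python
# Weights of the 30 students from the example.
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]

# Seven classes of width 10: 85-95, 95-105, ..., 145-155.
class_width = 10
lower_limits = list(range(85, 155, class_width))

def grouped_frequencies(data, lower_limits, width):
    """Count the values in each class [lower, lower + width); the upper limit is excluded."""
    counts = [0] * len(lower_limits)
    for x in data:
        for i, lower in enumerate(lower_limits):
            if lower <= x < lower + width:
                counts[i] += 1
                break
    return counts

frequencies = grouped_frequencies(weights, lower_limits, class_width)
for lower, f in zip(lower_limits, frequencies):
    print(f"{lower} - {lower + class_width}: {f}")
```

The first three counts agree with the frequencies 1, 3, 4 read from the histogram, and the seven counts add up to 30.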
![Page 25: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/25.jpg)
Quantitative Frequency Tables – Grouped Example Continued
![Page 26: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/26.jpg)
Quantitative Frequency Tables – Grouped Example Continued
• Observations (continued)
• In the grouped frequency distribution, the sum of the relative frequencies did not add up to 1. This is due to rounding to four decimal places.
• The same observation should be noted for the cumulative relative frequency column.
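The rounding effect can be checked directly. In the sketch below the class frequencies are the counts obtained by tallying the 30 weights into the classes 85 – 95 through 145 – 155; the slides state only the first three (1, 3, 4), so the remaining values are computed here rather than quoted. It is also assumed that the cumulative column is built by adding the already rounded relative frequencies.

```python
from itertools import accumulate

# Frequencies for the seven weight classes (tallied from the 30 weights).
frequencies = [1, 3, 4, 6, 9, 6, 1]
n = sum(frequencies)

# Relative frequencies rounded to four decimal places, as in the table.
relative = [round(f / n, 4) for f in frequencies]

# Cumulative relative frequencies built from the rounded values (assumption).
cumulative_relative = [round(c, 4) for c in accumulate(relative)]

print(relative)
print("sum of rounded relative frequencies:", round(sum(relative), 4))
print("last cumulative relative frequency:", cumulative_relative[-1])
```

Both totals come out slightly below 1, because 1/30 and 4/30 are each rounded down.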
![Page 27: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/27.jpg)
Creating a Histogram
It is an iterative process—try and try again.
What bin size should you use?
• Not too many bins with either 0 or 1 counts
• Not so summarized that you lose all the information
• Not so detailed that it is no longer a summary
Rule of thumb: Start with 5 to 10 bins.
Look at the distribution and refine your bins.
(There isn’t a unique or “perfect” solution.)
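One way to try several bin counts quickly is sketched below, using the 30 student weights from the earlier example. The helper `bin_counts` is hypothetical and uses equal-width bins from the smallest to the largest value, which is a simplifying assumption and not necessarily how statistical software picks its classes. It also assumes the data are not all equal.

```python
def bin_counts(data, n_bins):
    """Split the range of the data into n_bins equal-width bins and count values per bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for x in data:
        # The maximum would fall one past the last bin, so clamp it into the last bin.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]

for n_bins in (3, 7, 30):
    counts = bin_counts(weights, n_bins)
    sparse = sum(1 for c in counts if c <= 1)  # bins with 0 or 1 counts
    print(f"{n_bins:>2} bins, {sparse} sparse: {counts}")
```

Comparing the printed counts for a small, a moderate, and a large number of bins shows the trade-off between too much and too little summarizing.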
![Page 28: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/28.jpg)
Not summarized enough
Too summarized
Same data set
![Page 29: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/29.jpg)
Histograms
• Frequency Distributions
• Example
Definitions
![Page 30: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/30.jpg)
Lower Class Limits
are the smallest numbers that can actually belong to different classes
![Page 31: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/31.jpg)
![Page 32: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/32.jpg)
Upper Class Limits
are the largest numbers that can actually belong to different classes
![Page 33: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/33.jpg)
Class Boundaries
are the numbers used to separate classes, but without the gaps created by class limits
![Page 34: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/34.jpg)
Class Boundaries
number separating classes
−0.5, 99.5, 199.5, 299.5, 399.5, 499.5
![Page 35: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/35.jpg)
![Page 36: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/36.jpg)
Class Midpoints or Class Mark
midpoints of the classes
Class midpoints can be found by adding the lower class limit to the upper class limit and dividing the sum by two.
![Page 37: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/37.jpg)
Class Midpoints
midpoints of the classes
49.5, 149.5, 249.5, 349.5, 449.5
![Page 38: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/38.jpg)
Class Width
is the difference between two consecutive lower class limits or two consecutive lower class boundaries
Class width: 100 for each of the five classes
![Page 39: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/39.jpg)
Summary of Terminology
• Class – non-overlapping intervals the data are divided into.
• Class Limits – The smallest and largest observed values in a given class.
• Class Boundaries – Fall halfway between the upper class limit for the smaller class and the lower class limit for the larger class. Used to close the gap between classes.
• Class Width – The difference between the class boundaries for a given class.
• Class mark – The midpoint of a class.
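These definitions can be checked numerically. The sketch below assumes class limits of 0–99, 100–199, …, 400–499; the table itself is not reproduced in the text, so these limits are inferred from the class boundaries (−0.5 to 499.5) and class midpoints (49.5 to 449.5) listed on the slides.

```python
# Class limits inferred from the listed boundaries and midpoints (an assumption).
class_limits = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499)]

# Gap between one class's upper limit and the next class's lower limit (1 here).
gap = class_limits[1][0] - class_limits[0][1]

# Boundaries sit halfway across each gap; the outer boundaries extend by the same half gap.
boundaries = [class_limits[0][0] - gap / 2] + [upper + gap / 2 for _, upper in class_limits]

# Class mark (midpoint) = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Class width = difference between the boundaries of a class.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

print("boundaries:", boundaries)
print("midpoints: ", midpoints)
print("widths:    ", widths)
```

Note that the width is 100, not 99: subtracting the lower limit from the upper limit of the same class would understate it.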
![Page 40: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/40.jpg)
Constructing A Frequency Table
1. Decide on the number of classes (should be between 5 and 20).
2. Calculate the class width (round up):
$$\text{class width} = \frac{(\text{highest value}) - (\text{lowest value})}{\text{number of classes}}$$
3. Starting point: Begin by choosing a lower limit of the first class.
4. Using the lower limit of the first class and class width, proceed to list the lower class limits.
5. List the lower class limits in a vertical column and proceed to enter the upper class limits.
6. Go through the data set putting a tally in the appropriate class for each data value.
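The six steps can be sketched in code for whole-number data. Several choices below are assumptions, not part of the slides: the function name `build_frequency_table`, taking the lowest data value as the starting point, using upper limits one unit below the next lower limit, and bumping the width to the next whole number when the quotient is already whole (so that the highest value still fits in the last class).

```python
def build_frequency_table(data, n_classes):
    """Return (lower limit, upper limit, tally) rows for whole-number data."""
    lowest, highest = min(data), max(data)
    # Step 2: (highest - lowest) / n_classes, rounded up to a whole number.
    # Floor + 1 equals the ceiling unless the quotient is whole, in which case
    # it moves to the next whole number (assumption, see lead-in).
    width = (highest - lowest) // n_classes + 1
    # Steps 3-4: start at the lowest value and list the lower class limits.
    lower_limits = [lowest + i * width for i in range(n_classes)]
    # Step 5: each upper limit is one unit below the next lower limit.
    upper_limits = [lower + width - 1 for lower in lower_limits]
    # Step 6: tally every data value into its class.
    tallies = [0] * n_classes
    for x in data:
        tallies[(x - lowest) // width] += 1
    return list(zip(lower_limits, upper_limits, tallies))

# The 30 student weights from the earlier grouped example, with seven classes.
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]

table = build_frequency_table(weights, 7)
for lower, upper, tally in table:
    print(f"{lower:>3} - {upper:>3}: {tally}")
```

With this starting point the classes are 90–98, 99–107, and so on, which differ from the 85 – 95, 95 – 105 classes in the earlier histogram. Different starting points and widths are equally valid and give slightly different tables.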
![Page 41: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/41.jpg)
Histogram
Then to complete the Histogram, graph the Frequency Table data.
![Page 42: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/42.jpg)
Frequency Histogram vs Relative Frequency Histogram
Frequency histogram: a bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies.
![Page 43: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/43.jpg)
Frequency Histogram vs Relative Frequency Histogram
Relative frequency histogram: has the same shape and horizontal scale as a frequency histogram, but the vertical scale is marked with relative frequencies.
![Page 44: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/44.jpg)
Frequency Histogram vs Relative Frequency Histogram
![Page 45: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/45.jpg)
Histograms - Facts
• Histograms are useful when the data values are quantitative.
• A histogram gives an estimate of the shape of the distribution of the population from which the sample was taken.
• If the relative frequencies are plotted along the vertical axis to produce the histogram, the shape will be the same as when the frequencies are used.
![Page 46: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/46.jpg)
Making Histograms on the TI-83/84
Use of Stat Plots on the TI-83/84
Raw Data: 548, 405, 375, 400, 475, 450, 412,
375, 364, 492, 482, 384, 490, 492,
490, 435, 390, 500, 400, 491, 945,
435, 848, 792, 700, 572, 739, 572
![Page 47: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/47.jpg)
Frequency Table Data:

| Class Limits | Frequency |
|---|---|
| 350 to < 450 | 11 |
| 450 to < 550 | 10 |
| 550 to < 650 | 2 |
| 650 to < 750 | 2 |
| 750 to < 850 | 2 |
| 850 to < 950 | 1 |
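The frequency table can be reproduced from the raw data as a check. This is a small sketch with illustrative variable names. Because every class has the form "lower to < lower + 100", integer division locates the class of each value.

```python
# Raw data from the TI-83/84 example.
raw_data = [548, 405, 375, 400, 475, 450, 412,
            375, 364, 492, 482, 384, 490, 492,
            490, 435, 390, 500, 400, 491, 945,
            435, 848, 792, 700, 572, 739, 572]

lower_limits = [350, 450, 550, 650, 750, 850]
class_width = 100

# Class index = how many whole widths the value lies above the first lower limit.
frequencies = [0] * len(lower_limits)
for x in raw_data:
    frequencies[(x - lower_limits[0]) // class_width] += 1

for lower, f in zip(lower_limits, frequencies):
    print(f"{lower} to < {lower + class_width}: {f}")
```

The counts match the table, and a value such as 450 lands in the class 450 to < 550, not in 350 to < 450.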
![Page 48: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/48.jpg)
STEM AND LEAF PLOT
Quantitative Data
![Page 49: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/49.jpg)
Stem-and-Leaf Plots
• What is a stem-and-leaf plot? A stem-and-leaf plot is a data plot that uses part of a data value as the stem to form groups or classes and part of the data value as the leaf.
• Most often used for small or medium-sized data sets. For larger data sets, histograms do a better job.
• Note: A stem-and-leaf plot has an advantage over a grouped frequency table or histogram, since a stem-and-leaf plot retains the actual data by showing them in graphic form.
![Page 50: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/50.jpg)
Stemplots
How to make a stemplot:
1) Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed. Use only one digit for each leaf—either round or truncate the data values to one decimal place after the stem.
2) Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column.
3) Write each leaf in the row to the right of its stem, in increasing order out from the stem.
Original data: 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70
STEM LEAVES
Include key – how to read the stemplot.
0|9 = 9
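The three steps can be sketched for non-negative whole numbers as follows. The function name `stemplot` is illustrative, and showing stems that have no leaves (so that gaps in the data stay visible) is a common convention assumed here, not something the slide states.

```python
from collections import defaultdict

def stemplot(data):
    """Return stem-and-leaf rows for non-negative whole numbers, one string per stem."""
    leaves = defaultdict(list)
    for x in sorted(data):
        # Step 1: stem = all but the final digit, leaf = the final digit.
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    rows = []
    # Step 2: stems in a column, smallest at the top (empty stems included).
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: leaves in increasing order out from the stem.
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

data = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
for row in stemplot(data):
    print(row)
print("Key: 0|9 = 9")
```

Because the input is sorted before it is split, the leaves on each stem are already in increasing order.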
![Page 51: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/51.jpg)
Stem-and-Leaf Plots
• Example: Consider the following values – 96, 98, 107, 110, and 112. Construct a stem-and-leaf plot by using the units digits as the leaves.
![Page 52: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/52.jpg)
Stem-and-Leaf Plot
Stems and leaves for the data values.
Stem-and-leaf plot for the data values.
Stem | Leaf
09 | 6 8
10 | 7
11 | 0 2
Key: 09|6 = 96
![Page 53: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/53.jpg)
Your Turn: Stem-and-Leaf Plots
• A sample of the number of admissions to a psychiatric ward at a local hospital during the full phases of the moon is as follows: 22, 30, 21, 27, 31, 36, 20, 28, 25, 33, 21, 38, 32, 35, 26, 19, 43, 30, 30, 34, 27, and 41.
• Display the data in a stem-and-leaf plot with the leaves represented by the unit digits.
![Page 54: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/54.jpg)
Stem-and-Leaf Plot
Stem | Leaf
1 | 9
2 | 0 1 1 2 5 6 7 7 8
3 | 0 0 0 1 2 3 4 5 6 8
4 | 1 3
Key: 1|9 = 19
![Page 55: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/55.jpg)
Variations of the StemPlot
• Splitting Stems – (too few stems or classes) Split stems to double the number of stems when all the leaves would otherwise fall on just a few stems.
• Each stem appears twice.
• Leaves 0-4 go on the 1st stem and leaves 5-9 go on the 2nd stem.
• Example: data – 120, 121, 121, 123, 124, 124, 125, 125, 125, 126, 126, 128, 129, 130, 132, 132, 133, 134, 134, 134, 135, 137, 138, 138, 138, 139
StemPlot
12 | 0113445556689
13 | 0223444578889
StemPlot (splitting stems)
12 | 011344
12 | 5556689
13 | 0223444
13 | 578889
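Splitting stems amounts to grouping by the stem and by whether the leaf is in 0-4 or 5-9. A minimal sketch is given below. The function name `split_stemplot` is made up for illustration, and half-stems with no leaves are simply omitted, which is a simplification.

```python
def split_stemplot(data):
    """Stem-and-leaf rows with each stem split in two: leaves 0-4 first, then leaves 5-9."""
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1  # 0 = first copy of the stem, 1 = second copy
        rows.setdefault((stem, half), []).append(leaf)
    return [f"{stem} | " + "".join(str(leaf) for leaf in leaves)
            for (stem, half), leaves in sorted(rows.items())]

data = [120, 121, 121, 123, 124, 124, 125, 125, 125, 126, 126, 128, 129,
        130, 132, 132, 133, 134, 134, 134, 135, 137, 138, 138, 138, 139]

for row in split_stemplot(data):
    print(row)
```

The two stems 12 and 13 become four rows, which spreads the 26 values out enough to see the shape of the distribution.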
![Page 56: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/56.jpg)
Stemplots versus Histograms
Stemplots are quick and dirty histograms that can easily be done by hand, which makes them very convenient for back-of-the-envelope calculations. However, they are rarely found in scientific or lay publications.
![Page 57: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/57.jpg)
Stemplots versus Histograms
• Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values.
• Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.
![Page 58: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/58.jpg)
Stem-and-Leaf Example
• Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?
5 | 6
6 | 0 4 4 4
6 | 8 8 8 8
7 | 2 2 2 2
7 | 6 6 6 6
8 | 0 0 0 0 4 4
8 | 8
Key: 5|6 = 56
![Page 59: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/59.jpg)
DOTPLOTS
Quantitative Data
![Page 60: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/60.jpg)
Dot Plots
• What is a dot plot? A dot plot is a plot that displays a dot for each value in a data set along a number line. If there are multiple occurrences of a specific value, then the dots will be stacked vertically.
![Page 61: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/61.jpg)
Dotplots
• A dotplot is a simple display. It just places a dot along an axis for each case in the data.
• The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot.
• You might see a dotplot displayed horizontally or vertically.
![Page 62: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/62.jpg)
Dot Plot Example:
• The following data show the lengths of 50 movies in minutes. Construct a dot plot for the data.
• 64, 64, 69, 70, 71, 71, 71, 72, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 77, 77, 78, 78, 79, 79, 80, 80, 81, 81, 81, 82, 82, 82, 83, 83, 83, 84, 86, 88, 89, 89, 90, 90, 92, 94, 120
Figure 2-5
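A text version of the dot plot can be sketched as follows. It is drawn sideways, with one row per whole minute and one dot per movie, which is a layout choice made here for plain-text output and not the layout of the original figure.

```python
from collections import Counter

# Lengths of the 50 movies, in minutes.
movie_lengths = [64, 64, 69, 70, 71, 71, 71, 72, 73, 73,
                 74, 74, 74, 74, 75, 75, 75, 75, 75, 75,
                 76, 76, 76, 77, 77, 78, 78, 79, 79, 80,
                 80, 81, 81, 81, 82, 82, 82, 83, 83, 83,
                 84, 86, 88, 89, 89, 90, 90, 92, 94, 120]

counts = Counter(movie_lengths)

# One row per minute from the shortest to the longest movie; repeated values stack as dots.
for minutes in range(min(movie_lengths), max(movie_lengths) + 1):
    print(f"{minutes:>3} | " + "." * counts[minutes])
```

The long run of empty rows before 120 makes that movie stand out as an outlier, which is the kind of feature a dot plot is good at exposing.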
![Page 63: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/63.jpg)
Dot Plots – Your Turn
The following frequency distribution shows the number of defectives observed by a quality control officer over a 30 day period. Construct a dot plot for the data.
![Page 64: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/64.jpg)
Dot Plots – Solution
![Page 65: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/65.jpg)
Ogive - Cumulative Frequency Curve
![Page 66: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/66.jpg)
Cumulative Frequency and the Ogive
• A histogram displays the distribution of a quantitative variable. It tells little about the relative standing (percentile, quartile, etc.) of an individual observation.
• For this information, we use a Cumulative Frequency graph, called an Ogive (pronounced O-JIVE).
• The Pth percentile of a distribution is a value such that P% of the data fall at or below it.
![Page 67: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/67.jpg)
Cumulative Frequency
• What is a cumulative frequency for a class? The cumulative frequency for a specific class in a frequency table is the sum of the frequencies for all values at or below the given class.
![Page 68: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/68.jpg)
Cumulative Frequency
![Page 69: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/69.jpg)
Constructing an Ogive
1. Make a frequency table and add a cumulative frequency column.
2. To fill in the cumulative frequency column, add the counts in the frequency column that fall in or below the current class interval.
3. Label and scale the axes and title the graph. Horizontal axis “classes” and vertical axis “cumulative frequency or relative cumulative frequency”.
4. Begin the ogive at zero on the vertical axis and at the lower boundary of the first class on the horizontal axis. Then plot the cumulative frequency for each class against its upper class boundary.
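The points of an ogive can be computed before any plotting. The sketch below reuses the classes and frequencies from the TI-83/84 frequency table earlier in the chapter. The helper `value_at`, which reads a value off the ogive by linear interpolation between plotted points, is an illustrative assumption; reading a hand-drawn smooth curve would give slightly different estimates.

```python
from itertools import accumulate

# Classes "350 to < 450", ..., "850 to < 950" and their frequencies.
lower_boundaries = [350, 450, 550, 650, 750, 850]
upper_boundaries = [450, 550, 650, 750, 850, 950]
frequencies = [11, 10, 2, 2, 2, 1]

# Steps 1-2: cumulative frequency = sum of the counts in or below each class.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Step 4: start at (lower boundary of first class, 0), then (upper boundary, cumulative).
ogive_points = [(lower_boundaries[0], 0)] + list(zip(upper_boundaries, cumulative))
relative_points = [(x, c / total) for x, c in ogive_points]

def value_at(cumulative_count, points):
    """Read the ogive: interpolate the x-value at a given cumulative frequency."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= cumulative_count <= c1:
            return x0 + (cumulative_count - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative count outside the range of the ogive")

print(ogive_points)
print("estimated median:", value_at(total / 2, ogive_points))
```

Half of the 28 values is 14, which falls in the second class, so the estimated median lies between 450 and 550.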
![Page 70: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/70.jpg)
Ogive
• A line graph that depicts cumulative frequencies.
• Used to Find Quartiles and Percentiles.
![Page 71: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/71.jpg)
Example: Cumulative Frequency Curve
• The frequencies of the scores of 80 students in a test are given in the following table. Complete the corresponding cumulative frequency table.
• A suitable table is as follows:
![Page 72: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/72.jpg)
Example continued
• The information provided by a cumulative frequency table can be displayed in graphical form by plotting the cumulative frequencies given in the table against the upper class boundaries, and joining these points with a smooth curve.
• The cumulative frequency curve corresponding to the data is as follows:
![Page 73: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/73.jpg)
Your Turn:
• The results obtained by 200 students in a mathematics test are given in the following table.
Draw a cumulative frequency curve and use it to estimate
a) The median mark
b) The number of students who scored less than 22 marks
c) The pass mark if 120 students passed the test
d) The min. mark required to obtain an A grade if 10% of the students received an A grade.
![Page 74: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/74.jpg)
Solution
• The required cumulative frequency curve is as follows:
a) The median mark: median mark is 26
b) The number of students who scored less than 22 marks: approximately 69 students scored less than 22 marks
c) The pass mark if 120 students passed the test: pass mark is 28
d) The min. mark required to obtain an A grade if 10% of the students received an A grade: min. mark required for an A is 38
![Page 75: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/75.jpg)
Percentiles
• Explanation of the term – percentiles: Percentiles are numerical values that divide an ordered data set into 100 groups of values with at most 1% of the data values in each group.
• The kth percentile is the number that falls above k% of the data.
![Page 76: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/76.jpg)
Percentiles
• Explanation of the term – kth percentile: the kth percentile for an ordered array of numerical data is a numerical value Pk (say) such that k% of the data values are smaller than or equal to Pk, and at most (100 – k)% of the data values are larger than Pk.
• The idea of the kth percentile is illustrated on the next slide.
![Page 77: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/77.jpg)
Percentile Corresponding to a Given Data Value
• The percentile corresponding to a given data value, say x, in a set is obtained by using the following formula.
$$\text{Percentile} = \frac{\text{Number of values at or below } x}{\text{Number of values in data set}} \times 100\%$$
![Page 78: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/78.jpg)
Percentile Corresponding to a Given Data Value
• Example: The shoe sizes, in whole numbers, for a sample of 12 male students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, and 9.
• What is the percentile rank for a shoe size of 12?
![Page 79: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/79.jpg)
Percentile Corresponding to a Given Data Value
• Solution: First, we need to arrange the values from smallest to largest.
• The ordered array is given below: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
• Observe that the number of values at or below the value of 12 is 10.
![Page 80: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/80.jpg)
• Solution (continued): The total number of values in the data set is 12.
• Thus, using the formula, the corresponding percentile is:
$$\text{Percentile} = \frac{10}{12} \times 100\% \approx 83.3\%$$
The value of 12 corresponds to approximately the 83rd percentile.
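The shoe-size calculation can be checked with a short Python sketch. The function name `percentile_rank` is an illustrative choice, not something from the slides; it simply counts the values at or below x and divides by the size of the data set.

```python
def percentile_rank(data, x):
    """Percentile of x: share of values at or below x, as a percentage."""
    if not data:
        raise ValueError("data must not be empty")
    at_or_below = sum(1 for value in data if value <= x)
    return at_or_below / len(data) * 100


# Shoe sizes of the 12 students from the example
shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]
rank = percentile_rank(shoe_sizes, 12)
print(f"Percentile for a shoe size of 12: {rank:.1f}%")
```

Sorting is not needed here, because only the count of values at or below x matters.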
![Page 81: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/81.jpg)
• Assume that we want to determine what data value falls at some general percentile Pk.
• The following steps will enable you to find a general percentile Pk for a data set.
• Step 1: Order the data set from smallest to largest.
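• Step 2: Compute the position c of the percentile. To compute the value of c, use the following formula:
$$c = \frac{n \cdot k}{100}$$
where n is the number of values in the data set and k is the desired percentile.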
Procedure for Finding a Data Value for a Given Percentile
![Page 82: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/82.jpg)
Procedure for Finding a Data Value for a Given Percentile
• Step 3, case 1: If c is not a whole number, round up to the next whole number.
• Locate this position in the ordered set.
• The value in this location is the required percentile.
![Page 83: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/83.jpg)
Procedure for Finding a Data Value for a Given Percentile
• Step 3, case 2: If c is a whole number, use c itself as the position.
• Locate this position in the ordered set.
• The value in this location is the required percentile.
![Page 84: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/84.jpg)
• Example: The data given below represents the 19 countries with the largest numbers of total Olympic medals – excluding the United States, which had 101 medals – for the 1996 Atlanta games. Find the 65th percentile for the data set.
• 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15.
Percentile Corresponding to a Given Data Value
![Page 85: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/85.jpg)
• Solution: First, we need to arrange the data set in order. The ordered set is:
• 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65.
• Next, compute the position of the percentile.
• Here n = 19, k = 65.
• Thus, c = (19 × 65)/100 = 12.35.
• We need to round up to a value of 13.
Percentile Corresponding to a Given Data Value
![Page 86: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/86.jpg)
• Solution (continued): Thus, the 13th value in the ordered data set will correspond to the 65th percentile.
• That is, P65 = 27.
• Question: Why does a percentile measure relative position?
Percentile Corresponding to a Given Data Value
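The procedure for finding a data value at a given percentile can be sketched in Python as follows. The function name `percentile_value` is hypothetical, and the sketch follows the rule as stated on these slides (round the position c up, and use c directly when it is already whole); some textbooks treat the whole-number case differently.

```python
import math


def percentile_value(data, k):
    """Return P_k using the position rule c = n * k / 100, rounded up."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k <= 100:
        raise ValueError("k must satisfy 0 < k <= 100")
    ordered = sorted(data)           # Step 1: order the data
    c = len(ordered) * k / 100       # Step 2: position of the percentile
    position = math.ceil(c)          # round up when c is not a whole number
    return ordered[position - 1]     # positions in the slides are 1-based


# Total medals for the 19 countries in the Olympic example
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15]
print(percentile_value(medals, 65))
```

For the medal data, c = 12.35 rounds up to position 13, which is the value the worked solution picks out.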
![Page 87: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/87.jpg)
Display of the 65th Percentile along with the data values.
Question: Why does a percentile measure Relative Position?
![Page 88: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/88.jpg)
Question: Why does a percentile measure Relative Position?
• Referring to the diagram, observe that the value of 27 is such that at most 65% of the data values are smaller than 27 and at most 35% of the values are larger than 27.
•This shows that the percentile value of 27 is a measure of location.
•Thus, the percentile gives us an idea of the relative position of a value in an ordered data set.
![Page 89: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/89.jpg)
• Deciles and quartiles are special percentiles.
• Deciles divide an ordered data set into 10 equal parts.
• Quartiles divide the ordered data set into 4 equal parts.
• We usually denote the deciles by D1, D2, D3, … , D9.
• We usually denote the quartiles by Q1, Q2, and Q3.
Special Percentiles – Deciles and Quartiles
![Page 90: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/90.jpg)
Quick Tip:
• There are 9 deciles and 3 quartiles.
• Q1 = first quartile = P25
• Q2 = second quartile = P50
• Q3 = third quartile = P75
• D1 = first decile = P10
• D2 = second decile = P20
. . .
• D9 = ninth decile = P90
![Page 91: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/91.jpg)
Think Before You Draw, Again
• Remember the “Make a picture” rule?
• Now that we have options for data displays, you need to Think carefully about which type of display to make.
• Before making a stem-and-leaf display, a histogram, or a dotplot, check the
• Quantitative Data Condition: The data are values of a quantitative variable whose units are known.
![Page 92: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/92.jpg)
Shape, Center, and Spread
• When describing a distribution, make sure to always tell about three things: shape, center, and spread…
• Actually you should comment on four things when describing a distribution. The three above and any deviations from the shape.
• These deviations from the shape are called ‘outliers’ and will be discussed later.
![Page 93: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/93.jpg)
What is the Shape of the Distribution?
1. Does the histogram have a single, central hump or several separated humps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?
![Page 94: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/94.jpg)
Humps
1. Does the histogram have a single, central hump or several separated bumps?
• Humps in a histogram are called modes or peaks.
• A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
![Page 95: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/95.jpg)
Humps (cont.)
• A bimodal histogram has two apparent peaks:
![Page 96: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/96.jpg)
Humps (cont.)
• A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:
![Page 97: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/97.jpg)
Uniform or Rectangular Distribution
• A distribution in which every class has equal frequency. A uniform distribution is symmetrical with the added property that the bars are the same height.
![Page 98: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/98.jpg)
Symmetry
2. Is the histogram symmetric?
• If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.
![Page 99: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/99.jpg)
Symmetrical Distribution
• In a symmetrical distribution, the data values are evenly distributed on both sides of the mean.
• When the distribution is unimodal, the mean, the median, and the mode are all equal to one another and are located at the center of the distribution.
![Page 100: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/100.jpg)
Symmetrical Distribution
![Page 101: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/101.jpg)
Symmetry (cont.)
• The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.
• In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.
![Page 102: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/102.jpg)
Skewed Right Distribution
• In a skewed right distribution, most of the data values fall to the left of the mean, and the “tail” of the distribution is to the right.
• The mean is to the right of the median and the mode is to the left of the median.
![Page 103: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/103.jpg)
Skewed Right Distribution
Skewed Right
![Page 104: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/104.jpg)
Skewed Left Distribution
• In a skewed left distribution, most of the data values fall to the right of the mean, and the “tail” of the distribution is to the left.
• The mean is to the left of the median and the mode is to the right of the median.
![Page 105: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/105.jpg)
Skewed Left Distribution
Skewed Left
![Page 106: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/106.jpg)
Anything Unusual?
3. Do any unusual features stick out?
• Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
• You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
• Are there any gaps in the distribution? If so, we might have data from more than one group.
![Page 107: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/107.jpg)
Anything Unusual? (cont.)
• The following histogram has outliers—there are three cities in the leftmost bar:
![Page 108: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/108.jpg)
Deviations from the Overall Pattern
• Outliers – An individual observation that falls outside the overall pattern of the distribution. Extreme Values – either high or low.
• Causes:
1. Data Mistake
2. Special nature of some observations
![Page 109: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/109.jpg)
Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
The overall pattern is fairly symmetrical except for two states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.
A large gap in the distribution is typically a sign of an outlier.
(Figure labels: Alaska, Florida)
![Page 110: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/110.jpg)
Other Common Terms
• Peak – high bar
• Valley – between 2 peaks
• Gap – no data
![Page 111: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/111.jpg)
Numerical Data Properties
Central Tendency (center)
Variation (spread)
Shape
![Page 112: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/112.jpg)
Examples – Describing Distributions
It’s often a good idea to think about what the distribution of a data set might look like before we collect the data. What do you think the distribution of each of the following data sets will look like? Be sure to discuss its shape. Where do you think the center might be? How spread out do you think the values will be?
1. Number of miles run by Saturday morning joggers at a park.
• Roughly symmetric, slightly skewed right. Center around 3 miles. Few over 10 miles.
2. Hours spent by U.S. adults watching football on Thanksgiving Day.
• Bimodal. Center between 1 and 2 hours. Many people watch no football, others watch most of one or more games. Probably only a few values over 5 hours.
3. Amount of winnings of all people playing a particular state’s lottery last week.
• Strongly skewed to the right, with almost everyone at $0, a few small prizes, with the winner an outlier.
4. Ages of the faculty members at your school.
• Fairly symmetric, somewhat uniform, perhaps slightly skewed to the right. Center in the 40’s. Few ages below 25 or above 70.
5. Last digit of phone numbers on your campus.
• Uniform, symmetric. Center near 5. Roughly equal counts for each digit 0-9.
![Page 113: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/113.jpg)
Where is the Center of the Distribution?
• If you had to pick a single number to describe all the data what would you pick?
• It’s easy to find the center when a histogram is unimodal and symmetric—it’s right in the middle.
• On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.
![Page 114: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/114.jpg)
Measures of Central Tendency
• A measure of central tendency for a collection of data values is a number that is meant to convey the idea of centralness for the data set.
• The most commonly used measures of central tendency for sample data are the: mean, median, and mode.
![Page 115: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/115.jpg)
The Mean
• Explanation of the term – mean: The mean of a set of numerical (data) values is the (arithmetic) average for the set of values.
• NOTE: When computing the value of the mean, the data values can be population values or sample values.
• Hence we can compute either the population mean or the sample mean
![Page 116: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/116.jpg)
The Mean
• Explanation of the term – population mean: If the numerical values are from an entire population, then the mean of these values is called the population mean.
• NOTATION: The population mean is usually denoted by the Greek letter µ (read as “mu”).
![Page 117: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/117.jpg)
The Mean
• Explanation of the term – sample mean: If the numerical values are from a sample, then the mean of these values is called the sample mean.
• NOTATION: The sample mean is usually denoted by $\bar{x}$ (read as “x-bar”).
![Page 118: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/118.jpg)
The Mean -- Example
• Example: What is the mean of the following 11 sample values?
3 8 6 14 0 -4 0 12 -7 0 -10
![Page 119: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/119.jpg)
The Mean -- Example (Continued)
• Solution:
$$\bar{x} = \frac{3 + 8 + 6 + 14 + 0 + (-4) + 0 + 12 + (-7) + 0 + (-10)}{11} = \frac{22}{11} = 2$$
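The arithmetic for the sample mean can be verified with a minimal Python sketch. The function name `sample_mean` is illustrative only; it adds the values and divides by how many there are.

```python
def sample_mean(values):
    """Arithmetic average: sum of the values divided by their count."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# The 11 sample values from the example
sample = [3, 8, 6, 14, 0, -4, 0, 12, -7, 0, -10]
print(sample_mean(sample))
```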
![Page 120: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/120.jpg)
The Mean
• Nonresistant – The mean is sensitive to the influence of extreme values and/or outliers. Skewed distributions pull the mean away from the center towards the longer tail.
• The mean is located at the balancing point of the histogram. For a skewed distribution, the mean is not a good measure of center.
![Page 121: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/121.jpg)
The Mean
• Nonresistant – Example
• Example – Data: {1,2,3,4,5,6,7}
• The mean is 4
• Add an outlier {1,2,3,4,5,6,7,50}
• New mean is 9.75 – large effect
![Page 122: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/122.jpg)
Quick Tip:
• When a data set has a large number of values, we sometimes summarize it as a frequency table. The frequencies represent the number of times each value occurs.
• When the mean is calculated from a frequency table it is often an approximation, because the raw data is sometimes not known.
![Page 123: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/123.jpg)
Calculating Means
• TI-83/84 1-Var Stats
• Using raw data
• Using Frequency table data
![Page 124: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/124.jpg)
Calculating Means on TI-83/84
Raw Data: 548, 405, 375, 400, 475, 450, 412
375, 364, 492, 482, 384, 490, 492
490, 435, 390, 500, 400, 491, 945
435, 848, 792, 700, 572, 739, 572
![Page 125: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/125.jpg)
Calculating Means on TI-83/84
Note: The (ungrouped) classes are the observed values themselves.
![Page 126: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/126.jpg)
Calculating Means on TI-83/84
• Grouped Frequency Table Data:
| Class Limits | Frequency |
| --- | --- |
| 350 to < 450 | 11 |
| 450 to < 550 | 10 |
| 550 to < 650 | 2 |
| 650 to < 750 | 2 |
| 750 to < 850 | 2 |
| 850 to < 950 | 1 |
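To see why a mean computed from a grouped frequency table is only an approximation, the sketch below compares it with the mean of the raw data. It assumes each class is represented by its midpoint, which is a common convention but is not stated on the slides; the name `grouped_mean` is hypothetical.

```python
# Raw data from the TI-83/84 example
raw = [548, 405, 375, 400, 475, 450, 412,
       375, 364, 492, 482, 384, 490, 492,
       490, 435, 390, 500, 400, 491, 945,
       435, 848, 792, 700, 572, 739, 572]

# (lower limit, upper limit, frequency) for each class in the table
classes = [(350, 450, 11), (450, 550, 10), (550, 650, 2),
           (650, 750, 2), (750, 850, 2), (850, 950, 1)]


def grouped_mean(classes):
    """Approximate mean, weighting each class midpoint by its frequency."""
    total = sum(freq for _, _, freq in classes)
    weighted = sum((low + high) / 2 * freq for low, high, freq in classes)
    return weighted / total


raw_mean = sum(raw) / len(raw)
approx_mean = grouped_mean(classes)
print(f"raw mean = {raw_mean:.2f}, grouped mean = {approx_mean:.2f}")
```

The two results are close but not equal, because the table no longer records where each value falls inside its class.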
![Page 127: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/127.jpg)
The Median
• Explanation of the term – median: The median of a set of numerical (data) values is that numerical value in the middle when the data set is arranged in order.
• NOTE: When computing the value of the median, the data values can be population values or sample values.
• Hence we can compute either the population median or the sample median.
![Page 128: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/128.jpg)
Center of a Distribution -- Median
• The median is the value with exactly half the data values below it and half above it.
• It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.
• It has the same units as the data.
![Page 129: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/129.jpg)
Quick Tip:
• When the number of values in the data set is odd, the median will be the middle value in the ordered array.
• When the number of values in the data set is even, the median will be the average of the two middle values in the ordered array.
![Page 130: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/130.jpg)
The Median -- Example
• Example: What is the median for the following sample values?
3 8 6 14 0 -4 2 12 -7 -1 -10
![Page 131: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/131.jpg)
The Median -- Example (Continued)
• Solution: First of all, we need to arrange the data set in order. The ordered set is:
-10 -7 -4 -1 0 2 3 6 8 12 14
6th value
![Page 132: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/132.jpg)
The Median -- Example (Continued)
• Solution (Continued): Since the number of values is odd, the median will be found in the 6th position in the ordered set (to find the position, divide the number of values by 2 and round up: 11/2 = 5.5 ⇒ 6).
• Thus, the value of the median is 2.
![Page 133: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/133.jpg)
The Median -- Example
• Example: Find the median age for the following eight college students.
23 19 32 25 26 22 24 20
![Page 134: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/134.jpg)
The Median – Example (continued)
• Solution: First we have to order the values as shown below.
19 20 22 23 24 25 26 32
![Page 135: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/135.jpg)
The Median – Example (continued)
• Solution (continued): Since there is an even number of ages, the median will be the average of the two middle values (to find them, divide the number of values by 2; that position and the next one hold the two middle values: 8/2 = 4 ⇒ the 4th and 5th values).
• Thus, median = (23 + 24)/2 = 23.5.
![Page 136: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/136.jpg)
The Median
The median is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.
1. Sort observations from smallest to largest. n = number of observations.
2. If n is odd, the median is observation n/2 (round up) down the list.
3. If n is even, the median is the mean of the two center observations.
Odd example (n = 25): 0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1
n/2 = 25/2 = 12.5, which rounds up to 13, so Median = 3.4 (the 13th observation).
Even example (n = 24): 0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6
n/2 = 12, so the center observations are the 12th and 13th: Median = (3.3 + 3.4)/2 = 3.35
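The odd and even rules for the median can be sketched in Python as below. This is a minimal illustration with a hypothetical function name; it is checked against the two worked examples from the earlier slides (the 11 sample values and the 8 student ages).

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position n/2 rounded up, which is index n // 2 (0-based)
        return ordered[middle]
    # Even n: average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


sample = [3, 8, 6, 14, 0, -4, 2, 12, -7, -1, -10]
ages = [23, 19, 32, 25, 26, 22, 24, 20]
print(median(sample), median(ages))
```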
![Page 137: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/137.jpg)
The Median
• Resistant – The median is said to be resistant, because extreme values and/or outliers have little effect on the median.
• Example – Data: {1,2,3,4,5,6,7}
• The median is 4
• Add an outlier {1,2,3,4,5,6,7,50}
• New median is 4.5 – very little effect
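The contrast between the nonresistant mean and the resistant median can be reproduced with Python's standard `statistics` module, using the same small data set with and without the outlier 50. This is only a quick check of the numbers on the slides.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = data + [50]

# Compare how far each measure of center moves when the outlier is added
for label, values in (("original", data), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(values)}, median = {median(values)}")
```

Adding one extreme value shifts the mean by several units while the median moves by only half a unit.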
![Page 138: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/138.jpg)
The Mode
• Explanation of the term – mode: The mode of a set of numerical (data) values is the most frequently occurring value in the data set.
![Page 139: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/139.jpg)
Quick Tip:
• If all the elements in the data set have the same frequency of occurrence, then the data set is said to have no mode.
Example of data set with no mode.
![Page 140: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/140.jpg)
Quick Tip:
• If the data set has one value that occurs more frequently than the rest of the values, then the data set is said to be unimodal.
Example of a unimodal data set.
![Page 141: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/141.jpg)
Quick Tip:
• If two data values in the set are tied for the highest frequency of occurrence, then the data set is said to be bimodal.
Example of a bimodal set of data.
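Since the example figures for these tips did not survive in the transcript, here is a small Python sketch of the three cases. The function name `modes` and the two short lists are made up for illustration; the third list is the 11-value sample from the mean example. Treating a data set with a single distinct value as having a mode is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # All values equally frequent: the data set has no mode.
    # (Assumption: a set with only one distinct value still has a mode.)
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 4]))                               # no mode
print(modes([3, 8, 6, 14, 0, -4, 0, 12, -7, 0, -10]))    # unimodal
print(modes([1, 1, 2, 3, 3]))                            # bimodal
```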
![Page 142: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/142.jpg)
Summary Measures of Center
![Page 143: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/143.jpg)
How Spread Out is the Distribution?
• Variation matters, and Statistics is about variation.
• Are the values of the distribution tightly clustered around the center or more spread out?
• Always report a measure of spread along with a measure of center when describing a distribution numerically.
![Page 144: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/144.jpg)
Measures of Spread
• A measure of variability for a collection of data values is a number that is meant to convey the idea of spread for the data set.
• The most commonly used measures of variability for sample data are the:
• range
• interquartile range
• variance or standard deviation
![Page 145: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/145.jpg)
Spread: Home on the Range
• The range of the data is the difference between the maximum and minimum values:
Range = max – min
• A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
![Page 146: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/146.jpg)
Range
• The range is affected by outliers (large or small values relative to the rest of the data set).
• The range does not utilize all the information in the data set, only the largest and smallest values.
• Thus it is not a very useful measure of spread or variation.
![Page 147: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/147.jpg)
Spread: The Interquartile Range
• A better way to describe the spread of a set of data might be to ignore the extremes and concentrate on the middle of the data.
• The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data.
• To find the IQR, we first need to know what quartiles are…
![Page 148: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/148.jpg)
Spread: The Interquartile Range (cont.)
• Quartiles divide the data into four equal sections.
• One quarter of the data lies below the lower quartile, Q1.
• One quarter of the data lies above the upper quartile, Q3.
• The quartiles border the middle half of the data.
• The difference between the quartiles is the interquartile range (IQR), so
IQR = upper quartile – lower quartile
![Page 149: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/149.jpg)
Finding Quartiles
1. Order the data.
2. Find the median. This divides the data into a lower and an upper half (the median itself is in neither half).
3. Q1 is then the median of the lower half.
4. Q3 is the median of the upper half.
5. Example
Even data: Q1 = 27, M = 39, Q3 = 50.5
IQR = 50.5 – 27 = 23.5
Odd data: Q1 = 35, M = 46, Q3 = 54
IQR = 54 – 35 = 19
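The quartile steps above can be sketched in Python as follows. The data sets behind the Q1 = 27 and Q1 = 35 examples are not in the transcript, so this sketch uses the 25 observations from the earlier median slide instead; the function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves steps."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)                      # 1. order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                        # 2. split at the median;
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # median in neither half
    return (median_of_sorted(lower),              # 3. Q1
            median_of_sorted(ordered),
            median_of_sorted(upper))              # 4. Q3


observations = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3,
                3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
q1, m, q3 = quartiles(observations)
print(f"Q1 = {q1:.2f}, M = {m:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```

Other software may use a different quartile convention and give slightly different values for the same data.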
![Page 150: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/150.jpg)
The Interquartile Range
• The following depicts the idea of the interquartile range.
![Page 151: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/151.jpg)
IQR = Q3 - Q1
![Page 152: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/152.jpg)
Spread: The Interquartile Range (cont.)
• The lower and upper quartiles are the 25th and 75th percentiles of the data, so…
• The IQR contains the middle 50% of the values of the distribution, as shown in figure:
![Page 153: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/153.jpg)
Example IQR
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it.
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it.
Data (n = 25): 0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1
Q1 = first quartile = 2.2
M = median = 3.4
Q3 = third quartile = 4.35
IQR = Q3 - Q1 = 4.35 - 2.2 = 2.15
![Page 154: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/154.jpg)
Your Turn:
• The following scores were reported for a 10-point statistics quiz. What is the value of the interquartile range?
7 8 9 6 8 0 9 9 9
0 0 7 10 9 8 5 7 9
Solution: IQR = 3
![Page 155: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/155.jpg)
Calculator - IQR
• TI-83 Solution: The following shows the descriptive statistics output.
Interquartile range = Q3 – Q1 = 9 – 6 = 3.
![Page 156: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/156.jpg)
5-Number Summary
• The 5-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
• The 5-number summary for the recent tsunami earthquake magnitudes looks like this:
• Obtain 5-number summary from 1-Var Stats
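As a rough companion to the calculator output, the sketch below computes a 5-number summary in Python. The tsunami magnitudes are not reproduced in this transcript, so the example uses the quiz scores from the "Your Turn" slide instead. The function name `five_number_summary` is hypothetical, and the quartiles are taken as medians of the lower and upper halves.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]    # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


# Quiz scores from the "Your Turn" slide.
scores = [7, 8, 9, 6, 8, 0, 9, 9, 9, 0, 0, 7, 10, 9, 8, 5, 7, 9]
low, q1, med, q3, high = five_number_summary(scores)
print("min, Q1, median, Q3, max:", low, q1, med, q3, high)
print("IQR =", q3 - q1)
```

The quartiles come out as Q1 = 6 and Q3 = 9, which agrees with the IQR of 3 reported on the calculator slide.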
![Page 157: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/157.jpg)
What About Spread? The Standard Deviation
• A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.
• A deviation is the distance that a data value is from the mean.
• Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.
![Page 158: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/158.jpg)
What About Spread? The Standard Deviation (cont.)
• The variance, notated by $s^2$, is found by summing the squared deviations and (almost) averaging them:
$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$
• Used to calculate the standard deviation.
• The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units, a serious disadvantage.
![Page 159: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/159.jpg)
What About Spread? The Standard Deviation (cont.)
• The standard deviation, s, is just the square root of the variance and is measured in the same units as the original data.
$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
![Page 160: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/160.jpg)
Procedure for Calculating the Standard Deviation using Formula
1. Compute the mean $\bar{x}$.
2. Subtract the mean from each individual value to get a list of the deviations from the mean, $x - \bar{x}$.
3. Square each of the differences to produce the squares of the deviations from the mean, $(x - \bar{x})^2$.
4. Add all of the squares of the deviations from the mean to get $\sum (x - \bar{x})^2$.
5. Divide the sum by $n - 1$. [variance]
6. Find the square root of the result.
![Page 161: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/161.jpg)
Example:
• Find the standard deviation of the Mulberry Bank customer waiting times. Those times (in minutes) are 1, 3, 14.
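One way to check the hand calculation is to follow the six-step procedure literally in Python. This is only a sketch of the arithmetic for the three waiting times given in the example; the variable names are chosen for illustration.

```python
import math

times = [1, 3, 14]                         # waiting times in minutes

mean = sum(times) / len(times)             # step 1: 6.0
deviations = [x - mean for x in times]     # step 2: [-5.0, -3.0, 8.0]
squared = [d ** 2 for d in deviations]     # step 3: [25.0, 9.0, 64.0]
total = sum(squared)                       # step 4: 98.0
variance = total / (len(times) - 1)        # step 5: 98 / 2 = 49.0
s = math.sqrt(variance)                    # step 6: 7.0

print(f"mean = {mean}, variance = {variance}, s = {s}")
```

The standard deviation of the waiting times is 7 minutes. Note that the deviations in step 2 sum to zero, which is why they are squared before averaging.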
![Page 162: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/162.jpg)
Calculating Standard Deviation on the TI-83/84
• Use 1-Var Stats
• Sx is the sample standard deviation
• σx is the population standard deviation
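The same distinction between the two calculator values exists in Python's standard `statistics` module: `stdev` divides by n – 1 (like Sx) and `pstdev` divides by n (like σx). The sketch below reuses the Mulberry Bank waiting times purely as an illustration.

```python
from statistics import pstdev, stdev

times = [1, 3, 14]             # Mulberry Bank waiting times in minutes

sample_sd = stdev(times)       # divides by n - 1, like Sx
population_sd = pstdev(times)  # divides by n, like σx

print(f"sample: {sample_sd:.3f}, population: {population_sd:.3f}")
```

The sample value is always at least as large as the population value, because it divides the same sum of squared deviations by a smaller number.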
![Page 163: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/163.jpg)
Properties of Standard Deviation
• Measures spread about the mean and should only be used to describe the spread of a distribution when the mean is used to describe the center (i.e., symmetric distributions).
• The value of s is never negative. It is zero only when all of the data values are the same number. Larger values of s indicate greater amounts of variation.
• Nonresistant: s can increase dramatically due to extreme values or outliers.
• The units of s are the same as the units of the original data. This is one reason s is preferred to $s^2$.
![Page 164: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/164.jpg)
Thinking About Variation
• Since Statistics is about variation, spread is an important fundamental concept of Statistics.
• Measures of spread help us talk about what we don’t know.
• When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small.
• When the data values are scattered far from the center, the IQR and standard deviation will be large.
![Page 165: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/165.jpg)
Summarizing Symmetric Distributions -- The Mean
• When we have symmetric data, there is an alternative to the median.
• If we want to calculate a number, we can average the data.
• We use the Greek letter sigma to mean “sum” and write:
$$\bar{y} = \frac{Total}{n} = \frac{\sum y}{n}$$
The formula says that to find the mean, we add up all the values of the variable and divide by the number of data values, n.
![Page 166: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/166.jpg)
Summarizing Symmetric Distributions -- The Mean (cont.)
• The mean feels like the center because it is the point where the histogram balances:
![Page 167: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/167.jpg)
Mean or Median?
• Because the median considers only the order of values, it is resistant to values that are extraordinarily large or small; it simply notes that they are one of the “big ones” or “small ones” and ignores their distance from center.
• To choose between the mean and median, start by looking at the data. If the histogram is symmetric and there are no outliers, use the mean.
• However, if the histogram is skewed or has outliers, you are better off with the median.
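A small experiment makes the difference in resistance concrete. The data below are hypothetical, chosen only so that the effect of a single extreme value is easy to see; the standard deviation is included because it is nonresistant as well.

```python
from statistics import mean, median, stdev

values = [2, 3, 3, 4, 5]          # hypothetical, roughly symmetric
with_outlier = values + [40]      # same data plus one extreme value

for label, data in [("without outlier", values), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, "
          f"median = {median(data)}, s = {stdev(data):.2f}")
```

Adding the single value 40 moves the mean from 3.4 to 9.5, while the median only moves from 3 to 3.5.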
![Page 168: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/168.jpg)
[Figure: position of the mean and median for a symmetric distribution and for left-skewed and right-skewed distributions]
Comparing the mean and the median
• The mean and the median are equal when the distribution is symmetric; in a skewed distribution the mean is pulled toward the tail.
• The median is a measure of center that is resistant to skew and outliers. The mean is not.
![Page 169: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/169.jpg)
Mean and Median of a Distribution with Outliers
[Figure: two histograms with vertical axis "Percent of people dying"; without the outliers $\bar{x} = 3.4$, with the outliers $\bar{x} = 4.2$]
The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2).
The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).
![Page 170: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/170.jpg)
Example
• Observed mean = 2.28, median = 3, mode = 3.1
• What is the shape of the distribution and why?
![Page 171: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/171.jpg)
Example
Solution: Skewed Left
[Figure: three panels]
Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean
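The comparison used in this solution can be written as a small rule of thumb. The function name `likely_skew` is hypothetical, and the result is only a first guess from two summary numbers; a histogram remains the real check on shape.

```python
def likely_skew(mean, median):
    """Rule-of-thumb shape guess from the mean and median alone."""
    if mean < median:
        return "skewed left"      # mean pulled toward a left tail
    if mean > median:
        return "skewed right"     # mean pulled toward a right tail
    return "roughly symmetric"


# Values from the example: mean = 2.28, median = 3.
print(likely_skew(2.28, 3))
```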
![Page 172: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/172.jpg)
Conclusion – Mean or Median?
• Mean – use with symmetrical distributions (no outliers), because it is nonresistant.
• Median – use with skewed distribution or distribution with outliers, because it is resistant.
![Page 173: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/173.jpg)
Tell -- Draw a Picture
• When telling about quantitative variables, start by making a histogram or stem-and-leaf display and discuss the shape of the distribution.
![Page 174: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/174.jpg)
Tell -- Shape, Center, and Spread
• Next, always report the shape of its distribution, along with a center and a spread.
• If the shape is skewed, report the median and IQR.
• If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.
![Page 175: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/175.jpg)
Tell -- What About Unusual Features?
• If there are multiple modes, try to understand why. If you identify a reason for the separate modes, it may be good to split the data into two groups.
• If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
• Note: The median and IQR are not likely to be affected by the outliers.
![Page 176: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/176.jpg)
What Can Go Wrong?
• Don’t make a histogram of a categorical variable—bar charts or pie charts should be used for categorical data.
• Don’t look for shape, center, and spread of a bar chart.
![Page 177: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/177.jpg)
What Can Go Wrong? (cont.)
• Don’t use bars in every display—save them for histograms and bar charts.
• Below is a badly drawn plot and the proper histogram for the number of juvenile bald eagles sighted in a collection of weeks:
![Page 178: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/178.jpg)
What Can Go Wrong? (cont.)
• Choose a bin width appropriate to the data.
• Changing the bin width changes the appearance of the histogram:
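The effect of bin width can be tried out with a text histogram. The sketch below is a minimal illustration that assumes whole-number data and a whole-number bin width; the data values and the function name `text_histogram` are made up for the example.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Print one row of asterisks per bin and return the bin counts."""
    # Label each bin by its left edge: with width 5, the value 7 falls in bin 5.
    counts = Counter((v // bin_width) * bin_width for v in values)
    start, stop = min(counts), max(counts)
    for left in range(start, stop + bin_width, bin_width):
        print(f"{left:>3} to <{left + bin_width:<3} {'*' * counts[left]}")
    return counts


data = [1, 2, 2, 3, 5, 6, 6, 7, 9, 12]   # hypothetical weekly counts
wide = text_histogram(data, 5)           # three wide bins
print()
narrow = text_histogram(data, 2)         # seven narrow bins, one of them empty
```

With a width of 5 the data look like a single mound; with a width of 2 the same values show two small clusters and a gap.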
![Page 179: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/179.jpg)
What Can Go Wrong? (cont.)
• Don’t forget to do a reality check – don’t let the calculator do the thinking for you.
• Don’t forget to sort the values before finding the median or percentiles.
• Don’t worry about small differences when using different methods.
• Don’t compute numerical summaries of a categorical variable.
• Don’t report too many decimal places.
• Don’t round in the middle of a calculation.
• Watch out for multiple modes.
• Beware of outliers.
• Make a picture … make a picture … make a picture!!!
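The advice about small differences between methods is easy to see with quartiles. The sketch below compares the median-of-halves rule from this chapter with the two interpolation rules offered by `statistics.quantiles` in Python's standard library (Python 3.8 or later), using the quiz scores from the "Your Turn" slide. It is an illustration, not a statement about which rule any particular calculator or package uses.

```python
from statistics import median, quantiles

scores = sorted([7, 8, 9, 6, 8, 0, 9, 9, 9, 0, 0, 7, 10, 9, 8, 5, 7, 9])

# Median-of-halves rule (18 values, so each half has 9).
halves_q1, halves_q3 = median(scores[:9]), median(scores[9:])

# Two interpolation rules from the standard library.
excl_q1, _, excl_q3 = quantiles(scores, n=4, method="exclusive")
incl_q1, _, incl_q3 = quantiles(scores, n=4, method="inclusive")

print("halves:   ", halves_q1, halves_q3, "IQR =", halves_q3 - halves_q1)
print("exclusive:", excl_q1, excl_q3, "IQR =", excl_q3 - excl_q1)
print("inclusive:", incl_q1, incl_q3, "IQR =", incl_q3 - incl_q1)
```

All three rules agree that Q3 is 9, but Q1 comes out as 6, 5.75, or 6.25 depending on the rule, so the IQR varies slightly around 3.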
![Page 180: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/180.jpg)
What have we learned?
• We’ve learned how to make a picture for quantitative data to help us see the story the data have to Tell.
• We can display the distribution of quantitative data with a histogram, stem-and-leaf display, or dotplot.
• We’ve learned how to summarize distributions of quantitative variables numerically.
• Measures of center for a distribution include the median and mean.
• Measures of spread include the range, IQR, and standard deviation.
• Use the median and IQR when the distribution is skewed. Use the mean and standard deviation if the distribution is symmetric.
![Page 181: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/181.jpg)
What have we learned? (cont.)
• We’ve learned to Think about the type of variable we are summarizing.
• All methods of this chapter assume the data are quantitative.
• The Quantitative Data Condition serves as a check that the data are, in fact, quantitative.
![Page 182: Chapter 4 Displaying and Summarizing Quantitative Data](https://reader038.vdocuments.net/reader038/viewer/2022102604/56649cf35503460f949c12ec/html5/thumbnails/182.jpg)
Assignment
• Exercises pg. 72 – 79: #5 - 18, 30 - 33, 43, 44, 48
• Read Ch-4, pg. 44 - 71