Graphical Displays of Information
Chapter 3.1 – Tools for Analyzing Data
Mathematics of Data Management (Nelson)
MDM 4U
Histograms
Histograms display continuous data grouped into class intervals and show how the data are spread over a range.
The width of each bar is known as the bin width.
Different bin widths produce differently shaped distributions.
Bin widths should be equal, and there should be at least five (5) bins.
Histogram Example
These histograms represent the same data.
However, one shows much less of the structure of the data.
Too many bins (bin width too small) is also a problem.
[Figure: three histograms of the same data, each titled "Data Histogram", with Count on the vertical axis and SomeData on the horizontal axis, drawn with different bin widths.]
Histogram Applet – Old Faithful
http://www.isixsigma.com/offsite.asp?A=Fr&Url=http://www.stat.sc.edu/~west/javahtml/Histogram.html
Bin Width Calculation
The bin width is calculated by dividing the range (max - min) by the number of intervals you desire (5-6).
The bins should not overlap.
Wrong: 0-10, 10-20, 20-30, 30-40
Discrete, correct: 0-10, 11-20, 21-30, 31-40
Continuous, correct: 0-9.99, 10-19.99, 20-29.99, 30-39.99
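The bin width rule can be tried out with a short Python sketch. The function names and the sample values are made up for illustration. To keep bins from overlapping, the sketch uses half-open intervals [low, high), which is one common convention and not necessarily the one used in the textbook.

```python
def bin_width(data, num_bins=5):
    """Range (max - min) divided by the desired number of intervals."""
    return (max(data) - min(data)) / num_bins


def histogram_counts(data, num_bins=5):
    """Count the values falling in each of num_bins equal-width bins.

    Bins are half-open [low, high) so that no value can land in two bins;
    the maximum value is placed in the last bin. Assumes the data are not
    all identical (otherwise the width would be zero).
    """
    low = min(data)
    width = bin_width(data, num_bins)
    counts = [0] * num_bins
    for x in data:
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample, for illustration only
sample = [40, 45, 52, 58, 61, 63, 67, 70, 74, 90]
width = bin_width(sample, 5)
counts = histogram_counts(sample, 5)
print(width, counts)
```

Changing `num_bins` and re-running shows how the same data can produce differently shaped histograms.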
Mound-shaped distribution
The middle interval(s) have the greatest frequency (i.e. the tallest bar).
The bars get smaller as you move out to the edges.
U-shaped distribution
Lowest frequency in the centre, highest towards the outside
E.g. height of a combined grade 1 and 6 class
Uniform distribution
All bars are approximately the same height.
E.g. roll a die 50 times
Symmetric distribution
A distribution that is the same on either side of the centre.
U-shaped, uniform and normal distributions are symmetric.
Skewed distribution (left and right)
Highest frequencies at one end.
Left-skewed drops off to the left.
E.g. the years on a handful of quarters
Exercises
Define in your notes:
Frequency distribution (p. 146)
Cumulative frequency (p. 146)
Relative frequency (p. 146)
Try page 146 #1, 2, 3, 11 (use Excel or Fathom), 13
Measures of Central Tendency
Chapter 3.2 – Tools for Analyzing Data
Mathematics of Data Management (Nelson)
MDM 4U
Sigma Notation
Sigma notation is used to compactly express a mathematical series.
ex: 1 + 2 + 3 + 4 + … + 15
This can be expressed:
$\sum_{k=1}^{15} k = 1 + 2 + 3 + 4 + \dots + 14 + 15$
The variable k is called the index of summation.
The number 1 is the lower limit and the number 15 is the upper limit.
We would say: “the sum of k for k = 1 to k = 15”
Examples:
Write in expanded form:
$\sum_{n=4}^{7} (2n + 1)$
= [2(4) + 1] + [2(5) + 1] + [2(6) + 1] + [2(7) + 1]
= 9 + 11 + 13 + 15
= 48
Note that any letter can be used for the index of summation, though k, a, n, i, j & x are often used.
Example: write the following in sigma notation
$3 + \frac{3}{2} + \frac{3}{4} + \frac{3}{8} = \frac{3}{2^0} + \frac{3}{2^1} + \frac{3}{2^2} + \frac{3}{2^3} = \sum_{n=0}^{3} \frac{3}{2^n}$
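A sigma expression maps directly onto a loop over the index of summation. The sketch below checks the two sums worked above; the helper name `sigma` is invented for illustration.

```python
def sigma(term, lower, upper):
    """Sum term(k) for every integer k from lower to upper, inclusive."""
    return sum(term(k) for k in range(lower, upper + 1))


# Sum of k for k = 1 to k = 15
total_k = sigma(lambda k: k, 1, 15)

# Sum of (2n + 1) for n = 4 to n = 7, i.e. 9 + 11 + 13 + 15
total_odd = sigma(lambda n: 2 * n + 1, 4, 7)

print(total_k, total_odd)
```

Note that `range` excludes its end value, which is why the helper adds 1 to the upper limit.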
The Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Found by dividing the sum of all the data points by the number of elements of data.
Deviation
The distance of a data point from the mean, calculated by subtracting the mean from the value.
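A minimal sketch of the mean and the deviations, using a made-up data set. Subtracting the mean from each value gives a signed deviation (negative below the mean, positive above), and the deviations always add up to zero.

```python
def mean(values):
    """Sum of all the data points divided by the number of data points."""
    return sum(values) / len(values)


def deviations(values):
    """Each value minus the mean (signed, so values below the mean are negative)."""
    m = mean(values)
    return [x - m for x in values]


# Hypothetical data, for illustration only
data = [2, 4, 6, 8, 10]
data_mean = mean(data)
data_deviations = deviations(data)
print(data_mean, data_deviations)
```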
The Weighted Mean
$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$
where x_i represents the data points and w_i represents the weight or the frequency
See examples on pages 153 and 154.
Example: 7 students have a mark of 70 and 10 students have a mark of 80.
mean = (70 * 7 + 80 * 10) / (7 + 10)
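The marks example can be checked with a short sketch of the weighted mean formula. The function and variable names are chosen for illustration only.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


marks = [70, 80]          # x_i: the data points
student_counts = [7, 10]  # w_i: how many students earned each mark
result = weighted_mean(marks, student_counts)
print(round(result, 1))
```

Here the result is 1290 / 17, or about 75.9, which sits closer to 80 because more students earned that mark.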
Means with grouped data
for data that is already grouped into class intervals (assuming you do not have the original data), you must use the midpoint of each class to estimate the weighted mean
See the example on pages 154-155.
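As a sketch of the midpoint method, the code below estimates the mean of a grouped data set. The class intervals and frequencies are hypothetical, and each midpoint is taken as (lower + upper) / 2.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) class intervals.

    Each class is represented by its midpoint, weighted by its frequency.
    The result is an estimate, since the original data points are unknown.
    """
    total_frequency = sum(freq for _, _, freq in classes)
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return weighted_sum / total_frequency


# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 4)]
estimate = grouped_mean(classes)
print(round(estimate, 1))
```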
Median
The midpoint of the data, calculated by placing all the values in order.
If there is an even number of values, the median is the mean of the middle two numbers.
1 4 6 8 9 12 median = 7
If there is an odd number of values, the median is the middle number.
1 4 6 8 9 median = 6
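The two cases can be written out as a short sketch and checked against the two examples above. Python's standard `statistics.median` does the same job; the hand-written version is shown only to make the steps visible.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the middle one
        return ordered[mid]
    # Even number of values: average the two middle ones
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([1, 4, 6, 8, 9, 12])
odd_case = median([1, 4, 6, 8, 9])
print(even_case, odd_case)
```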
Mode
Simply chosen by finding the number that occurs most often.
There may be no mode, one mode, two modes (bimodal), etc.
Which distributions from yesterday have one mode? Mound-shaped, Left/Right-Skewed
Two modes? U-Shaped, some Symmetric
Multiple modes? Uniform
Modes are appropriate for discrete data or non-numerical data, e.g. shoe sizes, shoe colors
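A minimal sketch of finding the mode(s), using made-up shoe data. It returns every value tied for the highest count, so a bimodal data set gives two values. When every value occurs exactly once it returns all of them; whether to call that "no mode" is a matter of convention.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical data, for illustration only
shoe_sizes = [7, 8, 8, 9, 9, 10]                    # discrete numeric data
shoe_colors = ["black", "white", "black", "brown"]  # non-numerical data
size_modes = modes(shoe_sizes)
color_modes = modes(shoe_colors)
print(size_modes, color_modes)
```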
Distributions and Central Tendency
The relationship between the three measures changes depending on the spread of the data.
symmetric (mound shaped) mean = median = mode
right skewed mean > median > mode
left skewed mean < median < mode
[Figure: three histograms, each titled "Data Histogram", with Count on the vertical axis and data (0 to 7) on the horizontal axis.]
What Method is Most Appropriate?
Outliers are data points that are quite different from the other points.
Outliers have the greatest effect on the mean.
The median is least affected by outliers.
Skewed data is best represented by the median.
If symmetric, use either the median or the mean.
If not numeric, or if the frequency is the most critical, use the mode.
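The effect of an outlier can be seen by adding one extreme value to a small data set and comparing how far the mean and the median move. The numbers below are hypothetical.

```python
import statistics

# Hypothetical data, for illustration only
typical = [48, 50, 51, 52, 54]
with_outlier = typical + [200]  # one value far from the rest

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

mean_shift = mean_after - mean_before
median_shift = median_after - median_before
print(mean_shift, median_shift)
```

The single outlier pulls the mean up by almost 25, while the median moves by only 0.5.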
Example 1
Find the mean, median and mode.
Survey responses   1   2   3   4
Frequency          2   8  14   3
mean = [(1x2) + (2x8) + (3x14) + (4x3)] / 27 = 2.7
median = 3
mode = 3
Which way is it skewed? Left
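The answers to Example 1 can be checked by expanding the frequency table back into a list of 27 individual responses. This is a sketch using Python's standard `statistics` module; the variable names are chosen for illustration.

```python
import statistics

responses = [1, 2, 3, 4]
frequencies = [2, 8, 14, 3]

# Repeat each response as many times as its frequency
expanded = [r for r, f in zip(responses, frequencies) for _ in range(f)]

example_mean = sum(r * f for r, f in zip(responses, frequencies)) / sum(frequencies)
example_median = statistics.median(expanded)
example_mode = statistics.mode(expanded)
print(round(example_mean, 1), example_median, example_mode)
```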
Example 2
Find the mean, median and mode.
Height            141-150   151-160   161-170
No. of Students      3         7         4
mean = [(145x3) + (155x7) + (165x4)] / 14 = 155.7
median = 155
mode = 151-160
Which way is it skewed? Mound-shaped
Exercises
try page 159 #4, 5, 6, 8
Remembrance Day by the Numbers
http://www42.statcan.ca/smr08/smr08_064_e.htm