Graphical Displays of Information 3.1 – Tools for Analyzing Data Learning Goal: Identify the shape of a histogram MSIP / Home Learning: p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13

Upload: stephany-hicks

Post on 01-Jan-2016


TRANSCRIPT

Page 1

Graphical Displays of Information

3.1 – Tools for Analyzing Data

Learning Goal: Identify the shape of a histogram

MSIP / Home Learning: p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13

Page 2

Histograms

Show how data is spread out
Best choice for:
    continuous data
    discrete data with a large spread
Data is divided into 5-6 intervals
Bin width = width of each interval (the same for every interval)
Different bin widths can produce differently shaped distributions

Page 3

Histogram Example

These histograms represent the same data
One shows much less of the structure of the data
Too many bins (bin width too small) is also a problem

[Figure: three "Data Histogram" panels of the same data, Count against SomeData, each drawn with a different bin width]
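The effect of bin width can be sketched in a few lines of Python. This is only an illustration: the values in `some_data` are made up (they are not the data plotted on the slide), and `bin_counts` is a hypothetical helper that assumes every value is at least `start`.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Assumes every value is >= start.
    """
    counts = Counter(int((v - start) // width) for v in values)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]

# Hypothetical data, NOT the data set shown on the slide.
some_data = [42, 45, 51, 58, 63, 66, 67, 71, 74, 75, 78, 82, 85, 91, 104, 118]

wide = bin_counts(some_data, start=40, width=40)    # too few bins
medium = bin_counts(some_data, start=40, width=20)  # a moderate number of bins
narrow = bin_counts(some_data, start=40, width=5)   # too many bins

for label, counts in [("wide", wide), ("medium", medium), ("narrow", narrow)]:
    print(label, counts)
```

With two wide bins the mound in the data disappears; with sixteen narrow bins most bars have height 0, 1 or 2 and the overall shape is hard to see.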

Page 4

Histogram Applet – Old Faithful
http://www.stat.sc.edu/~west/javahtml/Histogram.html

Page 5

Bin Width Calculation

Bin width = (range) ÷ (number of intervals), where range = (max) – (min)
Number of intervals is usually 5-6
Bins should not overlap
    Incorrect: 0-10, 10-20, 20-30, 30-40, etc.
    Correct:
        Discrete: 0-9, 10-19, 20-29, 30-39, etc.
        Continuous: 0-9.99, 10-19.99, 20-29.99, 30-39.99, etc.
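The bin width formula and the "no overlap" rule can be sketched as follows. The marks are hypothetical, and `bin_width` and `bin_index` are illustrative names. The sketch assumes half-open bins (lower edge included, upper edge excluded), which is one common way to make sure a boundary value lands in exactly one bin; the maximum is placed in the last bin.

```python
def bin_width(values, n_intervals):
    """Bin width = (max - min) / number of intervals."""
    return (max(values) - min(values)) / n_intervals

def bin_index(value, low, width, n_intervals):
    """Index of the half-open bin [low + k*width, low + (k+1)*width) holding value.

    The maximum value would fall on the final upper edge, so it is
    placed in the last bin instead.
    """
    k = int((value - low) // width)
    return min(k, n_intervals - 1)

# Hypothetical marks, used only for illustration.
marks = [52, 58, 61, 64, 67, 70, 73, 75, 79, 82, 88, 92]
n_intervals = 5

width = bin_width(marks, n_intervals)   # (92 - 52) / 5
low = min(marks)

counts = [0] * n_intervals
for m in marks:
    counts[bin_index(m, low, width, n_intervals)] += 1

print(width, counts)
```

A value of 60 sits on the boundary between the first two bins here; `bin_index(60, 52, 8.0, 5)` assigns it to the second bin only, so it is never counted twice.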

Page 6

Shapes of Distributions

Symmetric
    Mound-Shaped
    U-Shaped
    Uniform
Asymmetric
    Left-Skewed
    Right-Skewed

Page 7

Mound-shaped distribution

Middle interval(s) have the greatest frequency / tallest bars
Bars get shorter as you move away
E.g. roll 2 dice, height, memory

Page 8

U-shaped distribution

Lowest frequency in the centre, higher towards the outside
E.g. height of a combined grade 1 and 6 class

[Figure: "Student Heights" histogram; horizontal axis Height (cm) in 5 cm intervals starting at 105.5, vertical axis Frequency from 0 to 12]

Page 9

Uniform distribution

All bars are approximately the same height
E.g. roll a die 50 times

Page 10

Symmetric distribution

A distribution that is the same on either side of the centre
U-Shaped, Uniform and Mound-shaped distributions are symmetric

Page 11

Skewed distribution (left or right)

Highest frequencies at one end
Left-skewed has higher bars on the right and drops off to the left
Right-skewed has higher bars on the left and drops off to the right
E.g. the years on a handful of quarters (left)
E.g. the years of cars on a classic car lot (right)

Page 12

MSIP / Home Learning

Define in your notes:
    Frequency distribution (p. 142-143)
    Cumulative frequency (p. 148)
    Relative frequency (p. 148)
Complete p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13

Page 13

Warm up - Class marks

What shape is this distribution?
Which of the following can you tell from the graph: mean? median? modal interval?

Left-skewed
Modal interval: 72-80
Median: 64-72 (70 actual)
Mean: 66

[Figure: "Collection 1 Histogram" of class marks; horizontal axis Mark from 0 to 100, vertical axis ticks from 1 to 7]

Page 14

Minds On!

Mr. Lieff recorded the following 20 quiz marks:

60 60 60 60 60 60 70 70 70 70

80 80 80 80 80 100 100 100 100 100

Find the average mark 2 different ways.

Page 15

Measures of Central Tendency (Mean, Median, Mode)

Chapter 3.2 – Tools for Analyzing Data

Learning Goal: Calculate the mean, median and mode for weighted / grouped data

Due now: p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13

MSIP / Home Learning: p. 159 #4, 5, 6, 8, 10-13

Page 16

Sigma Notation

Sigma notation is used to express a mathematical series compactly
E.g. 1 + 2 + 3 + 4 + … + 15 can be expressed as:

$\sum_{k=1}^{15} k$

The variable k is called the index of summation
The number 1 is the lower limit and the number 15 is the upper limit
We would say: “the sum of k for k = 1 to k = 15”

Page 17

Example 1: write in expanded form:

$\sum_{n=4}^{7} (2n+1)$

This is the sum of the term 2n+1 as n takes on the values from 4 to 7.

= (2×4 + 1) + (2×5 + 1) + (2×6 + 1) + (2×7 + 1)
= 9 + 11 + 13 + 15
= 48

NOTE: any letter can be used for the index of summation, though a, n, i, j, k & x are the most common
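Both sums above can be checked with a short Python sketch. The only point to watch is that the upper limit in sigma notation is inclusive, while Python's `range` stops one short of its second argument.

```python
# Sum of k for k = 1 to k = 15.
# range(1, 16) yields 1, 2, ..., 15 because the stop value is excluded.
total_k = sum(k for k in range(1, 16))

# Example 1: sum of 2n + 1 for n = 4 to n = 7.
terms = [2 * n + 1 for n in range(4, 8)]
total_example_1 = sum(terms)

print(total_k)                  # 1 + 2 + ... + 15
print(terms, total_example_1)   # the expanded form and its total
```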

Page 18

Example 2: write the following in sigma notation

$3 + \frac{3}{2} + \frac{3}{4} + \frac{3}{8} = \sum_{n=0}^{3} \frac{3}{2^n}$

Page 19

The Mean (Average)

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Found by dividing the sum of all the data points by the number of data points
Affected greatly by outliers
Deviation (3.5)
    the distance of a data point from the mean
    calculated by subtracting the mean from the value, i.e. $x - \bar{x}$
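As a minimal sketch of the two definitions above, the following Python computes the mean and the deviation of each data point. The data values are hypothetical and the function names are illustrative.

```python
def mean(values):
    """Sum of all the data points divided by the number of data points."""
    return sum(values) / len(values)

def deviations(values):
    """Deviation of each data point: the value minus the mean."""
    m = mean(values)
    return [x - m for x in values]

# Hypothetical data, used only for illustration.
data = [4, 6, 8, 10, 12]

x_bar = mean(data)
devs = deviations(data)

print(x_bar)   # the mean
print(devs)    # negative below the mean, positive above it
```

The deviations are signed, and for any data set they add up to zero, which is a quick way to check a hand calculation.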

Page 20

The Weighted Mean

$\bar{x} = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$

where $x_i$ represents the data points and $w_i$ represents the weight or the frequency

“The sum of the products of each item and its weight divided by the sum of the weights”

See examples on pages 153 and 154
Example: 7 students have a mark of 70 and 10 students have a mark of 80
    mean = (70×7 + 80×10) ÷ (7+10) ≈ 75.9
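The weighted mean formula can be sketched in Python as below, using the example above and the 20 quiz marks from the Minds On! slide. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Example from this slide: 7 students with 70 and 10 students with 80.
example = weighted_mean([70, 80], [7, 10])

# Minds On! quiz marks, averaged two ways.
quiz_marks = [60] * 6 + [70] * 4 + [80] * 5 + [100] * 5
plain = sum(quiz_marks) / len(quiz_marks)                  # add all 20 marks
weighted = weighted_mean([60, 70, 80, 100], [6, 4, 5, 5])  # use frequencies

print(round(example, 1))
print(plain, weighted)
```

Both ways of averaging the quiz marks give the same result, because using frequencies as weights is just a shorter way of writing out the full sum.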

Page 21

Means with grouped data

For data that is already grouped into class intervals (assuming you do not have the original data), you must use the midpoint of each class to estimate the weighted mean

See the example on pages 154-155
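A sketch of the midpoint method in Python, using the height intervals and frequencies from Example 4 later in these slides. The function name `grouped_mean` is illustrative, and the result is only an estimate because the original data points are unknown.

```python
def grouped_mean(intervals, frequencies):
    """Estimate the mean of grouped data from class midpoints."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

# Class intervals (cm) and number of students, as in Example 4.
intervals = [(141, 150), (151, 160), (161, 170)]
frequencies = [3, 7, 4]

estimate = grouped_mean(intervals, frequencies)
print(round(estimate, 1))
```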

Page 22

Median

The midpoint of the data
Calculated by placing all the values in order
If there is an odd number of values, the median is the middle number
    1 4 6 8 9    median = 6
If there is an even number of values, the median is the mean of the middle two numbers
    1 4 6 8 9 12    median = 7
Not affected greatly by outliers
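The two cases above can be sketched as a small Python function. The name `median` is illustrative; Python's standard library offers `statistics.median` for the same job.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle two

odd_case = median([1, 4, 6, 8, 9])
even_case = median([9, 1, 8, 4, 6, 12])     # unordered input is sorted first
with_outlier = median([1, 4, 6, 8, 900])    # replacing 9 by an outlier

print(odd_case, even_case, with_outlier)
```

Replacing the largest value with 900 leaves the median unchanged, which illustrates why the median is not greatly affected by outliers.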

Page 23

Mode

The value that occurs most often
There may be no mode, one mode, two modes (bimodal), etc.
Which distributions from yesterday have one mode?
    Mound-shaped, Left/Right-Skewed
Two modes?
    U-Shaped, Mound-shaped (could), Uniform (could)
Modes are appropriate for discrete data or non-numerical data
    Shoe size, Number of siblings
    Eye colour, Favourite subject
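Finding the mode or modes can be sketched in Python with a frequency count. The data below is hypothetical, and the function name `modes` is illustrative. The sketch assumes one common convention: when several different values all occur equally often, the data set is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values (empty if there is no mode)."""
    counts = Counter(values)
    # Assumed convention: if every distinct value ties, report no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical survey data, used only for illustration.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
shoe_sizes = [7, 8, 8, 9, 9, 10]
all_different = [1, 2, 3]

print(modes(eye_colours))    # one mode, non-numerical data
print(modes(shoe_sizes))     # two modes (bimodal)
print(modes(all_different))  # no mode
```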

Page 24

Distributions and Central Tendency

The relationship between the three measures changes depending on the shape of the distribution
    symmetric (mound-shaped): mean = median = mode
    right-skewed: mean > median > mode
    left-skewed: mean < median < mode

[Figure: three "Data Histogram" panels, Count against data (0 to 7)]

Page 25

What Method is Most Appropriate?

Outliers are data points that are quite different from the other points
Outliers affect the mean the most
The median is least affected by outliers
Skewed data is best represented by the median
If the data is symmetric, use either the median or the mean
If the data is not numeric, or if the frequency is the most critical measure, use the mode
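The different sensitivity of the mean and the median to an outlier can be seen in a short Python sketch. The marks are hypothetical; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

# Hypothetical marks, used only for illustration.
marks = [62, 65, 68, 70, 72]
marks_with_outlier = marks + [5]   # one student scores far below the rest

mean_shift = statistics.mean(marks) - statistics.mean(marks_with_outlier)
median_shift = statistics.median(marks) - statistics.median(marks_with_outlier)

print(statistics.mean(marks), statistics.mean(marks_with_outlier))
print(statistics.median(marks), statistics.median(marks_with_outlier))
print(mean_shift, median_shift)
```

In this made-up example a single low mark pulls the mean down by about ten marks but moves the median by less than two, which is why the median is usually preferred when outliers or skew are present.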

Page 26

Example 3

Survey responses    1    2    3    4
Frequency           2    8   14    3

a) Find the mean, median and mode
    mean = [(1×2) + (2×8) + (3×14) + (4×3)] ÷ 27 ≈ 2.7
    median = 3 (27 data points, so #14 falls in bin 3)
    mode = 3
b) Which way is it skewed?
    Left
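Example 3 can be checked with a Python sketch that works directly from the frequency table. The helper name `median_from_table` is illustrative; it walks through the cumulative frequencies until it reaches the middle position.

```python
responses = [1, 2, 3, 4]
frequencies = [2, 8, 14, 3]

n = sum(frequencies)  # number of data points
mean = sum(r * f for r, f in zip(responses, frequencies)) / n

def median_from_table(values, freqs):
    """Median of data given as a frequency table (values in ascending order)."""
    total = sum(freqs)
    # 1-indexed positions of the middle value(s); equal when total is odd.
    lower_pos, upper_pos = (total + 1) // 2, (total + 2) // 2

    def value_at(position):
        running = 0  # cumulative frequency so far
        for v, f in zip(values, freqs):
            running += f
            if position <= running:
                return v

    return (value_at(lower_pos) + value_at(upper_pos)) / 2

median = median_from_table(responses, frequencies)
mode = responses[frequencies.index(max(frequencies))]

print(n, round(mean, 1), median, mode)
```

The mean comes out below the median and mode, which is consistent with the left skew identified in part b).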

Page 27

Example 4

Height             141-150    151-160    161-170
No. of Students          3          7          4

Find the mean, median and mode
    mean = [(145.5×3) + (155.5×7) + (165.5×4)] ÷ 14 ≈ 156.2
    median = 151-160 or 155.5
    mode = 151-160 or 155.5

MSIP / Home Learning: p. 159 #4, 5, 6, 8, 10-13

Page 28

MSIP / Home Learning

p. 159 #4, 5, 6, 8, 10-13
