Measures of Spread Chapter 3.3 – Tools for Analyzing Data Mathematics of Data Management (Nelson) MDM 4U

Upload: junior-benson

Post on 14-Jan-2016


Page 1: Measures of Spread Chapter 3.3 – Tools for Analyzing Data Mathematics of Data Management (Nelson) MDM 4U

Measures of Spread

Chapter 3.3 – Tools for Analyzing Data

Mathematics of Data Management (Nelson)

MDM 4U

Page 2

Foot size of gongshowhockey.com users

What shape are the distributions?

[Figure: two histograms of foot size, FreqApr1 (frequency axis 0 to 80) and FreqSep5 (frequency axis 0 to 450), each over the categories 8, 9, 10, 11, 12, 13 and 13+.]

Page 3

What is spread?

Spread tells you how widely the data are dispersed.

For example, the two histograms have identical mean and median, but the spread is significantly different.

[Figure: two histograms of Count, one over the attribute "data" (values 2 to 9, counts up to 7) and one over the attribute "sp" (values 2 to 10, counts up to 4).]

Page 4

Why worry about spread?

Spread indicates how closely the values cluster around the middle value.

Less spread means you have greater confidence that values will fall within a particular range.

Page 5

Vocabulary

Spread and dispersion refer to the same thing.

Range is the difference between the largest and smallest values.

A quartile is one of three numerical values that divide a group of numbers into 4 equal parts.

The Interquartile Range (IQR) is the difference between the third and first quartiles (Q3 – Q1).

Page 6

Quartiles Example

26 28 34 36 38 38 40 41 41 44 45 46 51 54 55

range = 55 – 26 = 29

Q2 = 41 (median)

Q1 = 36 (median of the lower half of the data)

Q3 = 46 (median of the upper half of the data)

IQR = Q3 – Q1 = 46 – 36 = 10 (the interval from Q1 to Q3 contains the middle 50% of the data)

If a quartile falls between two values, it is calculated as the average of the two values.
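The quartile steps in this example can be sketched in Python. This is a minimal illustration of the median-of-halves method described on the slide; the function name `quartiles` is made up for this sketch, and other tools may use different quartile conventions and report slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

data = [26, 28, 34, 36, 38, 38, 40, 41, 41, 44, 45, 46, 51, 54, 55]
q1, q2, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print(data_range, q1, q2, q3, iqr)  # 29 36 41 46 10
```

With an even number of values, `median` averages the two middle values, which matches the rule for a quartile that falls between two values.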

Page 7

A More Useful Measure of Spread

The interquartile range is a somewhat useful measure of spread.

Standard deviation is more useful.

To calculate it we need to find the mean and the deviation for each data point.

Mean is easy, as we have done that before.

Deviation is the difference between a particular point and the mean.

Page 8

Deviation

12 24 36 48 60 72 84

The mean of these numbers is 48.

The deviation for 24 is 24 - 48 = -24.

The deviation for 84 is 84 - 48 = 36.
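As a quick check of the numbers on this slide, the sketch below computes every deviation for the same seven values. It also shows that the deviations always add up to zero, which is why they cannot simply be averaged to measure spread. The variable names are chosen for this illustration only.

```python
data = [12, 24, 36, 48, 60, 72, 84]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]  # signed difference from the mean

print(mean)             # 48.0
print(deviations)       # [-36.0, -24.0, -12.0, 0.0, 12.0, 24.0, 36.0]
print(sum(deviations))  # 0.0
```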

Page 9

Standard Deviation

Deviation is the distance from the piece of data you are examining to the mean.

Variance is a measure of spread found by averaging the squares of the deviations calculated for each piece of data.

Taking the square root of variance, you get standard deviation.

Standard deviation is a very important and useful measure of spread.

Page 10

Standard Deviation

σ² (lower case sigma squared) is used to represent variance.

σ is used to represent standard deviation.

σ is commonly used to measure the spread of data, with larger values of σ indicating greater spread.

We are using a population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Page 11

Example of Standard Deviation

26 28 34 36

mean = (26 + 28 + 34 + 36) / 4 = 31

σ² = [(26 – 31)² + (28 – 31)² + (34 – 31)² + (36 – 31)²] / 4

σ² = 17

σ = √17 ≈ 4.12
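The same calculation can be sketched in Python. This is a minimal illustration of the population formula (dividing by n); the function names are made up for this sketch.

```python
from math import sqrt

def population_variance(values):
    """Average of the squared deviations from the mean (divide by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def population_std_dev(values):
    """Square root of the population variance."""
    return sqrt(population_variance(values))

data = [26, 28, 34, 36]
print(population_variance(data))           # 17.0
print(round(population_std_dev(data), 2))  # 4.12
```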

Page 12

Standard Deviation with Grouped Data

Hours of TV   2   3   4   5
Frequency     2   6   6   2

$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$$

grouped mean = (2×2 + 3×6 + 4×6 + 5×2) / 16 = 3.5

deviations:

2: 2 – 3.5 = -1.5
3: 3 – 3.5 = -0.5
4: 4 – 3.5 = 0.5
5: 5 – 3.5 = 1.5

σ² = [2(-1.5)² + 6(-0.5)² + 6(0.5)² + 2(1.5)²] / 16

σ² = 0.75

σ = √0.75 ≈ 0.87
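A minimal Python sketch of the grouped calculation follows, using the Hours of TV table. Each squared deviation is weighted by its frequency, and n is the total of the frequencies. The function names are hypothetical, chosen for this illustration.

```python
from math import sqrt

def grouped_mean(values, freqs):
    """Mean of grouped data: sum of f_i * x_i divided by n."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

def grouped_std_dev(values, freqs):
    """Population standard deviation of grouped data."""
    n = sum(freqs)
    mean = grouped_mean(values, freqs)
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return sqrt(variance)

hours = [2, 3, 4, 5]
frequency = [2, 6, 6, 2]
print(grouped_mean(hours, frequency))               # 3.5
print(round(grouped_std_dev(hours, frequency), 2))  # 0.87
```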

Page 13

MSIP / Homework

Read through the examples on pages 164-167.

Complete p. 168 #2b, 3b, 4, 6, 7, 10.

You are responsible for knowing how to do simple examples by hand (<10 pieces of data).

However, we will use technology (Fathom) to calculate larger examples.

Have a look at your calculator and see if you have this feature (Σσn and Σσn-1).
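The two calculator features most likely correspond to the population standard deviation (divide by n) and the sample standard deviation (divide by n - 1); that reading of the key labels is an assumption. Python's standard `statistics` module offers the same pair, sketched here on the four-value example from earlier in the lesson.

```python
from statistics import pstdev, stdev

data = [26, 28, 34, 36]
print(round(pstdev(data), 2))  # 4.12  population: divides by n
print(round(stdev(data), 2))   # 4.76  sample: divides by n - 1
```

This lesson uses the population version, so the first value is the one that matches the hand calculation.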

Page 14

Normal Distribution

Chapter 3.4 – Tools for Analyzing Data

Mathematics of Data Management (Nelson)

MDM 4U

Page 15

Histograms

Histograms may be skewed...

[Figure: two histograms, one right-skewed and one left-skewed.]

Page 16

Histograms

... or symmetrical

[Figure: "Collection 1" histogram of Count (up to 5) over the attribute "a" (values 3 to 11), symmetrical in shape.]

Page 17

Normal?

A normal distribution creates a histogram that is symmetrical and has a bell shape, and is used quite a bit in statistical analyses.

Also called a Gaussian distribution.

It is symmetrical with equal mean, median and mode that fall on the line of symmetry of the curve.

Page 18

A Real Example

The heights of 600 randomly chosen Canadian students from the “Census at School” data set.

The data approximates a normal distribution.

[Figure: "600 Student Heights" histogram of Density (0.005 to 0.035) against Height in cm (100 to 240), with a fitted normal density curve.]

Page 19

The 68-95-99.7% Rule

The area under the curve is 1 (i.e. it represents 100% of the population surveyed).

Approx. 68% of the data falls within 1 standard deviation of the mean.

Approx. 95% of the data falls within 2 standard deviations of the mean.

Approx. 99.7% of the data falls within 3 standard deviations of the mean.

http://davidmlane.com/hyperstat/A25329.html
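The three percentages in the rule are rounded. As a sketch, they can be checked with the error function from Python's `math` module: for a normal distribution, the fraction of data within k standard deviations of the mean is erf(k / √2). The function name `fraction_within` is made up for this illustration.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```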

Page 20

Distribution of Data

$$X \sim N(\bar{x}, \sigma^2)$$

Interval                 Share of data
below x̄ – 3σ             0.15%
x̄ – 3σ to x̄ – 2σ         2.35%
x̄ – 2σ to x̄ – 1σ         13.5%
x̄ – 1σ to x̄              34%
x̄ to x̄ + 1σ              34%
x̄ + 1σ to x̄ + 2σ         13.5%
x̄ + 2σ to x̄ + 3σ         2.35%
above x̄ + 3σ             0.15%

Within 1σ of the mean: 68%. Within 2σ: 95%. Within 3σ: 99.7%.

Page 21

Normal Distribution Notation

$$X \sim N(\bar{x}, \sigma^2)$$

The notation above is used to describe the Normal distribution, where x̄ is the mean and σ² is the variance (the square of the standard deviation).

e.g. X ~ N(70, 8²) describes a Normal distribution with mean 70 and standard deviation 8 (our class at midterm?)

Page 22

Percentage of data between two values

The area under any normal curve is 1.

The percent of data that lies between two values in a normal distribution is equivalent to the area under the normal curve between these values.

See examples 2 and 3 on page 175.
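One way to compute such an area is with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The sketch below does not reproduce the textbook examples; it reuses the X ~ N(70, 8²) distribution from the notation slide, and the function name `percent_between` and the cut-off values are chosen for illustration.

```python
from statistics import NormalDist

def percent_between(mean, std_dev, low, high):
    """Percent of a normal distribution lying between low and high."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return 100 * (dist.cdf(high) - dist.cdf(low))

# X ~ N(70, 8^2): within one standard deviation of the mean
print(round(percent_between(70, 8, 62, 78), 1))  # 68.3
# From the mean up to two standard deviations above it
print(round(percent_between(70, 8, 70, 86), 1))  # 47.7
```

The area is the difference of two cumulative probabilities, so the same function works for any pair of values, not only whole multiples of the standard deviation.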

Page 23

Why is the Normal distribution so important?

Many psychological and educational variables are distributed approximately normally: reading ability, memory, etc.

Normal distributions are statistically easy to work with.

All kinds of statistical tests are based on it.

Lane (2003)

Page 24

An example

Suppose the time before burnout for an LED averages 120 months with a standard deviation of 10 months and is approximately Normally distributed. What is the length of time a user might expect an LED to last?

About 95% of the data will be within 2 standard deviations of the mean.

This means that about 95% of the bulbs will last between 120 – 2×10 months and 120 + 2×10 months.

So about 95% of the bulbs will last 100-140 months.

Page 25

Example continued…

Suppose you wanted to know how long 99.7% of the bulbs will last.

This is the area covering 3 standard deviations on either side of the mean.

This means that about 99.7% of the bulbs will last between 120 – 3×10 months and 120 + 3×10 months.

So about 99.7% of the bulbs will last 90-150 months.

This assumes that all the bulbs are produced to the same standard.
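The two intervals in the LED example can be sketched in Python. The helper `k_sigma_interval` is a hypothetical name for this illustration. The last lines go one step beyond the slides, as an assumption about what a user might also want to know: the share of LEDs lasting at least 100 months, which includes the whole upper tail.

```python
from statistics import NormalDist

def k_sigma_interval(mean, std_dev, k):
    """Return the interval within k standard deviations of the mean."""
    return mean - k * std_dev, mean + k * std_dev

mean_life, std_dev = 120, 10  # months, from the example
print(k_sigma_interval(mean_life, std_dev, 2))  # (100, 140)
print(k_sigma_interval(mean_life, std_dev, 3))  # (90, 150)

# Share of LEDs expected to last at least 100 months (upper tail included).
led = NormalDist(mu=mean_life, sigma=std_dev)
at_least_100 = 100 * (1 - led.cdf(100))
print(round(at_least_100, 1))  # 97.7
```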

Page 26

Example continued…

[Figure: normal curve for LED lifetime in months, centred at 120, with the 95% interval from 100 to 140 months and the 99.7% interval from 90 to 150 months; band areas of 34%, 13.5% and 2.35% on each side of the mean.]

Page 27

Exercises

Try page 176 #1, 3b, 6, 8, 9, 10.

http://onlinestatbook.com/

Page 28

References

Lane, D. (2003). What's so important about the normal distribution? Retrieved October 5, 2004 from http://davidmlane.com/hyperstat/normal_distribution.html

Wikipedia (2004). Online Encyclopedia. Retrieved September 1, 2004 from http://en.wikipedia.org/wiki/Main_Page