BPS - 3rd Ed. Chapter 2: Describing Distributions with Numbers


Upload: shannon-hill

Post on 22-Dec-2015


TRANSCRIPT

Page 1:

Chapter 2

Describing Distributions with Numbers

Page 2:

Numerical Summaries

Center of distribution
– mean
– median

Spread of distribution
– five-point summary (& interquartile range)
– standard deviation (& variance)

Page 3:

Mean (Arithmetic Average)

Traditional measure of center

Notation: $\bar{x}$ (“xbar”)

Sum the values and divide by the sample size (n):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Page 4:

Mean Illustrative Example: “Metabolic Rate”

Data: Metabolic rates, 7 men (cal/day):

1792 1666 1362 1614 1460 1867 1439

$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
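The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` and the variable names are illustrative choices, not part of the slides.

```python
# Metabolic rates (cal/day) for the 7 men in the example
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

def mean(values):
    """Sum the values and divide by the sample size n."""
    return sum(values) / len(values)

total = sum(rates)   # 11200
x_bar = mean(rates)  # 1600.0
print(total, x_bar)
```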

Page 5:

Median (M)

Half of the ordered values are less than or equal to the median value

Half of the ordered values are greater than or equal to the median value

If n is odd, the median is the middle ordered value

If n is even, the median is the average of the two middle ordered values

Page 6:

Median

Example 1 data: 2 4 6 Median (M) = 4

Example 2 data: 2 4 6 8 Median = 5 (average of 4 and 6)

Example 3 data: 6 2 4 Median ≠ 2 (order the values first: 2 4 6, so Median = 4)
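The odd/even rule and the three examples can be sketched in Python as follows. The function below is an illustrative implementation of the rule as stated on the slides, not code from the textbook.

```python
def median(values):
    """Order the values, then take the middle one (n odd)
    or the average of the two middle ones (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6]))     # Example 1: middle value, 4
print(median([2, 4, 6, 8]))  # Example 2: average of 4 and 6
print(median([6, 2, 4]))     # Example 3: sorted first, so 4 rather than 2
```

Sorting inside the function is what prevents the Example 3 mistake of reading the middle value off unordered data.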

Page 7:

Location of the Median L(M)

Location of the median: L(M) = (n+1)÷2, where n = sample size.

Example: If 25 data values are recorded, the median is located at position (25+1)/2 = 13 in the ordered array.

Page 8:

Median Illustrative Example

Data: Metabolic rates, n = 7: 1792 1666 1362 1614 1460 1867 1439

L(M) = (7 + 1) / 2 = 4

Ordered array:

1362 1439 1460 1614 1666 1792 1867 (the median is the 4th value, 1614)

Value of median = 1614
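The location rule L(M) = (n+1)/2 can also be applied directly. The sketch below uses two hypothetical helper names, `median_location` and `value_at`; a location ending in .5 is read as "average the two neighboring ordered values", which is how the slides treat even n.

```python
def median_location(n):
    """L(M) = (n + 1) / 2, a 1-based position in the ordered array."""
    return (n + 1) / 2

def value_at(ordered, location):
    """Value at a 1-based location in an ordered list.
    A location such as 5.5 averages the 5th and 6th values."""
    lower = int(location)
    if location == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
ordered = sorted(rates)
loc = median_location(len(ordered))  # position 4
print(loc, value_at(ordered, loc))   # the 4th ordered value is 1614
```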

Page 9:

Comparing the Mean & Median

Mean = median when data are symmetrical

Mean ≠ median when data are skewed or have outliers (the mean is ‘pulled’ toward the tail, while the median is more resistant)

If we switch this:

1362 1439 1460 1614 1666 1792 1867

to this:

1362 1439 1460 1614 1666 1792 9867

the median is still 1614 but the mean goes from 1600 to 2742.9
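A short check of this comparison, using Python's standard `statistics` module (a sketch; the variable names are illustrative):

```python
from statistics import mean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = original[:-1] + [9867]  # replace 1867 with 9867

# The median stays at 1614; the mean moves from 1600 to about 2742.9
print(f"{mean(original):.1f}", median(original))
print(f"{mean(with_outlier):.1f}", median(with_outlier))
```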

Page 10:

Question

The average salary at a high-tech company is $250K / year, but the median salary is $60K. How can this be?

Answer: There are some very highly paid executives, but most of the workers make modest salaries.

Page 11:

Spread = Variability

Variability = the amount by which values spread above and below the center

Can be measured in several ways:

– range (rarely used)

– 5-point summary & inter-quartile range

– variance and standard deviation

Page 12:

Range

Based on smallest (minimum) and largest (maximum) values in the data set:

Range = max − min

The range is not a reliable measure of spread (affected by outliers, biased)
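To see how strongly a single outlier affects the range, here is a small sketch using the metabolic-rate data from the earlier slides, along with the 9867 variant from the mean/median comparison. The function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

print(value_range(rates))         # 1867 - 1362 = 505
print(value_range(with_outlier))  # 9867 - 1362 = 8505
```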

Page 13:

Quartiles

Three numbers which divide the ordered data into four equal sized groups.

Q1 has 25% of the data below it.

Q2 has 50% of the data below it. (Median)

Q3 has 75% of the data below it.

Page 14:

Obtaining the Quartiles

Order the data.

Find the median
– This is Q2

Look at the lower half of the data (those below the median)
– The “median” of this lower half = Q1

Look at the upper half of the data
– The “median” of this upper half = Q3

Page 15:

Illustrative example: 10 ages

AGE (years) values, ordered array (n = 10):

05 11 21 24 27 | 28 30 42 50 52

Q1 = 21 (median of the lower half: 05 11 21 24 27)

Q2 = average of 27 and 28 = 27.5

Q3 = 42 (median of the upper half: 28 30 42 50 52)
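The "median of each half" procedure can be sketched in Python as below. One assumption is labeled in the code: when n is odd, the middle value is left out of both halves, which matches "those below the median" on the previous slide. Statistical software often uses other quartile conventions and may give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median location
    upper = ordered[half + n % 2:]  # assumption: skip the middle value when n is odd
    return median(lower), median(ordered), median(upper)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(quartiles(ages))  # Q1 = 21, Q2 = 27.5, Q3 = 42
```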

Page 16:

Weight Data: Sorted

n = 53

Median: L(M) = (53+1)/2 = 27, placing it at 165

L(Q1) = (26+1)/2 = 13.5, placing it between 127 and 128 (127.5)

L(Q3) = 13.5 from the top, placing it between 185 and 185 (185)

Sorted values (read down each column):

100  124  148  170  185  215
101  125  150  170  185  220
106  127  150  172  186  260
106  128  152  175  187
110  130  155  175  192
110  130  157  180  194
119  133  165  180  195
120  135  165  180  203
120  139  165  180  210
123  140  170  185  212

Q1 = 127.5 Q2 = 165 Q3 = 185

Page 17:

Weight Data: Quartiles

Stemplot (stem = hundreds and tens, leaf = ones):

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Q1 = 127.5

Q2 = 165

Q3 = 185

Page 18:

Five-Number Summary

minimum = 100

Q1 = 127.5

M = 165

Q3 = 185

maximum = 260

Interquartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5

IQR gives spread of middle 50% of the data
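The five-number summary and IQR for the weight data can be reproduced with the sketch below. The list is the 53 sorted weights from the earlier slide; `five_number_summary` is a hypothetical helper that uses the median-of-halves rule and, as an assumption, leaves the middle value out of both halves when n is odd.

```python
from statistics import median

weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])  # assumption: middle value excluded when n is odd
    return ordered[0], q1, median(ordered), q3, ordered[-1]

minimum, q1, m, q3, maximum = five_number_summary(weights)
iqr = q3 - q1
# Expect min 100, Q1 127.5, M 165, Q3 185, max 260 and IQR 57.5
print(minimum, q1, m, q3, maximum, iqr)
```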

Page 19:

Boxplot

Central box spans Q1 and Q3.

A line in the box marks the median M.

Lines extend from the box out to the minimum and maximum.

Page 20:

Weight Data: Boxplot

[Figure: boxplot of the weight data with min, Q1, M, Q3, and max marked; horizontal axis: Weight, 100 to 275]

Page 21:

Quartile extrapolation

Quartiles divide the data set into 4 segments: bottom, bottom middle, top middle, upper

With small data sets, extrapolate values

Illustrative data: 2, 4, 6, 8

2 | 4 | 6 | 8
  Q1  Q2  Q3

Q1 = average of 2 and 4, which is 3

Q2 = average of 4 and 6, which is 5

Q3 = average of 6 and 8, which is 7

Page 22:

Boxplots are useful for comparing two groups (text p. 39)

Page 23:

Variances & Standard Deviation

The most common measures of spread

Based on deviations around the mean

Each data value has a deviation, defined as

$$x_i - \bar{x}$$

Page 24:

Fig 2.3: Metabolic Rate for 7 men, with their mean (*) and two deviations shown

Page 25:

Variance

Find the mean

Find the deviation of each value

Square the deviations

Sum the squared deviations: we call this the sum of squares, or SS

Divide the SS by n-1 (gives typical squared deviation from mean)
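The steps above can be followed literally in Python. This is a minimal sketch using the metabolic-rate data from the earlier slides; the variable names are illustrative.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)

x_bar = sum(rates) / n                   # step 1: find the mean (1600)
deviations = [x - x_bar for x in rates]  # step 2: deviation of each value
squared = [d ** 2 for d in deviations]   # step 3: square the deviations
ss = sum(squared)                        # step 4: sum of squares, SS
variance = ss / (n - 1)                  # step 5: divide SS by n - 1

print(sum(deviations))     # deviations around the mean sum to zero
print(ss)                  # 214870
print(round(variance, 2))  # 35811.67
```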

Page 26:

Variance Formula

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Page 27:

Standard Deviation

Square root of the variance

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Page 28:

Variance and Standard Deviation Illustrative Example

Data: Metabolic rates, 7 men (cal/day):

1792 1666 1362 1614 1460 1867 1439

$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$

Page 29:

Variance and Standard Deviation Illustrative Example (cont.)

Observations ($x_i$)   Deviations ($x_i - \bar{x}$)   Squared deviations ($(x_i - \bar{x})^2$)
1792                   1792 - 1600 = 192              (192)² = 36,864
1666                   1666 - 1600 = 66               (66)² = 4,356
1362                   1362 - 1600 = -238             (-238)² = 56,644
1614                   1614 - 1600 = 14               (14)² = 196
1460                   1460 - 1600 = -140             (-140)² = 19,600
1867                   1867 - 1600 = 267              (267)² = 71,289
1439                   1439 - 1600 = -161             (-161)² = 25,921
                       sum = 0                        SS = 214,870

Page 30:

Variance and Standard Deviation Illustrative Example (cont.)

$$s^2 = \frac{214{,}870}{7 - 1} = 35{,}811.67$$

$$s = \sqrt{35{,}811.67} = 189.24 \text{ calories}$$

Notes:

(1) Use standard deviation s for descriptive purposes

(2) Variance & standard deviation are calculated by calculator or computer in practice
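As an example of note (2), Python's standard `statistics` module provides `variance` and `stdev`, both of which use the n − 1 divisor. The sketch below reproduces the hand calculation; it is one way to do this on a computer, not the method prescribed by the text.

```python
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

s2 = statistics.variance(rates)  # sample variance (divides by n - 1)
s = statistics.stdev(rates)      # sample standard deviation

print(round(s2, 2), round(s, 2))  # 35811.67 189.24
```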

Page 31:

Summary Statistics

Two main measures of central location
– Mean ($\bar{x}$)
– Median (M)

Two main measures of spread
– Standard deviation (s)
– 5-point summary (interquartile range)

Page 32:

Choosing Summary Statistics

Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.

Use the median and IQR (or 5-point summary) when data are skewed or when outliers are present.

Page 33:

Example: Number of Books Read

Sorted values, n = 52 (read down each column):

0  1  2  4  10  30
0  1  2  4  10  99
0  1  2  4  12
0  1  3  5  13
0  2  3  5  14
0  2  3  5  14
0  2  3  5  15
0  2  4  5  15
0  2  4  5  20
1  2  4  6  20

L(M) = (52+1)/2 = 26.5, so M is the average of the 26th and 27th ordered values (both are 3)

Page 34:

Illustrative example: “Books read”

5-point summary: 0, 1, 3, 5.5, 99

Note highly asymmetric distribution

“xbar” = 7.06, s = 14.43

The mean and standard deviation give a false impression with asymmetric data

[Figure: plot of the books-read data; horizontal axis: Number of books, 0 to 100]
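The summary values for the books-read data can be reproduced with the sketch below. The list is the 52 sorted values from the previous slide; the quartiles use the median-of-halves rule from earlier in the chapter, and the variable names are illustrative.

```python
from statistics import mean, median, stdev

books = (
    [0] * 9 + [1] * 5 + [2] * 9 + [3] * 4 + [4] * 6 + [5] * 6 + [6]
    + [10, 10, 12, 13, 14, 14, 15, 15, 20, 20, 30, 99]
)

ordered = sorted(books)
n = len(ordered)  # 52
half = n // 2
five = (
    ordered[0],
    median(ordered[:half]),          # Q1: median of the lower 26 values
    median(ordered),                 # M
    median(ordered[half + n % 2:]),  # Q3: median of the upper 26 values
    ordered[-1],
)

print(five)                    # 0, 1, 3, 5.5, 99
print(round(mean(books), 2))   # 7.06
print(round(stdev(books), 2))  # 14.43
```

The mean (7.06) sits well above the median (3), and s (14.43) is dominated by the two largest values, which is why the slide recommends the 5-point summary for data like these.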