1 chapter 5 understanding and comparing distributions

23
1 Chapter 5 Understanding and Comparing Distributions

Upload: caitlin-moody

Post on 17-Jan-2016

232 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Chapter 5 Understanding and Comparing Distributions

1

Chapter 5Understanding and

Comparing Distributions

Page 2: 1 Chapter 5 Understanding and Comparing Distributions

2

The Big Picture• We can answer much more interesting questions about

variables when we compare distributions for different groups.

• Below is a histogram of the number of doubles by each player on the 2009 Baltimore Orioles.

Doubles

Fre

quency

605244362820124

7

6

5

4

3

2

1

0

Median = 12Mean = 15.2

Page 3: 1 Chapter 5 Understanding and Comparing Distributions

3

The Big Picture• We can answer much more interesting

questions about variables when we compare distributions for different groups.

• Below is a histogram of the 2009 Payroll for every Major League Baseball Team.

Payroll (Dollars)

Num

ber

of Te

am

s

2000000001600000001200000008000000040000000

9

8

7

6

5

4

3

2

1

0

Median: $80,360,401

IQR: $38,847,256

Page 4: 1 Chapter 5 Understanding and Comparing Distributions

4

The Big Picture

• The distribution is unimodal and skewed to the right.

• The high value may be an outlier

• Median number of doubles is 12 and the IQR is reported to be 16.5

• Can we say more?

Page 5: 1 Chapter 5 Understanding and Comparing Distributions

5

The Five-Number Summary

• The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).– Example: The five-

number summary for the number of doubles for the Baltimore Orioles is:

Max 56

Q3 20.5

Median 12

Q1 4

Min 0

Page 6: 1 Chapter 5 Understanding and Comparing Distributions

6

Baltimore Orioles Doubles: Making Boxplots

• A boxplot is a graphical display of the five-number summary.

• Boxplots are particularly useful when comparing groups.

Page 7: 1 Chapter 5 Understanding and Comparing Distributions

7

Constructing Boxplots

1. Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Page 8: 1 Chapter 5 Understanding and Comparing Distributions

8

Constructing Boxplots (cont.)

2. Erect “fences” around the main part of the data.

– The upper fence is 1.5 IQRs above the upper quartile.

– The lower fence is 1.5 IQRs below the lower quartile.

– Note: the fences only help with constructing the boxplot and should not appear in the final display.

Page 9: 1 Chapter 5 Understanding and Comparing Distributions

9

Constructing Boxplots (cont.)

3. Use the fences to grow “whiskers.”

– Draw lines from the ends of the box up and down to the most extreme data values found within the fences.

– If a data value falls outside one of the fences, we do not connect it with a whisker.

Page 10: 1 Chapter 5 Understanding and Comparing Distributions

10

Constructing Boxplots (cont.)

4. Add the outliers by displaying any data values beyond the fences with special symbols.– We often use a

different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

Page 11: 1 Chapter 5 Understanding and Comparing Distributions

Doubles: Making Boxplots• Compare the histogram and boxplot for Baltimore

Orioles Doubles:• How does each display represent the distribution?

605244362820124

Page 12: 1 Chapter 5 Understanding and Comparing Distributions

Tired Yet?

Page 13: 1 Chapter 5 Understanding and Comparing Distributions

Another Example

Open up the dataset in Blackboard called Births1978.MTW

This dataset contains the number of births that took place for each day in 1978.

Make a histogram for the variable number of births.

Page 14: 1 Chapter 5 Understanding and Comparing Distributions

Number of Births

Frequency

108001020096009000840078007200

50

40

30

20

10

0

Page 15: 1 Chapter 5 Understanding and Comparing Distributions

15

Comparing Groups• It is always more interesting to compare groups.• With histograms, note the shapes, centers, and

spreads of the two distributions.• What does this graphical display tell you?

Frequency

108001020096009000840078007200

30

25

20

15

10

5

0

108001020096009000840078007200

Fall/Winter SpringSummer

Histogram of Number of Births

Page 16: 1 Chapter 5 Understanding and Comparing Distributions

16

Comparing Groups

• Boxplots offer an ideal balance of information and simplicity, hiding the details while displaying the overall summary information.

• We often plot them side by side for groups or categories we wish to compare.

Page 17: 1 Chapter 5 Understanding and Comparing Distributions

What do these boxplots tell you?Num

ber

of Birth

s

December

November

October

September

August

July

JuneMa

yApril

March

February

January

11000

10000

9000

8000

7000

Page 18: 1 Chapter 5 Understanding and Comparing Distributions

18

What About Outliers?

• If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.

• Note: The median and IQR are not likely to be affected by the outliers.

Page 19: 1 Chapter 5 Understanding and Comparing Distributions

19

Timeplots: Order, Please!• For some data sets, we are interested in how the

data behave over time. In these cases, we construct timeplots of the data.

Date

Num

ber

of Birth

s

1/1/197911/1/19789/1/19787/1/19785/1/19783/1/19781/1/1978

11000

10000

9000

8000

7000

Page 20: 1 Chapter 5 Understanding and Comparing Distributions

20

What Can Go Wrong? (cont.)

• Avoid inconsistent scales, either within the display or when comparing two displays.

• Label clearly so a reader knows what the plot displays.– Good intentions, bad

plot:

Page 21: 1 Chapter 5 Understanding and Comparing Distributions

21

What Can Go Wrong? (cont.)

• Beware of outliers• Be careful when

comparing groups that have very different spreads.– Consider these

side-by-side boxplots of cotinine levels:

Page 22: 1 Chapter 5 Understanding and Comparing Distributions

22

What have we learned?• We’ve learned the value of comparing data

groups and looking for patterns among groups and over time.

• We’ve seen that boxplots are very effective for comparing groups graphically.

• We’ve experienced the value of identifying and investigating outliers.

• We’ve graphed data that has been measured over time against a time axis and looked for long-term trends both by eye and with a data smoother.

Page 23: 1 Chapter 5 Understanding and Comparing Distributions

Class Project• I will be uploading the project sometime this weekend• Start thinking about what you would like to collect data

on– Need Categorical and Quantitative data variables– See me for approval before you get started

• Don’t wait till the last second• Please see me if you have questions• Due on or before 11/06/2012• You will be giving a PowerPoint presentation to the

class

23