chapter 5 understanding and comparing distributions

Upload: tobias-rodgers

Post on 23-Dec-2015

Page 1: Chapter 5 Understanding and Comparing Distributions

Chapter 5

Understanding and Comparing Distributions

Slide 5- 2

The Big Picture

Below is a histogram of the average wind speed at Hopkins Forest in western Massachusetts for every day in 1989.

Slide 5- 3

The Big Picture (cont)

The distribution is:

• High value may be an outlier

• Median daily wind speed = 1.90 mph

• IQR is 1.78 mph

Slide 5- 4

The Five-Number Summary

The five-number summary of a distribution reports its median, quartiles, and minimum and maximum.

Example: The five-number summary for the daily wind speed is:

Max 8.67

Q3 2.93

Median 1.90

Q1 1.15

Min 0.20

Slide 5- 5

Daily Wind Speed: Making Boxplots

A boxplot is a graphical display of the five-number summary.

Boxplots are particularly useful when comparing groups.

Slide 5- 6

Constructing Boxplots

Five-number summary: 0.20, 1.15, 1.90, 2.93, 8.67

• Draw a single vertical axis spanning the range of the data.

• Draw short horizontal lines at the lower and upper quartiles and at the median.

• Then connect them with vertical lines to form a box.

Slide 5- 7

Constructing Boxplots (cont.)

Five-number summary: 0.20, 1.15, 1.90, 2.93, 8.67

• Sketch “fences” around the main part of the data.

• The upper fence is 1.5 IQRs above the upper quartile.

• The lower fence is 1.5 IQRs below the lower quartile.

• Note: the fences only help with constructing the boxplot and should not appear in the final display.
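As a rough check, the fence positions for the wind-speed data can be worked out from the quartiles in the five-number summary (Q1 = 1.15, Q3 = 2.93). The sketch below is only illustrative; the function name `fences` and its parameter `k` are assumptions, not part of the slides.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) fences located k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of the daily wind speeds (mph), taken from the five-number summary.
lower, upper = fences(1.15, 2.93)
print(f"lower fence = {lower:.2f} mph, upper fence = {upper:.2f} mph")
# prints: lower fence = -1.52 mph, upper fence = 5.60 mph
```

The lower fence is negative, so no wind speed can fall below it. The maximum in the five-number summary lies above the upper fence, which is consistent with the high value being flagged as a possible outlier.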

Slide 5- 8

Constructing Boxplots (cont.)

• Use the fences to grow “whiskers.”

• Draw lines from the ends of the box up and down to the minimum and maximum data values found within the fences.

• If a data value falls outside one of the fences, we do not connect it with a whisker.

Slide 5- 9

Constructing Boxplots (cont.)

• Add the outliers by displaying any data values beyond the fences with special symbols.

• We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
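The whisker and outlier rules can be sketched in a few lines. This is a minimal illustration, assuming the quartiles are already known; the sample values and the function name `boxplot_parts` are hypothetical and do not come from the slides.

```python
def boxplot_parts(data, q1, q3):
    """Split data into whisker ends, outliers, and far outliers."""
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    far_lower, far_upper = q1 - 3 * iqr, q3 + 3 * iqr

    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outside = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        # Whiskers stop at the most extreme values that are still inside the fences.
        "whiskers": (min(inside), max(inside)),
        # Outliers are beyond a fence but within 3 IQRs of the nearest quartile.
        "outliers": [x for x in outside if far_lower <= x <= far_upper],
        # Far outliers are more than 3 IQRs from the nearest quartile.
        "far_outliers": [x for x in outside if x < far_lower or x > far_upper],
    }


# Hypothetical sample with Q1 = 5 and Q3 = 9 (medians of the lower and upper halves).
sample = [2, 4, 5, 6, 7, 8, 8, 9, 18, 25]
print(boxplot_parts(sample, q1=5, q3=9))
# prints: {'whiskers': (2, 9), 'outliers': [18], 'far_outliers': [25]}
```

Here the fences sit at -1 and 15, so the upper whisker stops at 9 rather than at the maximum, and 18 and 25 would each be drawn with their own symbol.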

Slide 5- 10

Wind Speed: Making Boxplots (cont.)

Let us compare the histogram and boxplot for daily wind speeds:

Slide 5- 11

Comparing Groups

It is always more interesting to compare groups. With histograms, note the shapes, centers, and spreads of the two distributions.

What does this graphical display tell you?

Slide 5- 12

Comparing Groups (cont)

Boxplots hide the details while displaying the overall summary information. We often plot them side by side for groups or categories we wish to compare.

Slide 5- 13

What About Outliers?

If there are any clear outliers and you are reporting the mean and standard deviation, report them both with the outliers present and with the outliers removed.

Note: The median and IQR are not likely to be affected by the outliers.
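A small numerical illustration of this point, using made-up data: the values below are hypothetical, and the helper `summarize` is an assumption introduced only for this sketch. Quartiles are taken as the medians of the lower and upper halves.

```python
import statistics


def summarize(values):
    """Return mean, standard deviation, median, and IQR of the values."""
    ordered = sorted(values)
    half = len(ordered) // 2  # for odd n the middle value is left out of both halves
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
    }


# Hypothetical sample with one clear outlier (60).
with_outlier = [2, 4, 5, 6, 7, 8, 8, 9, 10, 60]
without_outlier = [x for x in with_outlier if x != 60]

for label, values in [("with", with_outlier), ("without", without_outlier)]:
    stats = summarize(values)
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

For this sample, removing the outlier moves the mean from 11.9 to about 6.6 and shrinks the standard deviation several-fold, while the median only moves from 7.5 to 7 and the IQR stays at 4.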

Slide 5- 14

Timeplots: Order, Please!

For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.

Slide 5- 15

Re-expressing Skewed Data to Improve Symmetry

One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., the logarithmic function).

Slide 5- 16

Re-expressing Skewed Data to Improve Symmetry (cont.)

A logarithmic function was applied to each of the observations of the data displayed in the previous slide.

Note the change in skewness from the raw data (previous slide) to the re-expressed data (left).
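The effect of a logarithmic re-expression can be sketched with a small made-up sample. The data below are hypothetical (not the data shown on the slides), and the mean-minus-median gap is used only as a crude indicator of skewness, an assumption of this sketch.

```python
import math
import statistics

# Hypothetical right-skewed sample: each value is double the previous one.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log10(x) for x in raw]


def mean_minus_median(values):
    """Crude skew indicator: positive when the mean is pulled above the median."""
    return statistics.mean(values) - statistics.median(values)


print(f"raw data:    mean - median = {mean_minus_median(raw):.2f}")
print(f"logged data: mean - median = {mean_minus_median(logged):.2f}")
```

In the raw sample the long right tail pulls the mean far above the median. After taking logarithms the values are evenly spaced, so the mean and median essentially coincide.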

Slide 5- 17

What Can Go Wrong?

Avoid inconsistent scales

Beware of outliers

Be careful when comparing groups with very different spreads

Slide 5- 18

What have we learned?

We’ve learned the value of comparing data groups and looking for patterns among groups and over time

We’ve seen that boxplots are very effective for comparing groups graphically

We’ve experienced the value of identifying and investigating outliers

Practice Exercise - Chapter 5

A survey conducted in a college intro stats class during Autumn 2003 asked students about the number of credit hours they were taking that quarter. The number of credit hours for a random sample of 16 students is

10 10 12 14 15 15 15 15

17 17 19 20 20 20 20 22

Slide 5- 19

Practice Exercise - Chapter 5 (cont)

a. Find the five number summary for the data above

b. Find the IQR for the data

c. From parts (a) and (b), are there any outliers in the data?

d. Create a boxplot of these data.

Slide 5- 20

Practice Exercise - Chapter 5 (cont)

10 10 12 14 15 15 15 15

17 17 19 20 20 20 20 22

a. Find the 5 number summary:

Slide 5- 21

Practice Exercise - Chapter 5 (cont)

To find the quartiles, divide the ordered data into two equal halves:

1st: 10 10 12 14 15 15 15 15

2nd: 17 17 19 20 20 20 20 22

To find Q1 we find the median of the first set of numbers above:

→ Q1 = (14 + 15) / 2 = 14.5

To find Q3 we find the median of the second set of numbers:

→ Q3 = (20 + 20) / 2 = 20
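The median-of-halves method can be sketched as a short function and applied to the 16 credit-hour values. The function name `five_number_summary` is an assumption for illustration, and the handling of an odd number of values (dropping the middle value from both halves) is one common convention that this exercise does not need.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves.

    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2  # for odd n the middle value is left out of both halves
    return (
        ordered[0],
        statistics.median(ordered[:half]),   # Q1: median of the lower half
        statistics.median(ordered),
        statistics.median(ordered[-half:]),  # Q3: median of the upper half
        ordered[-1],
    )


credit_hours = [10, 10, 12, 14, 15, 15, 15, 15, 17, 17, 19, 20, 20, 20, 20, 22]
print(five_number_summary(credit_hours))
# prints: (10, 14.5, 16.0, 20.0, 22)
```

Statistical software often uses other quartile definitions, so its results may differ slightly from this hand method.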

Slide 5- 22

Practice Exercise - Chapter 5 (cont)

a. Five-number summary: Min = 10, Q1 = 14.5, Median = 16, Q3 = 20, Max = 22

Slide 5- 23

Practice Exercise - Chapter 5 (cont)

b. Find the IQR of the data.

IQR = Q3 - Q1

= 20 - 14.5

= 5.5

Slide 5- 24

Practice Exercise - Chapter 5 (cont)

c. From parts (a) and (b), are there any outliers in the data?

To determine if there are outliers we need to calculate the values of the fences.

Lower fence = Q1 - 1.5 x IQR

= 14.5 - 1.5 x 5.5

= 6.25

Slide 5- 25

Practice Exercise - Chapter 5 (cont)

Upper fence = Q3 + 1.5 x IQR

= 20 + 1.5 x 5.5

= 28.25

Are there any observations outside the fences? None of the observations lie outside the fences, hence there are no outliers in the data.

Slide 5- 26

Practice Exercise - Chapter 5 (cont)

d. Create a boxplot of these data.

Min = 10

Q1 = 14.5

Median = 16

Q3 = 20

Max = 22

Lower fence = 6.25

Upper fence = 28.25
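Since the plotted figure is not reproduced in this transcript, a rough text rendering of the boxplot can stand in for it. The sketch below is an assumption-laden illustration: the function `ascii_boxplot`, its horizontal orientation, and the `scale` parameter are all invented here, and it assumes there are no outliers so that the whiskers run to the minimum and maximum.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, axis_max=35, scale=2):
    """Return a two-line text boxplot: the plot row and a labelled axis.

    Assumes no outliers, so the whiskers run to the minimum and maximum.
    `scale` is the number of characters per axis unit (an arbitrary choice).
    """
    def pos(value):
        return round(value * scale)

    row = [" "] * (axis_max * scale + 1)
    for i in range(pos(minimum), pos(maximum) + 1):
        row[i] = "-"                      # whiskers (the box overwrites the middle)
    for i in range(pos(q1), pos(q3) + 1):
        row[i] = "="                      # box from Q1 to Q3
    row[pos(minimum)] = row[pos(maximum)] = "|"
    row[pos(q1)], row[pos(q3)] = "[", "]"
    row[pos(median)] = "M"                # median marker inside the box

    axis = [" "] * (axis_max * scale + 3)
    for tick in range(0, axis_max + 1, 5):
        for offset, ch in enumerate(str(tick)):
            axis[pos(tick) + offset] = ch
    return "".join(row).rstrip() + "\n" + "".join(axis).rstrip()


# Five-number summary of the credit-hour data.
print(ascii_boxplot(10, 14.5, 16, 20, 22))
```

The median marker sits closer to the left edge of the box than to the right, reflecting that the median (16) is nearer to Q1 (14.5) than to Q3 (20).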

Slide 5- 27

[Figure: boxplot of credit hours; axis from 0 to 35 in steps of 5]