4.1.1: Summarizing Numerical Data Sets

Upload: gwendolyn-allen

Post on 11-Jan-2016


Introduction

• Measures of center and variability can be used to describe a data set. The mean and median are two measures of center. The mean is the average value of the data. The median is the middle-most value in a data set. These measures are used to generalize data sets and identify common or expected values. Interquartile range and mean absolute deviation describe the variability of the data set. Interquartile range is the difference between the third and first quartiles. The first quartile is the median of the lower half of the data set.


Introduction, continued

• The third quartile is the median of the upper half of the data set. The mean absolute deviation is the average absolute value of the difference between each data point and the mean. Measures of spread describe the variability of data values (how spread out they are) and identify the diversity of values in a data set. They help explain whether data values are very similar or very different.


Key Concepts

• Data sets can be compared using measures of center and variability.

• Measures of center are used to generalize data sets and identify common or expected values.

• Two measures of center are the mean and median.

Finding the Mean

1. Find the sum of the data values.

2. Divide the sum by the number of data points. This is the mean.
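The two steps above can be sketched in Python. This is only an illustration: the function name `mean` and the sample values are assumptions made for the example and do not come from the lesson.

```python
def mean(values):
    """Return the mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)          # step 1: find the sum of the data values
    return total / len(values)   # step 2: divide by the number of data points


# Illustrative values only; they are not taken from the lesson.
sample = [4, 8, 6, 2]
print(mean(sample))  # 5.0
```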


Key Concepts, continued

• The mean is most useful when a data set does not contain values that vary greatly, because extreme values pull the mean toward them.

• The median is a second measure of center.

Finding the Median

1. Arrange the data from least to greatest.

2. Count the number of data points. If there is an even number of data points, the median is the average of the two middle-most values. If there is an odd number of data points, the median is the middle-most value.
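As a rough sketch, the median procedure could be written in Python as follows. The function name `median` and the sample lists are illustrative assumptions, not part of the lesson.

```python
def median(values):
    """Return the median using the sort-then-pick-the-middle procedure."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)     # step 1: arrange from least to greatest
    n = len(ordered)             # step 2: count the data points
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: middle-most value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


# Illustrative values only; they are not taken from the lesson.
print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```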


Key Concepts, continued

• The mean and median are both measures that describe the expected value of a data set.

• Measures of spread describe the range of data values in a data set.

• Mean absolute deviation and interquartile range describe variability.


Key Concepts, continued

• The mean absolute deviation is the average distance of the data points from the mean.

• This summarizes the variability of the data using one number.

Finding the Mean Absolute Deviation

1. Find the mean.

2. Calculate the absolute value of the difference between each data value and the mean.

3. Determine the average of the differences found in step 2. This average is the mean absolute deviation.
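The three steps can be sketched in Python as shown here. The function name `mean_absolute_deviation` and the sample values are assumptions chosen for illustration only.

```python
def mean_absolute_deviation(values):
    """Return the average distance of the data points from their mean."""
    if not values:
        raise ValueError("the data set must not be empty")
    center = sum(values) / len(values)              # step 1: find the mean
    distances = [abs(v - center) for v in values]   # step 2: |data value - mean|
    return sum(distances) / len(distances)          # step 3: average the distances


# Illustrative values only; they are not taken from the lesson.
sample = [2, 4, 6, 8]
print(mean_absolute_deviation(sample))  # 2.0
```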


Key Concepts, continued

Finding the Interquartile Range

1. Arrange the data from least to greatest.

2. Count the number of data points in the set.

3. Find the median of the data set. The median divides the data into two halves: the lower half and the upper half.

4. Find the middle-most value of the lower half of the data. This value is the first quartile, Q1.

5. Find the middle-most value of the upper half of the data. This value is the third quartile, Q3.

6. Calculate the difference between the two quartiles, Q3 – Q1. This difference is the interquartile range.
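A minimal Python sketch of these six steps follows. It assumes the convention used in this lesson: when the number of data points is odd, the median itself belongs to neither half. Statistical libraries may use other quartile conventions and give slightly different results. The names `quartiles` and `interquartile_range` and the sample values are illustrative assumptions.

```python
def _median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)                 # step 1: least to greatest
    n = len(ordered)                         # step 2: count the data points
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2                            # step 3: the median splits the data
    lower = ordered[:half]
    # With an odd count, skip the median itself when forming the upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median_of_sorted(lower), _median_of_sorted(upper)   # steps 4 and 5


def interquartile_range(values):
    q1, q3 = quartiles(values)
    return q3 - q1                           # step 6: Q3 - Q1


# Illustrative values only; they are not taken from the lesson.
sample = [8, 3, 1, 6, 2, 7, 4, 5]
print(quartiles(sample))            # (2.5, 6.5)
print(interquartile_range(sample))  # 4.0
```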


Key Concepts, continued

• The interquartile range is the distance between the first and third quartiles, the two values that bound the middle 50% of the data.

• This summarizes the variability of the data using one number.


Common Errors/Misconceptions

• confusing the terms mean and median, and how to calculate each measure

• forgetting to order data from least to greatest before calculating the median, quartiles, and interquartile range

• incorrectly finding the absolute value of the difference between each data value and the mean


Guided Practice

Example 1

Aaron has diabetes and needs to monitor his blood sugar level. He measures his blood sugar each day before he eats dinner. Aaron’s results for the past 10 days are in the table below. Blood sugar levels for any person before meals should be between 80 and 120. Are there any striking deviations in the data?

Day    Blood sugar level
1      109
2      115
3      89
4      92
5      106
6      101
7      98
8      94
9      107
10     93


Guided Practice: Example 1, continued

1. To see if there are any deviations in the data, start by finding Aaron’s mean blood sugar level over the past 10 days.

\[ \text{mean} = \frac{109 + 115 + 89 + 92 + 106 + 101 + 98 + 94 + 107 + 93}{10} = \frac{1004}{10} = 100.4 \]

Aaron’s mean blood sugar level over the past 10 days was 100.4.

2. Examine the data for deviations.

The lowest blood sugar level was 89. The highest blood sugar level was 115. Neither value is far from the mean of 100.4. Also, all of the data is within the suggested range of 80 to 120. There are no striking deviations in the data.
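The checks in Example 1 can be reproduced with a short Python sketch. The data and the 80 to 120 range come from the example; the variable names are assumptions made for illustration.

```python
# Aaron's blood sugar levels for days 1 through 10, from the table in Example 1.
levels = [109, 115, 89, 92, 106, 101, 98, 94, 107, 93]

# Suggested range before meals, as stated in the example.
LOW, HIGH = 80, 120

mean_level = sum(levels) / len(levels)
lowest, highest = min(levels), max(levels)
all_in_range = all(LOW <= level <= HIGH for level in levels)

print(mean_level)       # 100.4
print(lowest, highest)  # 89 115
print(all_in_range)     # True
```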


Guided Practice

Example 2

A website captures information about each customer’s order. The total dollar amounts of the last 8 orders are listed in the table below. What is the mean absolute deviation of the data?

Order    Dollar amount
1        21
2        15
3        22
4        26
5        24
6        21
7        17
8        22


Guided Practice: Example 2, continued

1. To find the mean absolute deviation of the data, start by finding the mean of the data set.


Guided Practice: Example 2, continued

2. Find the sum of the data values, and divide the sum by the number of data values.

\[ \text{mean} = \frac{21 + 15 + 22 + 26 + 24 + 21 + 17 + 22}{8} = \frac{168}{8} = 21 \]


Guided Practice: Example 2, continued

3. Find the absolute value of the difference between each data value and the mean: |data value – mean|.

• |21 – 21| = 0
• |15 – 21| = 6
• |22 – 21| = 1
• |26 – 21| = 5
• |24 – 21| = 3
• |21 – 21| = 0
• |17 – 21| = 4
• |22 – 21| = 1


Guided Practice: Example 2, continued

4. Find the sum of the absolute values of the differences.

• 0 + 6 + 1 + 5 + 3 + 0 + 4 + 1 = 20


Guided Practice: Example 2, continued

5. Divide the sum of the absolute values of the differences by the number of data values: 20 ÷ 8 = 2.5.

– The mean absolute deviation of the order amounts is 2.5. On average, an order differs from the mean order amount by $2.50. ✔
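The arithmetic in Example 2 can be checked with a few lines of Python. The order amounts come from the table in the example; the variable names are illustrative assumptions.

```python
# Dollar amounts of the last 8 orders, from the table in Example 2.
amounts = [21, 15, 22, 26, 24, 21, 17, 22]

mean_amount = sum(amounts) / len(amounts)             # steps 1 and 2: the mean
distances = [abs(a - mean_amount) for a in amounts]   # step 3: |data value - mean|
total_distance = sum(distances)                       # step 4: sum of the distances
mad = total_distance / len(amounts)                   # step 5: mean absolute deviation

print(mean_amount)     # 21.0
print(total_distance)  # 20.0
print(mad)             # 2.5
```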


Guided Practice

Example 3

A company keeps track of the age at which employees retire. It is considered an early retirement if the employee retires before turning 65. The ages of the 11 employees who took early retirement this year are listed in the table below. Are there any striking deviations in the data?


Guided Practice: Example 3, continued

Employee    Age at early retirement
1           56
2           55
3           60
4           51
5           53
6           58
7           56
8           64
9           59
10          42
11          48


Guided Practice: Example 3, continued

1. First find the interquartile range.


Guided Practice: Example 3, continued

2. Order the data set from least to greatest.

– 42 48 51 53 55 56 56 58 59 60 64


Guided Practice: Example 3, continued

3. Find the median of the data set.

– If there is an odd number of data values, find the middle-most value. If there is an even number of data values, find the average of the two middle-most values.

– There are 11 data values. The sixth data value is the middle-most value, and therefore is the median. The median of this data set is 56.

– 42 48 51 53 55 56 56 58 59 60 64 (median = 56, the sixth value)


Guided Practice: Example 3, continued

4. Find the first quartile.

– The first quartile is the median of the lower half of the data set, or the values that come before the median in the ordered list.

– The first five data values are the lower half of the data set: 42, 48, 51, 53, and 55. The median of the first five data values is the middle-most value of these five values. The first quartile is the third value, 51.

– 42 48 51 53 55 56 56 58 59 60 64 (Q1 = 51, the third value; median = 56, the sixth value)


Guided Practice: Example 3, continued

5. Find the third quartile.

– The third quartile is the median of the upper half of the data set, or the values that come after the median in the ordered list.

– The last five data values are the upper half of the data set: 56, 58, 59, 60, and 64. The median of the last five data values is the middle-most value of these five values. The third quartile is the third of these values, 59.

– 42 48 51 53 55 56 56 58 59 60 64 (Q1 = 51, the third value; median = 56, the sixth value; Q3 = 59, the ninth value)


Guided Practice: Example 3, continued

6. Find the difference between the third and first quartiles: third quartile – first quartile, or IQR = Q3 – Q1.

– IQR = 59 – 51 = 8

– The interquartile range is 8.
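Steps 2 through 6 of Example 3 can be traced in Python as a check. The ages come from the table in the example; the variable names are illustrative assumptions. The middle-index shortcut below works here only because the data set has 11 values and each half has 5, so every middle is a single value.

```python
# Ages at early retirement for the 11 employees, from the table in Example 3.
ages = [56, 55, 60, 51, 53, 58, 56, 64, 59, 42, 48]

ordered = sorted(ages)         # step 2: order from least to greatest
mid = len(ordered) // 2        # index 5, the sixth value
median_age = ordered[mid]      # step 3: the median

lower_half = ordered[:mid]     # the five values before the median
upper_half = ordered[mid + 1:] # the five values after the median

q1 = lower_half[len(lower_half) // 2]   # step 4: middle of the lower half
q3 = upper_half[len(upper_half) // 2]   # step 5: middle of the upper half
iqr = q3 - q1                           # step 6: Q3 - Q1

print(ordered)                  # [42, 48, 51, 53, 55, 56, 56, 58, 59, 60, 64]
print(median_age, q1, q3, iqr)  # 56 51 59 8
```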


Guided Practice: Example 3, continued

7. Look for striking deviations in the data.

– Think about the typical retirement age, which is 65. Also consider the interquartile range, which is 8. Retiring at the age of 42 is young, and 42 is 14 years below the median of 56. The age 42 would be considered a striking deviation because it is far away from the other data values.

– This value may be considered an outlier. We will discuss outliers later. ✔
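One informal way to see why 42 stands out is to compare how far the smallest and largest ages sit from the median. The sketch below is only that comparison, not a formal outlier test; the ages, the median, and the interquartile range of 8 come from the example, and the variable names are assumptions.

```python
# Ages at early retirement for the 11 employees, from the table in Example 3.
ages = [56, 55, 60, 51, 53, 58, 56, 64, 59, 42, 48]

ordered = sorted(ages)
median_age = ordered[len(ordered) // 2]   # 56; valid because the count is odd
iqr = 8                                   # interquartile range found in step 6

youngest, oldest = ordered[0], ordered[-1]
distance_youngest = abs(youngest - median_age)
distance_oldest = abs(oldest - median_age)

print(youngest, distance_youngest)  # 42 14
print(oldest, distance_oldest)      # 64 8
print(distance_youngest > iqr)      # True
```

The youngest age is 14 years from the median, well beyond the interquartile range of 8, while the oldest age is only 8 years away.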