QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Upload: amber-welch

Post on 12-Jan-2016


TRANSCRIPT

Page 1: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

QBM117 Business Statistics

Descriptive Statistics

Numerical Descriptive Measures

Page 2: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Objectives

• To introduce numerical measures for describing the central location of data

• To introduce numerical measures for describing the variability of data

Page 3: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Numerical Descriptive Methods

• We have looked at tabular and graphical methods for presenting data.

• Although these methods help us to highlight important features of the data, they do not tell the whole story.

• Numerical descriptive measures allow us to be more precise in describing the characteristics of the data.

Page 4: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Numerical Descriptive Methods for Quantitative Data

• Most numerical descriptive measures are obtained through arithmetic operations on the data.

• Arithmetic calculations can only be applied to quantitative data.

• Consequently most of the numerical descriptive measures we will discuss are for quantitative data.

Page 5: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Parameters and Statistics

• Recall the terms introduced in lecture 2 week 1: population, sample, parameter, statistic

• Numerical measures calculated from sample data are called sample statistics.

• Numerical measures calculated from population data are called population parameters.

Page 6: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• We will look at a number of descriptive statistics and for each we will learn how to calculate both the population parameter and the sample statistic.

• In practice we usually collect data from a sample and calculate sample statistics to use as estimates of population parameters.

Page 7: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Notation

• Statistics are usually represented by Roman letters:

sample mean $\bar{x}$

sample standard deviation $s$

• Parameters are usually represented by Greek letters:

population mean $\mu$

population standard deviation $\sigma$

Page 8: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Properties of numerical data

• Three major properties that describe quantitative data are

- measures of central tendency

- measures of dispersion

- measures of shape

Page 9: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Measures of Central Tendency

• In most sets of data there is a tendency for the data to group about a central point.

• This phenomenon is referred to as central tendency.

• We will look at three measures of central tendency: mean, median and mode

Page 10: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The Mean

• The most popular and useful measure of central tendency is the arithmetic mean, widely known as the average.

• The mean is calculated by summing all the observations and dividing by the number of observations.

• It can easily be calculated using the statistics function on your calculator.

Page 11: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

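• The mean of a sample of n measurements $x_1, x_2, \ldots, x_n$ is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

• The mean of a population of N measurements $x_1, x_2, \ldots, x_N$ is defined as

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$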

Page 12: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• I have shown you the formulas so that you understand how the mean is calculated.

• However it is expected that you will calculate the mean using the statistics function on your calculator.

• If you are unsure of how to use the statistics functions on your calculator refer to your calculator manual.

• The population mean and the sample mean are calculated using the same button on your calculator.

Page 13: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the mean of the data.

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

$$\bar{x} = \frac{4 + 20 + \cdots + 29}{15} = 23.27$$
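The same calculation can be sketched in Python as a cross-check on the calculator result. This is only an illustrative sketch; the names `pe_ratios` and `arithmetic_mean` are assumptions introduced here and are not part of the lecture.

```python
# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31,
             10, 23, 37, 29, 15,
             33, 21, 18, 35, 29]


def arithmetic_mean(values):
    """Sum all the observations and divide by the number of observations."""
    return sum(values) / len(values)


x_bar = arithmetic_mean(pe_ratios)
print(f"mean = {x_bar:.2f}")  # mean = 23.27
```

The function does not distinguish between a sample and a population; as with the calculator button, the arithmetic is identical and only the symbol ($\bar{x}$ or $\mu$) differs.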

Page 14: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The Median

• The median is the middle value when the data are arranged in order.

• To calculate the median

- Order the data from smallest to largest

- If the number of observations is odd, the median is the middle value.

- If the number of observations is even, the median is the mean of the two middle observations.

Page 15: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the median.

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

Page 16: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Order the data.

4 10 15 16 18 20 21 23 28 29 29 31 33 35 37

There are 15 observations and so the median will be the middle value.

It will be the 8th value: median = 23.
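The ordering rule for the median can be sketched in Python as follows. This is a minimal illustration under the odd/even rule stated earlier; the function name `median` and the variable `pe_ratios` are assumptions made for this sketch.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)      # order the data from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd number of observations: single middle value
        return ordered[middle]
    # even number of observations: mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


# Price-earnings ratios from Example 1 (15 observations, so the 8th ordered value)
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
print(median(pe_ratios))  # 23
```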

Page 17: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Stem and Leaf Display

• A useful tool for ordering data is the stem and leaf display.

• To construct a stem and leaf display, separate each observation into a stem, consisting of all but the last digit, and a leaf, the final digit.

• Write the stems in a vertical column (smallest at top).

• Write each leaf in the row to the right of the stem.

• Redraw, ordering the leaves.

Page 18: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Construct a stem and leaf display and calculate the median.

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

Page 19: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Unordered

0 | 4

1 | 6 0 5 8

2 | 0 8 3 9 1 9

3 | 1 7 3 5

Ordered

0 | 4

1 | 0 5 6 8

2 | 0 1 3 8 9 9

3 | 1 3 5 7
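The construction steps can be sketched in Python. This is an illustrative sketch that assumes non-negative whole-number data and a leaf unit of 1; the helper name `stem_and_leaf` is hypothetical. Stems with no observations are simply omitted here.

```python
def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its ordered leaves (the final digit)."""
    rows = {}
    for value in sorted(values):          # sorting first yields ordered leaves
        stem, leaf = divmod(value, 10)    # e.g. 23 -> stem 2, leaf 3
        rows.setdefault(stem, []).append(leaf)
    return rows


# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]

for stem, leaves in stem_and_leaf(pe_ratios).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 0 | 4
# 1 | 0 5 6 8
# 2 | 0 1 3 8 9 9
# 3 | 1 3 5 7
```

Counting along the ordered leaves, the 8th observation has stem 2 and leaf 3, which gives a median of 23.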

Page 20: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The Mode

• The mode is the value that occurs most frequently.

• The mode doesn’t necessarily lie in the middle.

• Its claim to be a measure of central tendency is based on the fact that it indicates the location of greatest concentration of values.

• The mode is a measure of central tendency that can be used for qualitative data.

Page 21: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the mode.

mode = 29

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

Page 22: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• If no data value occurs more than once then there is no mode.

• A data set may have more than one mode.

• If there are two modes then the data are bimodal.

• If there are more than two modes the data are multimodal.
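These conventions can be sketched in Python. The sketch assumes a non-empty data set and follows the rule above that data with no repeated value have no mode; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:              # no data value occurs more than once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)


# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
print(modes(pe_ratios))           # [29]
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  (bimodal)
print(modes([1, 2, 3]))           # []      (no mode)
```

Because the function only counts occurrences, it also works for qualitative data such as a list of category labels.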

Page 23: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 2

A survey of television-viewing habits among university students provided the following data on viewing time in hours per week:

Calculate the mean, median and mode.

14 9 12 4 20 26 17 15

18 15 10 6 16 15 8 5

Page 24: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

mean = 13.125

4 5 6 8 9 10 12 14 15 15 15 16 17 18 20 26

median = (14 + 15)/2 = 14.5

mode = 15
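As an analogue of the calculator's statistics functions, the three measures for Example 2 can be checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are assumptions introduced here.

```python
import statistics

# Viewing time in hours per week from Example 2
viewing_hours = [14, 9, 12, 4, 20, 26, 17, 15,
                 18, 15, 10, 6, 16, 15, 8, 5]

mean_hours = statistics.mean(viewing_hours)
median_hours = statistics.median(viewing_hours)   # 16 values: mean of the 8th and 9th
mode_hours = statistics.mode(viewing_hours)

print(mean_hours)    # 13.125
print(median_hours)  # 14.5
print(mode_hours)    # 15
```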

Page 25: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Mean, Median or Mode

• There are several factors to consider when making our choice of measure of central tendency.

• The mean is generally our first selection.

• However, there are circumstances when the median is better.

• The mode is seldom the best measure of central tendency.

Page 26: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• The mean is a popular measure because it is simple to calculate and interpret, and lends itself to mathematical manipulation.

• However the mean is sensitive to skewness and outliers.

• The mean can be thought of as the balance point of the data.

• If there are a few data points that are far from the bulk of the data, the mean moves towards them in order to maintain balance.
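A small sketch can make this sensitivity concrete. It reuses the Example 1 data and appends one extreme observation; the value 200 is hypothetical, chosen purely for illustration, and is not part of the lecture data.

```python
import statistics

# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]

# Hypothetical outlier (an assumption for illustration only)
with_outlier = pe_ratios + [200]

mean_before = statistics.mean(pe_ratios)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(pe_ratios)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")   # mean:   23.27 -> 34.31
print(f"median: {median_before} -> {median_after}")       # median: 23 -> 25.5
```

One extreme value pulls the mean up by about 11, while the median moves by only 2.5.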

Page 27: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• The mean is the preferred measure of central tendency.

• However, if the data are skewed or contain outliers then the median is the preferred measure of central tendency.

• If the data are qualitative, the mode must be used.

Page 28: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Relationship between Mean, Median and Mode

• If the data are unimodal and symmetric, the mean, median and mode coincide.

• If the data are unimodal and positively skewed, the mean is greater than the median, which is greater than the mode.

• If the data are unimodal and negatively skewed, the mean is less than the median, which is less than the mode.

Page 29: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Measures of Dispersion

• In addition to knowing the central location of the data values, it is important to know how the values vary about this point.

• We are now going to look at measures of dispersion, also referred to as

- measures of spread

- measures of variability

• We will look at three measures of dispersion:

range, standard deviation and coefficient of variation

Page 30: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The Range

• The range is the difference between the largest and smallest observations in a data set.

• The range measures the total spread of the data set.

• Although the range is a simple measure of variability, it does not take into account how the data are distributed between the smallest and largest values.

• Hence the range is seldom used as the only measure.

Page 31: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the range.

range = 37 – 4 = 33

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

Page 32: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Variance and Standard Deviation

• The variance and the standard deviation are the two most widely accepted measures of dispersion.

• The standard deviation is the square root of the variance.

• Both measures take into account how far each data value is away from the mean.

Page 33: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Population Variance

• The variance of a population of N measurements $x_1, x_2, \ldots, x_N$ having mean $\mu$ is defined as

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N} = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Page 34: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Sample Variance

• The variance of a sample of n measurements $x_1, x_2, \ldots, x_n$ having mean $\bar{x}$ is defined as

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Page 35: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Standard Deviation

• Calculating the variance involves squaring the original measurements and hence the unit attached to the variance is the square of the unit attached to the original measurements.

• Taking the square root of the variance gives us a measure of variability that is in the same units as the data.

• This measure is the standard deviation.

Page 36: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Population Standard Deviation

• The standard deviation of a population of N measurements $x_1, x_2, \ldots, x_N$ having mean $\mu$ is defined as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Page 37: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Sample Standard Deviation

• The standard deviation of a sample of n measurements $x_1, x_2, \ldots, x_n$ having mean $\bar{x}$ is defined as

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Page 38: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Calculating the Standard Deviation and Variance

• As with the mean, you are expected to calculate the standard deviation and variance using the statistics functions on your calculator.

• You are not to use the formulae; these have been provided to help you understand what the standard deviation and variance are.

• Note that the population standard deviation and sample standard deviation are calculated using different buttons on your calculator.

Page 39: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Important Points about the Standard Deviation

• The standard deviation cannot be negative.

• The standard deviation is zero if, and only if, all of the observations have the same value.

• Like the mean, the standard deviation is not resistant. Strong skewness or a few outliers can greatly increase the standard deviation.

Page 40: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the standard deviation and the variance.

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

$s = 9.57$ (2 d.p.)

$s^2 = 91.50$ (2 d.p.)
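To show where these figures come from, the definitions can be written out in Python and compared with the standard `statistics` module. This is a sketch for understanding only; the function name `variance` and its `sample` flag are assumptions introduced here. The flag plays the role of the two different calculator buttons: divide by n - 1 for a sample, by N for a population.

```python
import math
import statistics

# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    centre = sum(values) / n
    squared_deviations = sum((x - centre) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


s2 = variance(pe_ratios)      # sample variance
s = math.sqrt(s2)             # sample standard deviation, same units as the data
print(f"s^2 = {s2:.2f}, s = {s:.2f}")  # s^2 = 91.50, s = 9.57

# Cross-check against the standard library
assert math.isclose(s2, statistics.variance(pe_ratios))
assert math.isclose(variance(pe_ratios, sample=False), statistics.pvariance(pe_ratios))
```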

Page 41: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Coefficient of Variation

• In some situations we may be interested in a measure of variability that indicates how large the standard deviation is in relation to the mean.

• This measure is called the coefficient of variation (CV) and is calculated by dividing the standard deviation of a data set by the mean.

• The CV allows us to compare the variability of two data sets having different units of measurement.

Page 42: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

• A standard deviation of 1mm would be considered very large for the measured thickness of CDs on a production line.

• However a standard deviation of 1mm would be considered small for the height of a telephone pole.

• When the means for data sets differ greatly we do not get an accurate picture of the relative variability in the two data sets by comparing the standard deviations.

Page 43: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Calculating the Coefficient of Variation

• The sample coefficient of variation is calculated by

$$cv = \frac{s}{\bar{x}}$$

• The population coefficient of variation is calculated by

$$CV = \frac{\sigma}{\mu}$$

Page 44: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 1 revisited

The following data are the price-earnings ratios for a set of stocks whose prices are quoted by NASDAQ

Calculate the coefficient of variation.

4 20 16 28 31

10 23 37 29 15

33 21 18 35 29

$$cv = \frac{9.57}{23.27} = 0.41 \text{ (2 d.p.)}$$
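A short Python sketch of the sample coefficient of variation, assuming the data are treated as a sample (so the n - 1 standard deviation is used). The function name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


# Price-earnings ratios from Example 1
pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]

cv = coefficient_of_variation(pe_ratios)
print(f"cv = {cv:.2f}")  # cv = 0.41
```

Because the units cancel in the ratio, the result can be compared across data sets measured in different units.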

Page 45: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 2 revisited

A survey of television-viewing habits among university students provided the following data on viewing time in hours per week:

Calculate the range, standard deviation, variance and coefficient of variation.

14 9 12 4 20 26 17 15

18 15 10 6 16 15 8 5

Page 46: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

range = 26 – 4 = 22

standard deviation: s = 5.92 (2d.p.)

variance: $s^2$ = 35.05

coefficient of variation: cv = 0.45 (2d.p.)
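The four measures for Example 2 can be recomputed with a few lines of Python as a check. This sketch treats the 16 observations as a sample; the variable names are assumptions introduced here. The smallest observation is 4 and the largest is 26, so the range works out to 22.

```python
import statistics

# Viewing time in hours per week from Example 2
viewing_hours = [14, 9, 12, 4, 20, 26, 17, 15,
                 18, 15, 10, 6, 16, 15, 8, 5]

data_range = max(viewing_hours) - min(viewing_hours)   # largest minus smallest
s2 = statistics.variance(viewing_hours)                # sample variance (n - 1)
s = statistics.stdev(viewing_hours)                    # sample standard deviation
cv = s / statistics.mean(viewing_hours)                # coefficient of variation

print(f"range = {data_range}")   # range = 22
print(f"s = {s:.2f}")            # s = 5.92
print(f"s^2 = {s2:.2f}")         # s^2 = 35.05
print(f"cv = {cv:.2f}")          # cv = 0.45
```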

Page 47: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Interpreting the Standard Deviation

• The standard deviation, as a measure of average deviation around the mean, helps you understand how the observations are distributed above and below the mean.

• A data set with a large standard deviation has much dispersion with values widely scattered around its mean.

• A data set with a small standard deviation has little dispersion with the values tightly clustered about the mean.

Page 48: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Chebyshev’s Theorem

• More than a century ago, the Russian mathematician Pafnuty Chebyshev found that, regardless of how a data set is distributed, the proportion of observations lying within k standard deviations of the mean is at least $1 - 1/k^2$, for any k > 1.

• This is known as Chebyshev’s theorem.

Page 49: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Regardless of the shape of the distribution, Chebyshev’s theorem states:

• At least 75% of the observations must lie within 2 standard deviations of the mean

• At least 89% of the observations must lie within 3 standard deviations of the mean

• At least 94% of the observations must lie within 4 standard deviations of the mean
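These percentages follow directly from the bound $1 - 1/k^2$. A minimal Python sketch, in which the function name `chebyshev_lower_bound` is an assumption:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.2%}")
# 2 75.00%
# 3 88.89%
# 4 93.75%
```

The figures of 89% and 94% quoted for k = 3 and k = 4 are these values rounded to whole percentages.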

Page 50: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 3.11 from text (pg 86)

The durations (in minutes) of a sample of 30 long-distance telephone calls placed by a firm in Melbourne in a given week are given in Table 3.2 on page 86 of the text.

The 30 telephone-call durations have a mean of 10.26 and a standard deviation of 4.29.

Chebyshev’s theorem states that at least 75% of the call durations lie within 2 standard deviations of the mean.

Page 51: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The interval within 2 standard deviations of the mean runs from

$$\bar{x} - 2s = 10.26 - 2(4.29) = 1.68$$

to

$$\bar{x} + 2s = 10.26 + 2(4.29) = 18.84$$

When we look at the data we find that all but the largest of the 30 durations fall within this interval.

That is, the interval actually contains 96.7% of the call durations.

Page 52: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Empirical Rule

• A more exact rule applies if the distribution of the data is bell-shaped.

• The empirical rule has evolved from empirical studies that have produced samples possessing bell-shaped distributions.

Page 53: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

The empirical rule states that for data with a bell-shaped distribution:

• About 68% of all observations lie within 1 standard deviation of the mean

• About 95% of all observations lie within 2 standard deviations of the mean

• Almost all (about 99.7%) of the observations lie within 3 standard deviations of the mean
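If "bell-shaped" is taken to mean an exactly normal distribution (an assumption made for this sketch; real data only approximate it), the three percentages can be reproduced with `statistics.NormalDist` from the Python standard library. The helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{proportion_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

These are noticeably tighter than the Chebyshev guarantees, which must hold for any shape of distribution.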

Page 54: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Example 3.12 from text (pg 87)

The data in the sample of telephone-call durations in Table 3.2 have a mean of 10.26, a standard deviation of 4.29, and the durations have an approximately bell-shaped distribution (see Figure 3.5).

According to the empirical rule, approximately 68% of the observations should lie in the interval

$$(\bar{x} - s, \bar{x} + s) = (10.26 - 4.29, 10.26 + 4.29) = (5.97, 14.55)$$

Page 55: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

According to the empirical rule, approximately 68% of the observations should lie in the interval

$$(\bar{x} - s, \bar{x} + s) = (10.26 - 4.29, 10.26 + 4.29) = (5.97, 14.55)$$

If we look at the data we see that 21 out of the 30 durations are contained in this interval, i.e. 70%.

This is very close to the empirical rule’s approximation.

Page 56: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

According to the empirical rule, approximately 95% of the observations should lie in the interval

$$(\bar{x} - 2s, \bar{x} + 2s) = (10.26 - 2(4.29), 10.26 + 2(4.29)) = (1.68, 18.84)$$

If we look at the data we see that 29 out of the 30 durations are contained in this interval, i.e. 96.7%.

This is very close to the empirical rule’s approximation.
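The interval arithmetic for the telephone-call example can be sketched in Python. Only the summary figures quoted above are used (mean 10.26, standard deviation 4.29, and the counts 21 and 29 out of 30); the raw durations in Table 3.2 are not reproduced here, and the names `interval` and `observed_counts` are assumptions.

```python
x_bar, s = 10.26, 4.29   # sample mean and standard deviation quoted in the example
n = 30                   # number of call durations


def interval(k):
    """Interval within k standard deviations of the mean, rounded to 2 d.p."""
    return (round(x_bar - k * s, 2), round(x_bar + k * s, 2))


print(interval(1))  # (5.97, 14.55)
print(interval(2))  # (1.68, 18.84)

# Counts of durations inside each interval, as reported in the example
observed_counts = {1: 21, 2: 29}
for k, count in observed_counts.items():
    print(k, f"{count / n:.1%}")
# 1 70.0%
# 2 96.7%
```

The observed proportions of 70.0% and 96.7% sit close to the empirical rule's 68% and 95%, and comfortably above the 75% that Chebyshev's theorem guarantees for 2 standard deviations.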

Page 57: QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures

Reading for next lecture

• Chapter 3 Sections 3.5 - 3.6

Exercises

• 3.7

• 3.20

• 3.25a

• 3.31