Statistics for Business and Economics: Chapter 3

Slides Prepared by JOHN S. LOUCKS, St. Edward’s University. © 2002 South-Western/Thomson Learning


Course material for Statistics for Business and Economics (Anderson, Sweeney, Williams), Chapter 3.


Chapter 3 Descriptive Statistics: Numerical Methods

• Measures of Location
• Measures of Variability
• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data

Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

Example: Apartment Rents

Given below is a sample of monthly rent values ($) for 70 one-bedroom apartments in a particular city. The data are presented in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Mean

The mean of a data set is the average of all the data values.

If the data are from a sample, the mean is denoted by \( \bar{x} \):

\[ \bar{x} = \frac{\sum x_i}{n} \]

If the data are from a population, the mean is denoted by \( \mu \) (mu):

\[ \mu = \frac{\sum x_i}{N} \]

Example: Apartment Rents

Mean

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
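The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration, and the list is the rent sample from the example.

```python
# Monthly rents ($) for the 70 sampled one-bedroom apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)            # sample size
total = sum(rents)        # sum of all data values
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))
```

The sum of the 70 values is 34,356, which matches the numerator shown on the slide.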


Median

The median is the measure of location most often reported for annual income and property value data.

A few extremely large incomes or property values can inflate the mean.


The median of a data set is the value in the middle when the data items are arranged in ascending order.

For an odd number of observations, the median is the middle value.

For an even number of observations, the median is the average of the two middle values.


Example: Apartment Rents

Median

Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475


Mode

The mode of a data set is the value that occurs with greatest frequency.

The greatest frequency can occur at two or more different values.

If the data have exactly two modes, the data are bimodal.

If the data have more than two modes, the data are multimodal.


Example: Apartment Rents

Mode

450 occurred most frequently (7 times).
Mode = 450
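A short Python sketch of the mode calculation follows. The helper name `modes` is hypothetical; it returns every value tied for the greatest frequency, so it also covers the bimodal and multimodal cases.

```python
from collections import Counter

def modes(values):
    """Return (list of values with the greatest frequency, that frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(modes(rents))            # ([450], 7)
print(modes([1, 1, 2, 2, 3]))  # ([1, 2], 2), a bimodal example
```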


Percentiles

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.

Admission test scores for colleges and universities are frequently reported in terms of percentiles.

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

• Arrange the data in ascending order.
• Compute index i, the position of the pth percentile:

i = (p/100)n

• If i is not an integer, round up. The pth percentile is the value in the ith position.
• If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
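The index rule can be sketched in Python as follows. The function name `percentile` and the restriction to whole-number p strictly between 0 and 100 are assumptions made for this illustration. Statistical libraries often use other interpolation rules, so their results may differ slightly.

```python
def percentile(sorted_values, p):
    """pth percentile by the index rule i = (p/100)n.

    Assumes sorted_values is in ascending order and p is a whole
    number with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)   # i = whole + remainder/100
    if remainder != 0:
        # i is not an integer: round up to position whole + 1
        # (index `whole` in 0-based terms).
        return sorted_values[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 25))  # 445   (i = 17.5, rounded up to 18)
print(percentile(rents, 50))  # 475.0 (i = 35, average of positions 35 and 36)
print(percentile(rents, 90))  # 585.0 (i = 63, average of positions 63 and 64)
```

Integer arithmetic with `divmod` is used so that the test "is i an integer?" does not depend on floating-point rounding.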


Example: Apartment Rents

90th Percentile

i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585

Quartiles

Quartiles are specific percentiles.

• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

Example: Apartment Rents

Third Quartile

Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = value in the 53rd position = 525

Measures of Variability

It is often desirable to consider measures of variability (dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Range

The range of a data set is the difference between the largest and smallest data values.

• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Example: Apartment Rents

Range

Range = largest value - smallest value
Range = 615 - 425 = 190

Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile.

• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Example: Apartment Rents

Interquartile Range

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation (\( x_i \)) and the mean (\( \bar{x} \) for a sample, \( \mu \) for a population).

The variance is the average of the squared differences between each data value and the mean.

If the data set is a sample, the variance is denoted by \( s^2 \):

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

If the data set is a population, the variance is denoted by \( \sigma^2 \):

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, which makes it easier than the variance to compare with the mean.

If the data set is a sample, the standard deviation is denoted \( s \):

\[ s = \sqrt{s^2} \]

If the data set is a population, the standard deviation is denoted \( \sigma \) (sigma):

\[ \sigma = \sqrt{\sigma^2} \]

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

If the data set is a sample, the coefficient of variation is computed as follows:

\[ \frac{s}{\bar{x}}(100) \]

If the data set is a population, the coefficient of variation is computed as follows:

\[ \frac{\sigma}{\mu}(100) \]

Example: Apartment Rents

Variance

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \]

Standard Deviation

\[ s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \]

Coefficient of Variation

\[ \frac{s}{\bar{x}}(100) = \frac{54.74}{490.80}(100) = 11.15 \]
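The three results can be reproduced directly from the sample formulas. The sketch below uses illustrative variable names and the rent sample from the example; it divides by n - 1 because the data are a sample.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum((x - mean) ** 2 for x in rents) / (n - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation as a percentage of the mean.
coeff_var = std_dev / mean * 100

print(round(variance, 2), round(std_dev, 2), round(coeff_var, 2))
```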

Measures of Relative Location and Detecting Outliers

• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers


z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value \( x_i \) is from the mean:

\[ z_i = \frac{x_i - \bar{x}}{s} \]

• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

Example: Apartment Rents

z-Score of Smallest Value (425)

\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
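The standardized values can be generated with a short Python sketch. It relies on `statistics.stdev`, which computes the sample standard deviation (n - 1 divisor); the variable names are illustrative.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = sum(rents) / len(rents)
s = statistics.stdev(rents)   # sample standard deviation

# z_i = (x_i - x-bar) / s for every data value
z_scores = [(x - mean) / s for x in rents]

# z-scores of the smallest and largest rents
print(round(z_scores[0], 2), round(z_scores[-1], 2))
```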


Chebyshev’s Theorem

At least \( (1 - 1/k^2) \) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.

• At least 75% of the items must be within k = 2 standard deviations of the mean.
• At least 89% of the items must be within k = 3 standard deviations of the mean.
• At least 94% of the items must be within k = 4 standard deviations of the mean.

Example: Apartment Rents

Chebyshev’s Theorem

Let k = 1.5 with \( \bar{x} \) = 490.80 and s = 54.74.

At least (1 - 1/(1.5)^2) = 1 - 0.44 = 0.56, or 56%, of the rent values must be between

\( \bar{x} - k(s) \) = 490.80 - 1.5(54.74) = 409
and
\( \bar{x} + k(s) \) = 490.80 + 1.5(54.74) = 573.

Actually, 86% of the rent values are between 409 and 573.
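The guaranteed bound and the actual share of rents in the interval can be compared with a small Python sketch. The helper name `chebyshev_bound` is hypothetical.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of items within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

k = 1.5
mean = sum(rents) / len(rents)
s = statistics.stdev(rents)
lower, upper = mean - k * s, mean + k * s

inside = sum(1 for x in rents if lower <= x <= upper)
actual = inside / len(rents)

print(round(chebyshev_bound(k), 2))        # guaranteed minimum proportion
print(round(lower), round(upper), inside)  # interval ends and count inside
print(round(actual, 2))                    # observed proportion
```

Chebyshev's bound holds for any data set, which is why the observed proportion (60 of 70 values) is well above the guaranteed 56%.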

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 68% of the data values will be within one standard deviation of the mean.
• Approximately 95% of the data values will be within two standard deviations of the mean.
• Almost all (99.7%) of the items will be within three standard deviations of the mean.


Example: Apartment Rents

Empirical Rule

                 Interval            % in Interval
Within +/- 1s    436.06 to 545.54    48/70 = 69%
Within +/- 2s    381.32 to 600.28    68/70 = 97%
Within +/- 3s    326.58 to 655.02    70/70 = 100%
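The counts in the table can be reproduced by counting how many rents fall within 1, 2, and 3 sample standard deviations of the mean. This is a sketch with illustrative names; the interval endpoints are computed from the unrounded standard deviation, so they can differ from the slide in the last decimal place.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = sum(rents) / len(rents)
s = statistics.stdev(rents)

# Number of data values within k standard deviations of the mean.
counts = {}
for k in (1, 2, 3):
    lower, upper = mean - k * s, mean + k * s
    counts[k] = sum(1 for x in rents if lower <= x <= upper)
    share = counts[k] / len(rents)
    print(f"within +/- {k}s: {counts[k]}/{len(rents)} = {share:.0%}")
```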


Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

• It might be an incorrectly recorded data value.
• It might be a data value that was incorrectly included in the data set.
• It might be a correctly recorded data value that belongs in the data set!

Example: Apartment Rents

Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Exploratory Data Analysis

• Five-Number Summary
• Box Plot

Five-Number Summary

• Smallest Value
• First Quartile
• Median
• Third Quartile
• Largest Value

Example: Apartment Rents

Five-Number Summary

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median.

Limits are located (not drawn) using the interquartile range (IQR).

• The lower limit is located 1.5(IQR) below Q1.
• The upper limit is located 1.5(IQR) above Q3.
• Data outside these limits are considered outliers.

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

The location of each outlier is shown with the symbol *.

Example: Apartment Rents

Box Plot

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

There are no outliers.

[Figure: box plot of the apartment rents; horizontal axis from 375 to 625 in steps of 25]
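The limit calculation can be sketched in Python. The function name `box_plot_limits` is hypothetical. The inputs are Q1 = 445 and Q3 = 525, as given in the interquartile range example (25th percentile: i = 17.5, rounded up to the 18th value), together with the smallest and largest rents, 425 and 615.

```python
def box_plot_limits(q1, q3):
    """Return (lower limit, upper limit) located 1.5(IQR) beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

q1, q3 = 445, 525            # quartiles from the interquartile range example
smallest, largest = 425, 615

lower, upper = box_plot_limits(q1, q3)

# The data are sorted, so an outlier exists only if an extreme value
# falls outside the limits.
has_outliers = smallest < lower or largest > upper

print(lower, upper, has_outliers)   # 325.0 645.0 False
```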


Measures of Association Between Two Variables

• Covariance
• Correlation Coefficient

Covariance

The covariance is a measure of the linear association between two variables.

• Positive values indicate a positive relationship.
• Negative values indicate a negative relationship.

If the data sets are samples, the covariance is denoted by \( s_{xy} \):

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

If the data sets are populations, the covariance is denoted by \( \sigma_{xy} \):

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is \( r_{xy} \):

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

If the data sets are populations, the coefficient is \( \rho_{xy} \):

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]
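The sample covariance and correlation formulas can be illustrated with a short Python sketch. The slides give no paired data, so the two small lists below are made up purely for illustration, and the function names are hypothetical.

```python
import math

def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x-bar)(y_i - y-bar)) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))   # covariance of x with itself is s_x^2
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(s_xy, round(r_xy, 4))   # 1.5 0.7746
```

Dividing the covariance by the two standard deviations removes the units of measurement, which is why the correlation coefficient always lies between -1 and +1.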

The Weighted Mean and Working with Grouped Data

• Weighted Mean
• Mean for Grouped Data
• Variance for Grouped Data
• Standard Deviation for Grouped Data


Weighted Mean

When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]

where:
\( x_i \) = value of observation i
\( w_i \) = weight for observation i
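A GPA-style calculation shows the weighted mean formula at work. The grade points and credit hours below are invented for illustration, and the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """x-bar = sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 2]

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))   # 3.11, from (12 + 12 + 4) / 9
```

With equal weights the weighted mean reduces to the ordinary mean.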


Grouped Data

The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.

To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.

We compute a weighted mean of the class midpoints using the class frequencies as weights.

Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Mean for Grouped Data

Sample Data

\[ \bar{x} = \frac{\sum f_i M_i}{\sum f_i} \]

Population Data

\[ \mu = \frac{\sum f_i M_i}{N} \]

where:
\( f_i \) = frequency of class i
\( M_i \) = midpoint of class i

Example: Apartment Rents

Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6

Example: Apartment Rents

Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439      8     429.5     3436.0
440-459     17     449.5     7641.5
460-479     12     469.5     5634.0
480-499      8     489.5     3916.0
500-519      7     509.5     3566.5
520-539      4     529.5     2118.0
540-559      2     549.5     1099.0
560-579      4     569.5     2278.0
580-599      2     589.5     1179.0
600-619      6     609.5     3657.0
Total       70              34525.0

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data

Sample Data

\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]

Population Data

\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]

Example: Apartment Rents

Variance for Grouped Data

\[ s^2 = 3{,}017.89 \]

Standard Deviation for Grouped Data

\[ s = \sqrt{3{,}017.89} = 54.94 \]

This approximation differs by only $.20 from the actual standard deviation of $54.74.
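The grouped-data results can be reproduced from the frequency distribution alone. The sketch below uses illustrative names; each class midpoint is taken as the average of the class limits, and the sample formulas (n - 1 divisor) are applied.

```python
import math

# (lower class limit, upper class limit, frequency) for each rent class
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # M_i
freqs = [f for _, _, f in classes]                     # f_i
n = sum(freqs)

# Grouped mean: weighted mean of the midpoints, frequencies as weights.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped sample variance and standard deviation.
grouped_var = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)
) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(n, round(grouped_mean, 2), round(grouped_var, 2), round(grouped_sd, 2))
```

The grouped figures are approximations because every item in a class is treated as if it sat at the class midpoint.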


End of Chapter 3