central tendency, dispersion, correlation and regression analysis

Business Statistic Central Tendency, Dispersion, Correlation and Regression Analysis Case study for MBA program CHUOP Theot Therith 12/29/2010


Post on 03-Feb-2022


Business Statistic

Central Tendency, Dispersion, Correlation and Regression Analysis Case study for MBA program

CHUOP Theot Therith 12/29/2010

Business Statistic

Prepared by: CHUOP Theot Therith 1

SOLUTION

1. CENTRAL TENDENCY

- The different measures of Central Tendency are:

(1). Arithmetic Mean (AM)

(2). Median

(3). Mode

(4). Geometric Mean (GM)

(5). Harmonic Mean (HM)

- The uses of the different measures of Central Tendency are as follows:

The choice of an average depends upon three considerations:

1. The concept of a typical value required by the problem.

2. The type of data available.

3. The special characteristics of the averages under consideration.

• If an average based on all values is required, the arithmetic mean, geometric mean or harmonic mean should be preferred over the median or mode.

• If the middle value is wanted, the median is the only choice.

• To determine the most common value, the mode is the appropriate one.

• If the data contain extreme values, the use of the arithmetic mean should be avoided.

• For averaging ratios and percentages, the geometric mean should be preferred; for averaging rates, the harmonic mean should be preferred.

• Frequency distributions with open-end classes prohibit the use of the arithmetic mean, geometric mean or harmonic mean.

• If the distribution is bell-shaped and symmetrical, or flat-topped with few extreme values, the arithmetic mean is the best choice, because it is less affected by sampling fluctuations.

• If the distribution is sharply peaked, i.e., the data cluster markedly at the middle, or if there are abnormally small or large values, the median has smaller sampling fluctuations than the arithmetic mean.

• The arithmetic mean should ordinarily be used, because it is simple, rigidly defined, based on all observations and amenable to further statistical treatment, unless the nature of the data strongly prohibits its use.

Conclusion on choosing an average:

- Arithmetic Mean (AM): used in general

- Median: used when the data contain extreme values

- Mode: used to find the most common use/need/demand…

- Geometric Mean (GM): used to find the average of ratios/percentages

- Harmonic Mean (HM): used to find the average of rates, such as average speed.
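The choices above can be tried out with a minimal Python sketch using the standard `statistics` module. All the numbers in it are hypothetical values picked only for illustration; none of them come from the case study.

```python
import statistics

# Hypothetical values chosen only to illustrate the five averages.
values = [2, 4, 4, 8, 16]

arithmetic_mean = statistics.mean(values)           # based on all values
median = statistics.median(values)                  # middle value
mode = statistics.mode(values)                      # most common value
geometric_mean = statistics.geometric_mean(values)  # suited to ratios and percentages
harmonic_mean = statistics.harmonic_mean(values)    # suited to rates

# Average speed over two equal distances driven at 40 and 60 km/h
# (hypothetical figures): the harmonic mean gives 48 km/h, not 50.
average_speed = statistics.harmonic_mean([40, 60])

# An extreme value pulls the arithmetic mean but barely moves the median.
incomes = [200, 220, 250, 270, 5000]  # hypothetical incomes with one outlier
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(arithmetic_mean, median, mode, geometric_mean, harmonic_mean)
print(average_speed, mean_income, median_income)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean; the first printed line shows this ordering.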

2. EXPLAIN THE DIFFERENCE BETWEEN ABSOLUTE AND RELATIVE MEASURES OF DISPERSION

Absolute and Relative Measures of Variation:

• Absolute measures of dispersion are expressed in the same statistical unit as the original data, such as riels, kilograms or tonnes. These values may be used to compare the variation in two distributions, provided the variables are expressed in the same units and have about the same average size.

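• If the two sets of data are expressed in different units, such as quintals of sugar versus tonnes of sugarcane, or if the average sizes are very different, such as managers’ salaries versus workers’ salaries, relative measures of dispersion should be used. Relative measures, such as the coefficient of variation, are ratios or percentages and are therefore independent of the unit of measurement.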
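The distinction can be illustrated with a short Python sketch. The salary figures are hypothetical and serve only to show how an absolute measure (standard deviation) and a relative measure (coefficient of variation) behave when the average sizes differ.

```python
import statistics

def coefficient_of_variation(data):
    """Relative dispersion: population standard deviation as a % of the mean."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

# Hypothetical monthly salaries in dollars (illustrative only).
managers = [1800, 2000, 2200]
workers = [180, 200, 220]

# Absolute measure: in dollars, so it scales with the size of the values.
sd_managers = statistics.pstdev(managers)
sd_workers = statistics.pstdev(workers)

# Relative measure: a percentage, so the two groups can be compared directly.
cv_managers = coefficient_of_variation(managers)
cv_workers = coefficient_of_variation(workers)

print(sd_managers, sd_workers)
print(cv_managers, cv_workers)
```

In this made-up example the managers' standard deviation is ten times the workers', yet both groups have the same coefficient of variation, so their salaries are equally variable in relative terms.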

3. COMPUTE THE SAMPLE ARITHMETIC MEAN:

Time (seconds)   Lower limit   Upper limit   Number of customers (f)   Mid-point (m)   fm
20 - 29          20            29            60                        24.5            1470
30 - 39          30            39            160                       34.5            5520
40 - 49          40            49            210                       44.5            9345
50 - 59          50            59            290                       54.5            15805
60 - 69          60            69            250                       64.5            16125
70 - 79          70            79            220                       74.5            16390
80 - 89          80            89            110                       84.5            9295
90 - 99          90            99            70                        94.5            6615
100 - 109        100           109           40                        104.5           4180
110 - 119        110           119           10                        114.5           1145
120 - 129        120           129           20                        124.5           2490
Total                                        N = ∑f = 1440                             ∑fm = 88380

By formula:

\[ \bar{X} = \frac{\sum fm}{N} = \frac{88380}{1440} = 61.375 \]

Interpretation of the result: on average, cashiers need 61.375 seconds (about 61 seconds) to serve each customer.
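The grouped-mean calculation above can be reproduced with a short Python sketch. The class limits and frequencies are taken from the table; the variable names are illustrative.

```python
# (lower limit, upper limit, number of customers) for each class, from the table.
classes = [
    (20, 29, 60), (30, 39, 160), (40, 49, 210), (50, 59, 290),
    (60, 69, 250), (70, 79, 220), (80, 89, 110), (90, 99, 70),
    (100, 109, 40), (110, 119, 10), (120, 129, 20),
]

# N is the total frequency; each class contributes f times its mid-point m.
total_customers = sum(f for _, _, f in classes)
sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

sample_mean = sum_fm / total_customers
print(total_customers, sum_fm, sample_mean)
```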

4. THE MEDIAN AND MODAL INCOMES

Income (in $)       Number of households (f)   Cumulative frequency (c.f.)
Less than 2000      151                        151
2000 up to 3000     183                        334
3000 up to 4000     212                        546
4000 up to 5000     184                        730
5000 up to 6000     157                        887
6000 and greater    113                        1000
Total               N = 1000

a. Find the median incomes

- Median item = the (N/2)th item = 1000/2 = 500th item

- The 500th item falls in the class 3000 up to 4000 (cumulative frequency 546), so this is the median class

\[ \text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i \]

Where L = 3000, the lower limit of the median class

N = 1000, total number of households (total frequency)

f = 212, households’ number (frequency) of median class

c.f. = 334, cumulative frequency of the class preceding the median class

i = 1000, the class interval of the median class (4000-3000)

Hence,

\[ \text{Median} = 3000 + \frac{\frac{1000}{2} - 334}{212} \times 1000 = 3000 + 783.0189 = 3783.0189 \approx 3783 \]

Therefore, the median household income is 3783 dollars.
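As a check on the interpolation above, here is a minimal Python sketch of the grouped-median formula. The function name and the use of `None` for the missing lower limit of the open-ended first class are illustrative assumptions; the frequencies come from the table.

```python
def grouped_median(classes, width):
    """Median of grouped data.

    classes: list of (lower_limit, frequency) in ascending order.
    width:   class interval of the median class.
    """
    n = sum(f for _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in classes:
        if cumulative + f >= n / 2:
            if lower is None:
                raise ValueError("median falls in an open-ended class")
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("no classes given")

# The first class ("Less than 2000") has no stated lower limit, hence None.
income_classes = [(None, 151), (2000, 183), (3000, 212),
                  (4000, 184), (5000, 157), (6000, 113)]

median_income = grouped_median(income_classes, width=1000)
print(round(median_income, 4))
```

The open-ended classes cause no difficulty here because the median falls in the class 3000 up to 4000, which has both limits.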

b. Find the modal incomes

The highest frequency (number of households) is 212, so the modal class is 3000-4000.

By formula:

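\[ Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i \]

Where L = 3000, the lower limit of the modal class

Δ1 = 212 – 183 = 29, the modal class frequency minus the frequency of the preceding class

Δ2 = 212 – 184 = 28, the modal class frequency minus the frequency of the following class

i = 1000, the class interval of the modal class

Hence,

\[ Mo = 3000 + \frac{29}{29 + 28} \times 1000 = 3508.77 \]

Therefore, the modal household income is 3508.77 dollars.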
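The modal-class interpolation can be sketched in a few lines of Python. The function and parameter names are illustrative; the frequencies are those of the modal class and its two neighbours in the table.

```python
def grouped_mode(lower, f_modal, f_previous, f_next, width):
    """Mode of grouped data by interpolation inside the modal class."""
    delta_1 = f_modal - f_previous  # excess over the preceding class
    delta_2 = f_modal - f_next      # excess over the following class
    return lower + delta_1 / (delta_1 + delta_2) * width

modal_income = grouped_mode(lower=3000, f_modal=212, f_previous=183,
                            f_next=184, width=1000)
print(round(modal_income, 2))
```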

5. THE FOLLOWING DATA ARE THE ESTIMATED MARKET VALUES (IN $ MILLIONS) OF 50 COMPANIES IN THE AUTO PARTS BUSINESS.

Nº   x_i   x_i - x̄   (x_i - x̄)²

1 26.8 9.642 92.968164

2 28.3 11.142 124.144164

3 11.7 -5.458 29.789764

4 6.7 -10.458 109.369764

5 6.1 -11.058 122.279364

6 8.6 -8.558 73.239364

7 15.5 -1.658 2.748964

8 18.5 1.342 1.800964

9 31.4 14.242 202.834564

10 0.9 -16.258 264.322564

11 6.5 -10.658 113.592964

12 31.4 14.242 202.834564

13 6.8 -10.358 107.288164

14 30.4 13.242 175.350564

15 9.6 -7.558 57.123364

16 30.6 13.442 180.687364

17 23.4 6.242 38.962564

18 22.3 5.142 26.440164

19 20.6 3.442 11.847364

20 35 17.842 318.336964

21 15.4 -1.758 3.090564

22 4.3 -12.858 165.328164

23 12.9 -4.258 18.130564

24 5.2 -11.958 142.993764

25 17.1 -0.058 0.003364

26 18 0.842 0.708964

27 20.2 3.042 9.253764

28 29.8 12.642 159.820164

29 37.8 20.642 426.092164

30 1.9 -15.258 232.806564

31 7.6 -9.558 91.355364

32 33.5 16.342 267.060964

33 1.3 -15.858 251.476164

34 13.4 -3.758 14.122564

35 1.2 -15.958 254.657764

36 21.5 4.342 18.852964

37 7.9 -9.258 85.710564

38 14.1 -3.058 9.351364

39 18.3 1.142 1.304164

40 16.6 -0.558 0.311364

41 11 -6.158 37.920964

42 11.2 -5.958 35.497764

43 29.7 12.542 157.301764

44 27.1 9.942 98.843364

45 31.1 13.942 194.379364

46 10.2 -6.958 48.413764

47 1 -16.158 261.080964

48 18.7 1.542 2.377764

49 32.7 15.542 241.553764

50 16.1 -1.058 1.119364

∑(x_i - x̄)² = 5486.8818

a. Determine the standard deviation of the market values.

By formula:

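\[ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} \]

Where

\[ \bar{x} = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + \dots + x_{50}}{50} = \frac{857.9}{50} = 17.158 \]

And according to the table above, the standard deviation is

\[ \sigma = \sqrt{\frac{5486.8818}{50}} = 10.475573 \]

(in million dollars)

Therefore the standard deviation of the market values is about 10.48 (million dollars)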

b. Determine the coefficient of variation.

\[ C.V. = \frac{\sigma}{\bar{x}} \times 100 = \frac{10.475573}{17.158} \times 100 = 61.05536\% \]

Therefore the coefficient of variation is C.V. = 61.05536%

6. DETERMINE KARL PEARSON’S COEFFICIENT OF CORRELATION

Year    R&D spent (x)   Annual profit (y)   x - x̄   y - ȳ    (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
2000    2               20                  -4.1    -12.8    16.81      163.84     52.48
2001    3               25                  -3.1    -7.8     9.61       60.84      24.18
2002    5               34                  -1.1    1.2      1.21       1.44       -1.32
2003    4               30                  -2.1    -2.8     4.41       7.84       5.88
2004    11              40                  4.9     7.2      24.01      51.84      35.28
2005    5               31                  -1.1    -1.8     1.21       3.24       1.98
2006    6               35                  -0.1    2.2      0.01       4.84       -0.22
2007    8               36                  1.9     3.2      3.61       10.24      6.08
2008    7               38                  0.9     5.2      0.81       27.04      4.68
2009    10              39                  3.9     6.2      15.21      38.44      24.18
Total   ∑x = 61         ∑y = 328                             76.9       369.6      153.2

By formula

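\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]

Where

\[ \bar{x} = \frac{\sum x}{N} = \frac{61}{10} = 6.1, \qquad \bar{y} = \frac{\sum y}{N} = \frac{328}{10} = 32.8 \]

\[ \sum (x - \bar{x})^2 = 76.90, \qquad \sum (y - \bar{y})^2 = 369.60, \qquad \sum (x - \bar{x})(y - \bar{y}) = 153.20 \]

Hence,

\[ r = \frac{153.20}{\sqrt{76.90 \times 369.60}} = 0.9087 \]

Therefore the coefficient of correlation is r = 0.9087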
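The same coefficient can be computed directly from the ten pairs of observations with a small Python sketch. The function name `pearson_r` is illustrative; it follows the deviation formula used above.

```python
from math import sqrt

# R&D spending (x) and annual profit (y) for 2000-2009, from the table.
rd_spent = [2, 3, 5, 4, 11, 5, 6, 8, 7, 10]
annual_profit = [20, 25, 34, 30, 40, 31, 35, 36, 38, 39]

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

r = pearson_r(rd_spent, annual_profit)
print(round(r, 4))
```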

- Explain the relationship between the amount spent on R&D and profit of the company.

The correlation coefficient r = 0.9087 indicates a high degree of positive correlation between the amount spent on R&D and the profit of the company: years with higher R&D spending tend to be years with higher annual profit. This suggests that the company may raise its annual profit by spending more on R&D.

7. CORRELATION AND REGRESSION, SCATTER DIAGRAM.

- The difference between correlation and regression:

Correlation is a statistical tool that measures the relationship between two variables. It gives an idea of the degree and direction of the relationship between the two variables under study. Examples: the relationship between advertising expense and sales, or between the amount spent on R&D and annual profit.

Regression is another important statistical tool, which measures the impact of one variable on another. It estimates or predicts the unknown value of one variable from the known value of the other variable. Examples: the impact of advertising expense on sales, or the estimation of earnings from sales.

- The scatter diagram: the scatter diagram is the diagrammatic representation of bivariate data. It only tells us about the nature of the relationship, i.e., whether it is positive or negative and whether it is high or low. It does not provide an exact measure of the extent of the relationship between the two variables.

The scatter diagrams below illustrate these cases:

[Figure: scatter diagrams (a) and (b), with x on the horizontal axis and y on the vertical axis]

1. Picture (a) indicates that the correlation is perfect and positive, because all the points lie on a straight line starting from the bottom left and going up towards the top right. With perfect positive correlation, every increase (or decrease) in x is matched by a proportional increase (or decrease) in y, and the coefficient of correlation is r = +1. Picture (b) indicates that the correlation is perfect and negative, because all the points lie on a straight line starting from the top left and coming down to the bottom right. With perfect negative correlation, every increase (or decrease) in x is matched by a proportional decrease (or increase) in y, and the coefficient of correlation is r = -1.

2. Picture (c) shows the correlation is positive since this reveals that the values of the two

variables move in the same direction because the plotted points reveal an upward trend

rising from lower left hand corner and going upward to the upper right hand corner. If x

increases/decreases ==> y increases/decreases. Picture (d) shows the correlation is

negative since in this case the values of the two variables move in the opposite direction

because the points depict a downward trend from the upper left hand corner to the lower

right hand corner. If x increases/decreases ==> y decreases/increases.

3. If the points are very dense, i.e., very close to each other, a fairly good amount of

correlation may be expected between the two variables. If the points are widely scattered,

a poor correlation may be expected between them.
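The three cases can be illustrated numerically with a brief Python sketch. The data points are hypothetical and chosen only so that the first two sets lie exactly on straight lines and the third shows an upward trend with some scatter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

x = [1, 2, 3, 4, 5]  # hypothetical x values

# All points on a rising straight line, as in picture (a).
r_perfect_positive = pearson_r(x, [2 * v + 1 for v in x])

# All points on a falling straight line, as in picture (b).
r_perfect_negative = pearson_r(x, [-3 * v + 5 for v in x])

# Upward trend with some scatter, as in picture (c).
r_scattered = pearson_r(x, [2, 1, 4, 3, 5])

print(r_perfect_positive, r_perfect_negative, r_scattered)
```

The slope of the line does not matter for the perfect cases; only whether the points lie exactly on a straight line and whether it rises or falls.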

[Figure: scatter diagrams (c) and (d), with x on the horizontal axis and y on the vertical axis]

8. REGRESSION ANALYSIS

a. Determine the regression equation

Company         Sales X ($ millions)   Earnings Y ($ millions)   x - x̄     y - ȳ    (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
Lucky           89.200                 4.900                     47.442    -0.442   2250.712   0.195      -20.953
KFC             28.600                 6.000                     -13.158   0.658    173.142    0.433      -8.663
Mekong Bus      18.200                 1.300                     -23.558   -4.042   554.995    16.335     95.215
Sorya Bus       69.200                 12.800                    27.442    7.458    753.045    55.627     204.669
Bayon Bakery    17.500                 2.600                     -24.258   -2.742   588.467    7.517      66.508
Apsara Bakery   11.900                 1.700                     -29.858   -3.642   891.520    13.262     108.734
Tiger Beer      71.700                 8.000                     29.942    2.658    896.503    7.067      79.595
Angkor Beer     58.600                 6.600                     16.842    1.258    283.642    1.583      21.192
Pizza World     19.600                 3.500                     -22.158   -1.842   490.992    3.392      40.808
Master Roll     18.600                 4.400                     -23.158   -0.942   536.308    0.887      21.807
Akira           51.200                 8.200                     9.442     2.858    89.145     8.170      26.987
Nokia           46.800                 4.100                     5.042     -1.242   25.418     1.542      -6.260
Total           ∑x = 501.100           ∑y = 64.100                                  7533.889   116.009    629.641

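According to the table above,

\[ \bar{x} = \frac{\sum x}{N} = \frac{501.1}{12} = 41.758, \qquad \bar{y} = \frac{\sum y}{N} = \frac{64.1}{12} = 5.342 \]

\[ \sum (x - \bar{x})^2 = 7533.889, \qquad \sum (y - \bar{y})^2 = 116.009, \qquad \sum (x - \bar{x})(y - \bar{y}) = 629.641 \]

Find the coefficients, writing dx = x - x̄ and dy = y - ȳ:

\[ b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{629.641}{7533.889} = 0.08357 \]

\[ b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{629.641}{116.009} = 5.42751 \]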

Hence,

Equation of line of regression of earnings on sales (y on x):

\[ y - \bar{y} = b_{yx}(x - \bar{x}) \]

\[ y - 5.341 = 0.08357(x - 41.758) \]

\[ y - 5.341 = 0.08357x - 3.4897 \]

\[ y = 0.08357x + 1.8513 \]

Therefore the equation of the line of regression of earnings on sales (y on x) is y = 0.08357x + 1.8513

Equation of line of regression of sales on earnings (x on y):

\[ x - \bar{x} = b_{xy}(y - \bar{y}) \]

\[ x - 41.758 = 5.42751(y - 5.342) \]

\[ x - 41.758 = 5.42751y - 28.99195 \]

\[ x = 5.42751y + 12.766 \]

Therefore the equation of the line of regression of sales on earnings (x on y) is x = 5.42751y + 12.766

b. Estimate the earnings for a small company with $50.0 million in sales

Using the equation of the line of regression of earnings on sales (y on x), y = 0.08357x + 1.8513, the estimated earnings are

\[ y = 0.08357 \times 50 + 1.8513 = 6.0298 \]

(in million dollars)

Therefore the estimated earnings of a company with $50.0 million in sales are about $6.03 million.
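The regression of earnings on sales can be reproduced with a minimal Python sketch. The data are the twelve companies from the table; the variable names are illustrative. Because the code carries full precision while the hand calculation rounds intermediate values, the intercept and the estimate may differ from the figures above in the last decimal places.

```python
# Sales (x) and earnings (y) in $ millions, in the table's row order.
sales = [89.2, 28.6, 18.2, 69.2, 17.5, 11.9, 71.7, 58.6, 19.6, 18.6, 51.2, 46.8]
earnings = [4.9, 6.0, 1.3, 12.8, 2.6, 1.7, 8.0, 6.6, 3.5, 4.4, 8.2, 4.1]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(earnings) / n

# Sums of squared deviations and of cross products about the means.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, earnings))
sum_xx = sum((x - mean_x) ** 2 for x in sales)
sum_yy = sum((y - mean_y) ** 2 for y in earnings)

# Regression coefficients: y on x and x on y.
b_yx = sum_xy / sum_xx
b_xy = sum_xy / sum_yy

# Line of regression of earnings on sales: y = b_yx * x + intercept.
intercept = mean_y - b_yx * mean_x

# Estimated earnings for a company with $50.0 million in sales.
estimated_earnings = b_yx * 50.0 + intercept

print(round(b_yx, 5), round(b_xy, 5), round(intercept, 4), round(estimated_earnings, 4))
```

The product of the two regression coefficients equals the square of the correlation coefficient between sales and earnings, which gives a quick consistency check on the two slopes.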