module 5-measures-of-central-tendency

Training Course on Basic Statistical Analysis Using MS Excel 2007 March 28 to April 1, 2011 STATISTICAL RESEARCH AND TRAINING CENTER J and S Building, 104 Kalayaan Avenue, Diliman, Quezon City Measures of Central Tendency Prepared by: Prof. Josefina V. Almeda College Secretary School of Statistics University of the Philippines, Diliman 2011


Page 1: Module 5-measures-of-central-tendency

Training Course on Basic Statistical Analysis Using MS Excel 2007

March 28 to April 1, 2011

STATISTICAL RESEARCH AND TRAINING CENTER

J and S Building, 104 Kalayaan Avenue, Diliman, Quezon City

Measures of Central Tendency

Prepared by:

Prof. Josefina V. Almeda

College Secretary

School of Statistics

University of the Philippines, Diliman

2011

Page 2: Module 5-measures-of-central-tendency


Measures of Central Tendency

OUTLINE

Mean

Median

Mode

Page 3: Module 5-measures-of-central-tendency


Describing Data with Summary Measures

Summary Measures

Central Tendency: Mean, Median, Mode

Other Locations: Quartiles

Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Page 4: Module 5-measures-of-central-tendency


Measures of Central Tendency

A measure of central tendency is an index of the central location of a distribution.

It is a single value used to identify the “center” of the data, or the typical value.

Precise yet simple

Most representative value of the data

Page 5: Module 5-measures-of-central-tendency


The Arithmetic Mean

The arithmetic mean is the sum of all observed values divided by the total number of observations.

The population mean for a finite population with N elements, denoted by μ (lowercase Greek letter mu), is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Page 6: Module 5-measures-of-central-tendency


The sample mean for a finite sample with n elements, denoted by $\bar{X}$, is

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

The population mean is a parameter while the sample mean is a statistic.

Page 7: Module 5-measures-of-central-tendency


Examples of Arithmetic Mean

1. Given the number of children of a sample of 10 currently married women: 3, 4, 2, 5, 1, 3, 4, 2, 3, 3, find the mean number of children of the currently married women.

Solution: We compute the sample mean.

$$\bar{X} = \frac{3 + 4 + 2 + 5 + 1 + 3 + 4 + 2 + 3 + 3}{10} = \frac{30}{10} = 3$$

The mean number of children of currently married women is 3.
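As a quick check of the first example, the sample mean can be computed directly from its definition. This is only a sketch in Python; the variable names are illustrative and not part of the course material. In Excel, the worksheet function AVERAGE applied to the same ten cells returns the same value.

```python
from statistics import mean

# Number of children of the 10 currently married women in Example 1
children = [3, 4, 2, 5, 1, 3, 4, 2, 3, 3]

# Sample mean: sum of the observed values divided by the number of observations
sample_mean = sum(children) / len(children)

# The standard-library helper agrees with the hand computation
assert sample_mean == mean(children)
print(sample_mean)  # 3.0
```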

Page 8: Module 5-measures-of-central-tendency


2. Given the incidence of alleged human rights violations by region for the year 2004, find the mean incidence of alleged human rights violations.

NCR 133

CAR 11

Region 1 2

Region 2 16

Region 3 41

Region 4 57

Region 5 30

Region 6 49

Region 7 44

Page 9: Module 5-measures-of-central-tendency


Region 8 71

Region 9 73

Region 10 26

Region 11 258

Region 12 49

Region 13 39

Solution: We compute the population mean incidence of alleged human rights violations over the N = 15 regions.

$$\mu = \frac{899}{15} \approx 59.9$$

The mean incidence of alleged human rights violations per region is 59.9.
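The regional figures can be re-added to confirm the total and the population mean. The sketch below is in Python; the dictionary simply restates the table from the example, and the variable names are illustrative.

```python
# Incidence of alleged human rights violations by region, 2004 (from the example)
incidence = {
    "NCR": 133, "CAR": 11, "Region 1": 2, "Region 2": 16, "Region 3": 41,
    "Region 4": 57, "Region 5": 30, "Region 6": 49, "Region 7": 44,
    "Region 8": 71, "Region 9": 73, "Region 10": 26, "Region 11": 258,
    "Region 12": 49, "Region 13": 39,
}

total = sum(incidence.values())   # numerator of the population mean
N = len(incidence)                # number of regions in the population
mu = total / N

print(total, N, round(mu, 1))  # 899 15 59.9
```

Since 899 / 15 = 59.933…, the value rounded to one decimal place is 59.9.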

Page 10: Module 5-measures-of-central-tendency


Properties of the Mean

1. The mean is the most common measure of central tendency since it employs every observed value in the calculation.

2. It may or may not be an actual observed value in the data set.

3. We may compute the mean for both ungrouped and grouped data sets.

4. Extreme observations affect the value of the mean, especially if the number of observations is small.

Page 11: Module 5-measures-of-central-tendency


5. The value of the mean always exists and is unique.

6. It is a widely understood measure of central tendency.

7. We use the mean if the distribution is not too asymmetrical, when we give equal importance to the effect of all observed values, and when we will compute other statistics later on.

Page 12: Module 5-measures-of-central-tendency


The Weighted Mean

* If the individual values do not have equal importance, then we compute the weighted mean.

* We assign weights to the observed values of the data set before we can get the weighted mean.

Page 13: Module 5-measures-of-central-tendency


Formula of Weighted Mean

If we assign a weight $W_i$ to each observation $X_i$, where i = 1, 2, …, n and n is the number of observations in the sample, then the weighted sample mean is given by

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n}$$

Page 14: Module 5-measures-of-central-tendency


Example of Weighted Mean

Suppose a government agency gives scholarship grants to employees taking graduate studies. Courses in graduate studies earn credits of 1, 2, 3, 4, or 5 units. Employees can get a partial scholarship for the next semester if they get a weighted average of 1.5 to 1.75, and a full scholarship if the average is better than 1.5, which means an average of 1.0 to 1.49. What kind of scholarship will the 2 employees get given their grades for the previous semester?

Page 15: Module 5-measures-of-central-tendency


Consider the grades of the two employees in the previous semester:

Employee A                  Employee B
Subject  Units  Grade       Subject  Units  Grade
A        1      1.0         A        1      2.0
B        2      1.25        B        2      1.75
C        3      1.5         C        3      1.5
D        4      1.75        D        4      1.25
E        5      2.0         E        5      1.0

Page 16: Module 5-measures-of-central-tendency


Solution:

We let the units be the weights $W_i$ and the grades be the $X_i$.

Weighted average of employee A:

$$\bar{X}_w = \frac{1(1) + 2(1.25) + 3(1.5) + 4(1.75) + 5(2)}{1 + 2 + 3 + 4 + 5} = \frac{25}{15} = 1.67$$

Weighted average of employee B:

$$\bar{X}_w = \frac{1(2) + 2(1.75) + 3(1.5) + 4(1.25) + 5(1.0)}{1 + 2 + 3 + 4 + 5} = \frac{20}{15} = 1.33$$

Thus, employee A will get a partial scholarship while employee B will get a full scholarship.
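The weighted averages of the two employees can be reproduced with a short Python sketch. The helper names `weighted_mean` and `scholarship` are hypothetical, and the cut-offs in `scholarship` are one reading of the stated rule (below 1.5 is full, 1.5 to 1.75 is partial); how the agency treats borderline averages is an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def scholarship(average):
    """Assumed reading of the rule in the example (lower grades are better)."""
    if average < 1.5:
        return "full"
    if average <= 1.75:
        return "partial"
    return "none"


units = [1, 2, 3, 4, 5]                     # weights W_i
grades_a = [1.0, 1.25, 1.5, 1.75, 2.0]      # X_i for employee A
grades_b = [2.0, 1.75, 1.5, 1.25, 1.0]      # X_i for employee B

avg_a = weighted_mean(grades_a, units)
avg_b = weighted_mean(grades_b, units)

print(round(avg_a, 2), scholarship(avg_a))  # 1.67 partial
print(round(avg_b, 2), scholarship(avg_b))  # 1.33 full
```

Both employees have the same set of grades, so their unweighted means are equal (1.5); only the weighting by units separates them.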

Page 17: Module 5-measures-of-central-tendency


The Combined Population Mean

We can obtain the mean of several data sets given the means and number of observations of each data set. This is what we call the combined mean. Suppose that k finite populations having $N_1, N_2, \dots, N_k$ measurements, respectively, have means $\mu_1, \mu_2, \dots, \mu_k$. The combined population mean $\mu_c$ of all the populations is

$$\mu_c = \frac{\sum_{i=1}^{k} N_i \mu_i}{\sum_{i=1}^{k} N_i} = \frac{N_1 \mu_1 + N_2 \mu_2 + \dots + N_k \mu_k}{N_1 + N_2 + \dots + N_k}$$

Page 18: Module 5-measures-of-central-tendency


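If random samples of size $n_1, n_2, \dots, n_k$, selected from these k populations, have the means $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$, respectively, the combined sample mean $\bar{X}_c$ of all the sample data is

$$\bar{X}_c = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k}{n_1 + n_2 + \dots + n_k}$$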

Page 19: Module 5-measures-of-central-tendency


Example of the Combined Mean

The Philippines had 6028 male child deaths and 4948 female child deaths for the age group 1-4 in 2002. The average number of deaths for male and female children is 376.8 and 309.2, respectively. What is the combined population mean for both sexes?

Solution: We let $N_1$ = 6028 and $N_2$ = 4948, with $\mu_{\text{males}}$ = 376.8 and $\mu_{\text{females}}$ = 309.2. Thus,

$$\mu_{\text{both sexes}} = \frac{6028(376.8) + 4948(309.2)}{6028 + 4948} \approx 346$$

The average number of deaths for children 1-4 years old for both sexes is 346.
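The combined mean is a weighted mean in which the group sizes act as the weights. A minimal Python sketch follows, using the figures from the example; the function name `combined_mean` is illustrative.

```python
def combined_mean(sizes, means):
    """Combine group means, weighting each mean by its group size."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


sizes = [6028, 4948]     # N_1 (males), N_2 (females)
means = [376.8, 309.2]   # mean for males, mean for females

mu_c = combined_mean(sizes, means)
print(round(mu_c, 1))  # 346.3
```

The unrounded value is about 346.33, which the example reports as 346.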

Page 20: Module 5-measures-of-central-tendency


The Median

* The median divides an ordered set of observations into two equal parts so that half of the observations are below its value and the other half are above its value.

* It is the positional middle of the array.

Example: If the median annual family income of 500 families is P185,000, then half of the 500 families (250 families) have annual family income lower than P185,000 and the other half (250 families) have annual family income higher than P185,000.

Page 21: Module 5-measures-of-central-tendency


Computation of the Median

* The first step in finding the median, denoted by Md, is to arrange the observations in an array.

Case 1: If the number of observations n is odd, the median is the middle observed value in the array.

Case 2: If the number of observations n is even, the median is the average of the two middle observed values in the array.
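The two cases can be written out as a short function. This is an illustrative Python sketch of the procedure described above; the name `median_of` and the small data sets are made up for the illustration.

```python
def median_of(values):
    """Median by the array method: sort, then take the positional middle."""
    array = sorted(values)          # first step: arrange the observations in an array
    n = len(array)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:                  # Case 1: n is odd
        return array[middle]
    # Case 2: n is even, average the two middle observed values
    return (array[middle - 1] + array[middle]) / 2


print(median_of([5, 1, 3]))     # 3
print(median_of([4, 1, 3, 2]))  # 2.5
```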

Page 22: Module 5-measures-of-central-tendency


Examples of the Median

1. The annual per capita poverty thresholds in pesos of the different regions of the Philippines are as follows: 15,693, 13,066, 12,685, 11,128, 13,760, 13,657, 11,995, 11,372, 11,313, 9,656, 9,518, 9,116, 10,503, 10,264, 10,466, 10,896, 12,192. Find the median.

Solution: We arrange the annual per capita poverty thresholds in pesos of the 17 regions of the Philippines from lowest to highest.

Page 23: Module 5-measures-of-central-tendency


Array: 9116, 9518, 9656, 10264, 10466, 10503, 10896, 11128, 11313, 11372, 11995, 12192, 12685, 13066, 13657, 13760, 15693

Since n = 17 is odd, the median is the middle observed value in the array, which is the 9th value. That is, the median is P11,313.00.

Interpretation: Half of the 17 regions have an annual per capita poverty threshold lower than P11,313 and the other half have an annual per capita poverty threshold higher than P11,313.

Page 24: Module 5-measures-of-central-tendency

2. The following are the number of telephone lines of 16 regions for the year 2004: 2799079, 94079, 190335, 42860, 410841, 1049413, 125517, 427497, 470299, 151650, 35945, 147513, 295334, 82616, 117116, 33315. Find the median.

Array: 33315, 35945, 42860, 82616, 94079, 117116, 125517, 147513, 151650, 190335, 295334, 410841, 427497, 470299, 1049413, 2799079

Since n = 16 is even, the median is the average of the two middle observed values in the array:

$$Md = \frac{147513 + 151650}{2} = 149581.5$$

Interpretation: 50% of the 16 regions have fewer than 149581.5 telephone lines and the upper 50% have more than 149581.5 telephone lines.
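Both median examples can be checked with Python's standard library. In this sketch the telephone-line values are taken from the array printed in the solution (including 151650, the value used in the median computation); the variable names are illustrative.

```python
from statistics import median

# Example 1: annual per capita poverty thresholds (pesos) of the 17 regions
thresholds = [15693, 13066, 12685, 11128, 13760, 13657, 11995, 11372, 11313,
              9656, 9518, 9116, 10503, 10264, 10466, 10896, 12192]

# Example 2: telephone lines of 16 regions, as listed in the solution's array
telephone_lines = [33315, 35945, 42860, 82616, 94079, 117116, 125517, 147513,
                   151650, 190335, 295334, 410841, 427497, 470299, 1049413,
                   2799079]

print(len(thresholds), median(thresholds))            # 17 11313
print(len(telephone_lines), median(telephone_lines))  # 16 149581.5
```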

Page 25: Module 5-measures-of-central-tendency


Median

[Figure: two number lines, the first with ticks from 0 to 10 and the second with ticks from 0 to 10, 12, and 14. Median = 5.]

Page 26: Module 5-measures-of-central-tendency


Characteristics of the Median

1. The median is a positional measure. This implies that extreme values affect the median less than the mean.

2. We use the median as a measure of central tendency if we wish to find the exact middle value of the distribution, when there are extreme observed values, and when the frequency distribution table has open-ended class intervals.
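To illustrate how an extreme value affects the mean more than the median, the sketch below reuses the 2004 regional incidence figures from the arithmetic mean example and recomputes both measures after dropping the largest value (258). Dropping an observation is done here only for illustration; it is not a step recommended by the course material.

```python
from statistics import mean, median

# Incidence of alleged human rights violations by region, 2004
incidence = [133, 11, 2, 16, 41, 57, 30, 49, 44, 71, 73, 26, 258, 49, 39]

# Same data without the single largest observation (258)
without_max = sorted(incidence)[:-1]

print(round(mean(incidence), 1), median(incidence))      # 59.9 44
print(round(mean(without_max), 1), median(without_max))  # 45.8 42.5
```

Removing one extreme region moves the mean by about 14 units but the median by only 1.5.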

Page 27: Module 5-measures-of-central-tendency


The Mode

* The mode is the observed value that occurs with the greatest frequency in a data set.

* We determine the mode by counting the frequency of each observed value and finding the observed value with the highest frequency of occurrence.

* Generally, the mode is a less popular measure of central tendency than the mean and the median.

Page 28: Module 5-measures-of-central-tendency


Examples of Mode

1. Given the data on number of children of 12 currently married women: 2, 2, 1, 1, 1, 3, 3, 4, 4, 2, 2, 2. Find the mode.

By inspection, the mode is 2.

Interpretation: The most frequent number of children among the 12 currently married women is 2.

2. Given the data on number of cases resolved by 10 lawyers: 5, 4, 1, 1, 3, 3, 2, 1, 3, 0. Find the mode.

The modes are 1 and 3.

Page 29: Module 5-measures-of-central-tendency


3. Given the data on number of cases handled by 14 PAO lawyers: 629, 645, 356, 656, 231, 455, 412, 289, 444, 452, 642, 225, 335, 411. Find the mode.

Every value occurs exactly once, so this data set has no mode.
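The three mode examples can be worked by counting frequencies. The Python sketch below uses a hypothetical helper `modes` and assumes one convention: when no value is repeated, the data set is reported as having no mode (an empty list). Other conventions exist; for instance, Python's `statistics.multimode` would return every value in that case.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


children = [2, 2, 1, 1, 1, 3, 3, 4, 4, 2, 2, 2]
cases_resolved = [5, 4, 1, 1, 3, 3, 2, 1, 3, 0]
cases_handled = [629, 645, 356, 656, 231, 455, 412,
                 289, 444, 452, 642, 225, 335, 411]

print(modes(children))        # [2]
print(modes(cases_resolved))  # [1, 3]
print(modes(cases_handled))   # []
```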

Page 30: Module 5-measures-of-central-tendency


Mode

occurs most frequently

may or may not exist

[Figure: two number lines. First, with ticks from 0 to 14: Mode = 9. Second, with ticks from 0 to 6: No Mode.]

Page 31: Module 5-measures-of-central-tendency


Characteristics of the Mode

1. The mode gives the most typical value of a set of observations.

2. A few low or high values do not easily affect the mode.

3. The mode is sometimes not unique, and sometimes it does not exist.

4. We can have several modes for one data set. If there is one mode, it is unimodal. If there are two modes, we call it bimodal. If there are more than two modes, then we call it multimodal.

5. The value of the mode is always one of the observed values in the data set.

6. We can get the mode for both quantitative and qualitative types of data.

Page 32: Module 5-measures-of-central-tendency


Example of Mode for Qualitative Data

Given the number of cellular mobile telephone subscribers for the year 2001, what is the mode?

Telephone Operator    Number of Subscribers
EXTELCOM              194,452
GLOBE TELECOM         5,405,415
ISLACOM               181,614
PILTEL                1,483,838
SMART                 4,893,844
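One way to read this table is as a frequency distribution in which each subscriber is an observation and the operator is the category. Under that reading, which is an interpretation and not stated in the slides, the mode is the operator with the largest count. A small Python sketch:

```python
# Number of cellular mobile telephone subscribers per operator, 2001
subscribers = {
    "EXTELCOM": 194_452,
    "GLOBE TELECOM": 5_405_415,
    "ISLACOM": 181_614,
    "PILTEL": 1_483_838,
    "SMART": 4_893_844,
}

# The modal category is the one with the highest frequency
modal_operator = max(subscribers, key=subscribers.get)
print(modal_operator)  # GLOBE TELECOM
```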

Page 33: Module 5-measures-of-central-tendency


Round-Off Rule

* In performing calculations, we round off only the final answer and not the intermediate values.

* The final answer should have one more decimal place than the original observations.

Example: The mean of the data set 3, 4, and 6 is 4.3333333333… Round this figure to the nearest tenth since the original observed values are whole numbers. Thus, the mean becomes 4.3.

Example: If the original observed values have one decimal place, like 4.5, 6.3, 7.7, 8.9, then we round the final answer to two decimal places. Thus, if we get the mean, the final answer is 6.85.
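The round-off rule can be sketched in Python as follows. The function name `rounded_mean` and its parameter `decimals_in_data` are hypothetical, and rounding ties half up is an assumption, since the rule does not say how ties are handled. Decimal arithmetic is used so that only the final answer is rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_mean(values, decimals_in_data):
    """Mean rounded to one more decimal place than the original observations."""
    # Exact intermediate arithmetic: nothing is rounded before the final step
    exact = sum(Decimal(str(v)) for v in values) / len(values)
    # Quantum with (decimals_in_data + 1) decimal places, e.g. 0.1 or 0.01
    quantum = Decimal(1).scaleb(-(decimals_in_data + 1))
    return exact.quantize(quantum, rounding=ROUND_HALF_UP)


print(rounded_mean([3, 4, 6], 0))             # 4.3
print(rounded_mean([4.5, 6.3, 7.7, 8.9], 1))  # 6.85
```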

Page 34: Module 5-measures-of-central-tendency

Thank you.