module 5-measures-of-central-tendency
Training Course on Basic Statistical Analysis Using MS Excel 2007
March 28 to April 1, 2011
STATISTICAL RESEARCH AND TRAINING CENTER
J and S Building, 104 Kalayaan Avenue, Diliman, Quezon City
Measures of Central Tendency
Prepared by:
Prof. Josefina V. Almeda
College Secretary
School of Statistics
University of the Philippines, Diliman
2011
Measures of Central Tendency
OUTLINE
Mean
Median
Mode
Describing Data with Summary Measures
Summary Measures
Central Tendency: Mean, Median, Mode
Other Locations: Quartiles
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
A measure of central tendency is an index of the central location of a distribution.
It is a single value used to identify the “center” of the data, or the typical value.
Precise yet simple
Most representative value of the data
The Arithmetic Mean
The arithmetic mean is the sum of all observed values divided by the total number of observations.
The population mean for a finite population with N elements, denoted by $\mu$ (lowercase Greek letter mu), is
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
The sample mean for a finite sample with n elements, denoted by $\bar{X}$, is
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
The population mean is a parameter while the sample mean is a statistic.
Examples of Arithmetic Mean
1. Given the number of children of a sample of 10 currently married women: 3, 4, 2, 5, 1, 3, 4, 2, 3, 3, find the mean number of children of the currently married women.
Solution: We compute the sample mean.
$$\bar{X} = \frac{3 + 4 + 2 + 5 + 1 + 3 + 4 + 2 + 3 + 3}{10} = \frac{30}{10} = 3$$
The mean number of children of currently married women is 3.
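The same computation can be sketched in a few lines of Python. This is only an illustration of the sample mean formula; the variable names are our own and are not part of the course material.

```python
from statistics import mean

# Number of children of the 10 sampled currently married women
children = [3, 4, 2, 5, 1, 3, 4, 2, 3, 3]

# Sample mean: sum of the observed values divided by n
sample_mean = sum(children) / len(children)

print(sample_mean)     # 3.0
print(mean(children))  # the standard library gives the same value
```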
2. Given the incidence of alleged human rights violations by region for the year 2004, find the mean incidence of alleged human rights violations.
Region       Incidence
NCR          133
CAR          11
Region 1     2
Region 2     16
Region 3     41
Region 4     57
Region 5     30
Region 6     49
Region 7     44
Region 8     71
Region 9     73
Region 10    26
Region 11    258
Region 12    49
Region 13    39
Solution: We get the population mean incidence of alleged human rights violations.
$$\mu = \frac{899}{15} \approx 59.9$$
The mean incidence of alleged human rights violations per region is 59.9.
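As a quick check of the arithmetic, the population mean for the 15 regions can be recomputed with a short Python sketch. The dictionary layout and names are assumptions made for illustration only.

```python
# Incidence of alleged human rights violations by region, 2004 (from the table)
incidence = {
    "NCR": 133, "CAR": 11, "Region 1": 2, "Region 2": 16, "Region 3": 41,
    "Region 4": 57, "Region 5": 30, "Region 6": 49, "Region 7": 44,
    "Region 8": 71, "Region 9": 73, "Region 10": 26, "Region 11": 258,
    "Region 12": 49, "Region 13": 39,
}

N = len(incidence)               # number of regions in the population
total = sum(incidence.values())  # sum of all observed values
mu = total / N                   # population mean

print(N, total, round(mu, 1))
```

The sum of the 15 values is 899, so the mean is 899/15, which rounds to 59.9.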
Properties of the Mean
1. The mean is the most common measure of central tendency since it employs every observed value in the calculation.
2. It may or may not be an actual observed value in the data set.
3. We may compute the mean for both ungrouped and grouped data sets.
4. Extreme observations affect the value of the mean, especially if the number of observations is small.
5. The value of the mean always exists and is unique.
6. It is a widely understood measure of central tendency.
7. We use the mean if the distribution is not too asymmetrical, when we give equal importance to the effect of all observed values, and when we will compute other statistics later on.
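Property 4 can be illustrated with the human rights violations data from the earlier example. The sketch below, written in Python with names of our own choosing, compares the mean of all 15 regions with the mean after leaving out the single largest value (258, Region 11).

```python
# Incidence values for the 15 regions (NCR, CAR, Regions 1 to 13)
incidence = [133, 11, 2, 16, 41, 57, 30, 49, 44, 71, 73, 26, 258, 49, 39]

# Mean using every observed value
mean_all = sum(incidence) / len(incidence)

# Mean after leaving out the extreme observation (258)
trimmed = [x for x in incidence if x != 258]
mean_without = sum(trimmed) / len(trimmed)

print(round(mean_all, 1), round(mean_without, 1))
```

With only 15 observations, dropping one extreme value moves the mean from about 59.9 to about 45.8.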
The Weighted Mean
* If the individual values do not have equal importance, then we compute the weighted mean.
* We assign weights to the observed values of the data set before we can get the weighted mean.
Formula of Weighted Mean
If we assign a weight $W_i$ to each observation $X_i$, where i = 1, 2, …, n, and n is the number of observations in the sample, then the weighted sample mean is given by
$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \cdots + W_n X_n}{W_1 + W_2 + \cdots + W_n}$$
Example of Weighted Mean
Suppose a government agency gives scholarship grants to employees taking graduate studies. Courses in graduate studies earn credits of 1, 2, 3, 4, or 5 units. Employees can get a partial scholarship for the next semester if they get a weighted average of 1.5 to 1.75, and a full scholarship if the average is better than 1.5, which means an average of 1.0 to 1.49. What kind of scholarship will the two employees get given their grades for the previous semester?
Consider the grades of the two employees in the previous semester:
Employee A                    Employee B
Subject  Units  Grade         Subject  Units  Grade
A        1      1.0           A        1      2.0
B        2      1.25          B        2      1.75
C        3      1.5           C        3      1.5
D        4      1.75          D        4      1.25
E        5      2.0           E        5      1.0
Solution:
We let the units be the weights $W_i$ and the grades be the $X_i$.
Weighted average of employee A:
$$\bar{X}_w = \frac{1(1) + 2(1.25) + 3(1.5) + 4(1.75) + 5(2)}{1 + 2 + 3 + 4 + 5} = \frac{25}{15} \approx 1.67$$
Weighted average of employee B:
$$\bar{X}_w = \frac{1(2) + 2(1.75) + 3(1.5) + 4(1.25) + 5(1.0)}{1 + 2 + 3 + 4 + 5} = \frac{20}{15} \approx 1.33$$
Thus, employee A will get a partial scholarship while employee B will get a full scholarship.
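The weighted averages of the two employees can be reproduced with a small Python sketch. The functions `weighted_mean` and `scholarship` are hypothetical helpers written for this illustration; `scholarship` simply encodes the grant rule as stated in the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of W_i * X_i divided by the sum of W_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


def scholarship(average):
    """Hypothetical helper: apply the scholarship rule from the example."""
    if 1.0 <= average < 1.5:
        return "full"
    if 1.5 <= average <= 1.75:
        return "partial"
    return "none"


units = [1, 2, 3, 4, 5]                  # weights W_i
grades_a = [1.0, 1.25, 1.5, 1.75, 2.0]   # grades X_i of employee A
grades_b = [2.0, 1.75, 1.5, 1.25, 1.0]   # grades X_i of employee B

avg_a = weighted_mean(grades_a, units)
avg_b = weighted_mean(grades_b, units)

print(round(avg_a, 2), scholarship(avg_a))
print(round(avg_b, 2), scholarship(avg_b))
```

Both employees have the same set of grades, so their unweighted means are equal. Only the weighting by units separates them.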
The Combined Population Mean
We can obtain the mean of several data sets given the means and number of observations of each data set. This is what we call the combined mean. Suppose that k finite populations having $N_1, N_2, \ldots, N_k$ measurements, respectively, have means $\mu_1, \mu_2, \ldots, \mu_k$. The combined population mean $\mu_c$ of all the populations is
$$\mu_c = \frac{\sum_{i=1}^{k} N_i \mu_i}{\sum_{i=1}^{k} N_i} = \frac{N_1 \mu_1 + N_2 \mu_2 + \cdots + N_k \mu_k}{N_1 + N_2 + \cdots + N_k}$$
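If random samples of size $n_1, n_2, \ldots, n_k$, selected from these k populations, have the means $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$, respectively, the combined sample mean $\bar{X}_c$ of all the sample data is
$$\bar{X}_c = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$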
Example of the Combined Mean
The Philippines had 6028 male child deaths and 4948 female child deaths for the age group 1-4 in 2002. The average number of deaths for male and female children is 376.8 and 309.2, respectively. What is the combined population mean for both sexes?
Solution: We let $N_1$ = 6028 and $N_2$ = 4948, with $\mu_{\text{males}}$ = 376.8 and $\mu_{\text{females}}$ = 309.2. Thus,
$$\mu_{\text{both sexes}} = \frac{6028(376.8) + 4948(309.2)}{6028 + 4948} \approx 346$$
The average number of deaths for children 1-4 years old for both sexes is 346.
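The combined mean is a weighted mean in which each group mean is weighted by its group size. A minimal Python sketch, with a hypothetical helper named `combined_mean`, reproduces the figure in the example.

```python
def combined_mean(sizes, means):
    """Combined mean: sum of N_i * mu_i divided by the sum of N_i."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


sizes = [6028, 4948]    # N_1 (males), N_2 (females)
means = [376.8, 309.2]  # group means for males and females

mu_c = combined_mean(sizes, means)
print(round(mu_c))
```

The combined mean lies between the two group means and sits closer to the mean of the larger group.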
The Median
* The median divides an ordered set of observations into two equal parts so that half of the observations are below its value and the other half are above its value.
* It is the positional middle of the array.
Example: If the median annual family income of 500 families is P185,000, then half of the 500 families (250 families) have annual family income lower than P185,000 and the other half (250 families) have annual family income higher than P185,000.
Computation of the Median
* The first step in finding the median, denoted by Md, is to arrange the observations in an array.
Case 1: If the number of observations n is odd, the median is the middle observed value in the array.
Case 2: If the number of observations n is even, the median is the average of the two middle observed values in the array.
Examples of the Median
1. The annual per capita poverty thresholds in pesos of the different regions of the Philippines are as follows: 15,693, 13,066, 12,685, 11,128, 13,760, 13,657, 11,995, 11,372, 11,313, 9,656, 9,518, 9,116, 10,503, 10,264, 10,466, 10,896, 12,192. Find the median.
Solution: We arrange the annual per capita poverty thresholds in pesos of the 17 regions of the Philippines from lowest to highest.
Array: 9116, 9518, 9656, 10264, 10466, 10503, 10896, 11128, 11313, 11372, 11995, 12192, 12685, 13066, 13657, 13760, 15693
Since n = 17 is odd, the median is the middle (9th) observed value in the array. That is, the median is P11,313.00.
Interpretation: Half of the 17 regions have an annual per capita poverty threshold lower than P11,313 and the other half have an annual per capita poverty threshold higher than P11,313.
2. The following are the number of telephone lines of 16 regions for the year 2004: 2799079, 94079, 190335, 42860, 410841, 1049413, 125517, 427497, 470299, 151650, 35945, 147513, 295334, 82616, 117116, 33315. Find the median.
Array: 33315, 35945, 42860, 82616, 94079, 117116, 125517, 147513, 151650, 190335, 295334, 410841, 427497, 470299, 1049413, 2799079
Since n = 16 is even, the median is the average of the two middle observed values in the array:
$$Md = \frac{147513 + 151650}{2} = 149581.5$$
Interpretation: 50% of the 16 regions have fewer than 149581.5 telephone lines and the upper 50% have more than 149581.5 telephone lines.
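The two cases for computing the median can be sketched in Python as follows. The function name `median_from_array` is our own, and the telephone line figures are taken from the sorted array used in the solution.

```python
from statistics import median


def median_from_array(values):
    """Median by position: sort first, then take the middle value(s)."""
    array = sorted(values)  # arrange the observations in an array
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        # Case 1: n is odd, so the median is the middle observed value
        return array[mid]
    # Case 2: n is even, so the median is the average of the two middle values
    return (array[mid - 1] + array[mid]) / 2


# Example 1: annual per capita poverty thresholds of 17 regions (pesos)
thresholds = [15693, 13066, 12685, 11128, 13760, 13657, 11995, 11372, 11313,
              9656, 9518, 9116, 10503, 10264, 10466, 10896, 12192]

# Example 2: telephone lines of 16 regions (values from the sorted array)
lines = [33315, 35945, 42860, 82616, 94079, 117116, 125517, 147513,
         151650, 190335, 295334, 410841, 427497, 470299, 1049413, 2799079]

print(median_from_array(thresholds))  # 11313
print(median_from_array(lines))       # 149581.5

# The standard library function gives the same results
print(median(thresholds), median(lines))
```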
Median
[Figure: two dot plots, one on a number line from 0 to 10 and one on a number line from 0 to 14; Median = 5]
Characteristics of the Median
1. The median is a positional measure. This implies that extreme values affect the median less than the mean.
2. We use the median as a measure of central tendency if we want the exact middle value of the distribution, when there are extreme observed values, and when the frequency distribution table has open-ended class intervals.
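To see the first characteristic in numbers, the Python sketch below uses the telephone line data from the median example (values from the sorted array) and compares the mean and the median with and without the largest observation. The variable names are assumptions for illustration.

```python
from statistics import mean, median

# Telephone lines of 16 regions, 2004 (sorted array from the median example)
lines = [33315, 35945, 42860, 82616, 94079, 117116, 125517, 147513,
         151650, 190335, 295334, 410841, 427497, 470299, 1049413, 2799079]

# Drop the single largest (extreme) observation
without_largest = sorted(lines)[:-1]

print(mean(lines), median(lines))
print(mean(without_largest), median(without_largest))
```

Removing the one extreme value lowers the mean by well over 100,000 lines, while the median moves by only about 2,000.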
The Mode
* The mode is the observed value that occurs with the greatest frequency in a data set.
* We determine the mode by counting the frequency of each observed value and finding the observed value with the highest frequency of occurrence.
* Generally, the mode is a less popular measure of central tendency than the mean and the median.
Examples of Mode
1. Given the data on the number of children of 12 currently married women: 2, 2, 1, 1, 1, 3, 3, 4, 4, 2, 2, 2. Find the mode.
By inspection, the mode is 2.
Interpretation: The most frequent number of children among the 12 currently married women is 2.
2. Given the data on the number of cases resolved by 10 lawyers: 5, 4, 1, 1, 3, 3, 2, 1, 3, 0. Find the mode.
The modes are 1 and 3.
3. Given the data on the number of cases handled by 14 PAO lawyers: 629, 645, 356, 656, 231, 455, 412, 289, 444, 452, 642, 225, 335, 411. Find the mode.
Each value occurs only once, so this data set has no mode.
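Counting frequencies by hand becomes tedious for longer data sets. The sketch below shows one way to do the counting in Python. The function `modes` is a hypothetical helper, and it assumes the convention that a data set has no mode when all of its distinct values occur equally often.

```python
from collections import Counter


def modes(values):
    """Return the list of modes, or an empty list if there is no mode."""
    if not values:
        return []
    counts = Counter(values)  # frequency of each observed value
    highest = max(counts.values())
    # Assumed convention: if every distinct value is equally frequent
    # (and there is more than one distinct value), there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


children = [2, 2, 1, 1, 1, 3, 3, 4, 4, 2, 2, 2]
cases_resolved = [5, 4, 1, 1, 3, 3, 2, 1, 3, 0]
cases_handled = [629, 645, 356, 656, 231, 455, 412,
                 289, 444, 452, 642, 225, 335, 411]

print(modes(children))        # [2]
print(modes(cases_resolved))  # [1, 3]
print(modes(cases_handled))   # []
```

The three results correspond to a unimodal data set, a bimodal data set, and a data set with no mode.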
Mode
* occurs most frequently
* may or may not exist
[Figure: dot plot on a number line from 0 to 14; Mode = 9]
[Figure: dot plot on a number line from 0 to 6; No Mode]
Characteristics of the Mode
1. The mode gives the most typical value of a set of observations.
2. A few low or high values do not easily affect the mode.
3. The mode is sometimes not unique, and sometimes it does not exist.
4. We can have several modes for one data set. If there is one mode, it is unimodal. If there are two modes, we call it bimodal. If there are more than two modes, then we call it multimodal.
5. The value of the mode is always one of the observed values in the data set.
6. We can get the mode for both quantitative and qualitative types of data.
Example of Mode for Qualitative Data
Given the number of cellular mobile telephone subscribers for the year 2001, what is the mode?
Telephone Operator    Number of Subscribers
EXTELCOM              194,452
GLOBE TELECOM         5,405,415
ISLACOM               181,614
PILTEL                1,483,838
SMART                 4,893,844
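Here the variable is the telephone operator, which is qualitative, and the subscriber counts act as the frequencies of each category. Under that reading, the mode is the operator with the highest count. A short Python sketch, with names chosen for illustration:

```python
# Number of subscribers per operator, 2001 (frequencies of each category)
subscribers = {
    "EXTELCOM": 194_452,
    "GLOBE TELECOM": 5_405_415,
    "ISLACOM": 181_614,
    "PILTEL": 1_483_838,
    "SMART": 4_893_844,
}

# The modal category is the one with the highest frequency
modal_operator = max(subscribers, key=subscribers.get)
print(modal_operator)  # GLOBE TELECOM
```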
Round-Off Rule
* In performing calculations, we round off only the final answer and not the intermediate values.
* The final answer should have one more decimal place than the original observations.
Example: The mean of the data set 3, 4, and 6 is 4.3333333333… Round this figure to the nearest tenth since the original observed values are whole numbers. Thus, the mean becomes 4.3.
Example: If the original observed values have one decimal place, like 4.5, 6.3, 7.7, 8.9, then we round the final answer to two decimal places. Thus, if we get the mean, the final answer is 6.85.
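The round-off rule can be sketched in Python with the `decimal` module. The helper `rounded_mean` is hypothetical, and it assumes the observations are supplied as strings so that their number of decimal places is known exactly. Rounding half up is also an assumption, since the rule above does not say how ties are handled.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_mean(observations):
    """Mean rounded to one more decimal place than the observations.

    `observations` is a list of numeric strings such as "4.5".
    """
    values = [Decimal(text) for text in observations]
    # Largest number of decimal places among the original observations
    places = max(max(-v.as_tuple().exponent, 0) for v in values)
    # Compute with full precision, then round only the final answer
    mean = sum(values) / len(values)
    quantum = Decimal(1).scaleb(-(places + 1))  # e.g. 0.1 or 0.01
    return mean.quantize(quantum, rounding=ROUND_HALF_UP)


print(rounded_mean(["3", "4", "6"]))              # 4.3
print(rounded_mean(["4.5", "6.3", "7.7", "8.9"]))  # 6.85
```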
Thank you.