statistik topic4 measures of central tendency


Upload: anas-assayuti

Post on 18-Dec-2014


Page 1: Statistik topic4 measures of central tendency

INTRODUCTION

In this topic, you will learn about the position measurements consisting of mean, mode, median, and quantiles. The quantiles will include quartiles, deciles and percentiles. A good understanding of these concepts is important as they will help you to describe the data distribution.

Topic 4 Measures of Central Tendency

LEARNING OUTCOMES

By the end of this topic, you should be able to:

1. explain the concept of measure of central tendency in the description of data distribution;

2. obtain mean, mode, median and quantiles;

3. state the empirical relationship between mean, mode and median;

4. calculate the inter-quartile range; and

5. estimate the median and quartiles from cumulative distribution.

What is the role of position measurements to describe data distribution?


TOPIC 4 MEASURE OF CENTRAL TENDENCY 38

4.1 MEASUREMENT OF CENTRAL TENDENCY

These measurements are real numbers located on the horizontal line on which the original raw data are plotted. This real line is sometimes called the line of data. Each number is obtained by using an appropriate formula; the mean, mode and median are examples of such measurements. In general, these numbers describe the properties or characteristics of the data distribution, and they can further be used to infer the characteristics of the population distribution.

Some roles of the above measurements are:

(a) Describing a Quantitative Feature of Sample Data

Let us consider the mean of a data distribution. As this number is calculated by averaging all the data, it can be regarded as the centre of the whole set of observations. The mean tells us that, in general, the observations are likely to be scattered around it. For example, if the mean of a given data set is 40, then we could expect the majority of observations to be located around the number 40 as their centre position.

The second quantitative feature is that the observations are usually of the same order of magnitude as their mean. This means that the observations are probably two-digit numbers fairly close to the mean 40. A possible data set is: 30, 32, 36, 38, 40, 41, 43, 42, 48 and 50. However, three-digit numbers or numbers of higher order, such as 100 or 1000, are less likely to belong to a data set whose mean is 40.

The third feature is that, as a centre, the mean tells us the location or position of the data distribution. A data set having mean 60 will be located to the right of the above data set. Similarly, a data set having mean 10 will be located to the left of it.

(b) Describing the Proportion Feature of the Data Set

Suppose the raw data have been arranged in ascending order and plotted on the line of data. Then:

The number Q1 located on the data line such that about 25% (i.e. a proportion of one fourth) of the observations have values less than Q1 is called the first quartile.

The number Q2 located on the data line such that about 50% (i.e. a proportion of one half) of the observations have values less than Q2 is called the second quartile. The second quartile is also called the median of the distribution, which divides the whole distribution into two equal parts.

The number Q3 located on the data line such that about 75% (i.e. a proportion of three fourths) of the observations have values less than Q3 is called the third quartile.

Besides the mean and standard deviation, the three quartiles are common quantities used to describe the distribution of data. The three quartiles divide the whole distribution into four equal parts. Figure 4.1 shows the positions of the first two quartiles. Can you locate the third quartile on the same figure? There are many other quantities describing proportions, such as deciles and percentiles. There are nine deciles, which divide the whole distribution into ten equal parts, and 99 percentiles, which divide the whole distribution into 100 equal parts. Deciles and percentiles will be described in Section 4.5.

Figure 4.1: The positions of the first two quartiles of the books distribution on weekly sales given in Topic 2

4.2 THE MEAN

The mean, or arithmetic mean, of a set of n numbers x1, x2, ..., xn is denoted by the symbol μ (read "mu"). It is defined as the average of all the numbers and is given by the following formula:

$$\mu = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Formula 4.1

In this module, every calculation involves all the observations, so we treat the given data as a population and denote its mean by μ. The sample mean, which we are not using here, is calculated with the same denominator n; the divisor n – 1 arises for the sample variance, not for the mean.

Example 4.1

Calculate the mean of set numbers 3, 6, 7, 2, 4, 5, and 8.

Solution

By using Formula 4.1, the arithmetic mean is given by

$$\mu = \frac{3 + 6 + 7 + 2 + 4 + 5 + 8}{7} = \frac{35}{7} = 5.0$$
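As a quick check of Formula 4.1, the calculation in Example 4.1 can be sketched in Python. The function name `arithmetic_mean` is only an illustrative choice and is not part of this module.

```python
def arithmetic_mean(values):
    """Formula 4.1: the sum of all observations divided by their number n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [3, 6, 7, 2, 4, 5, 8]   # the numbers given in Example 4.1
mu = arithmetic_mean(data)
print(mu)  # 5.0
```

The same function would apply to the 50 weekly sales figures of Example 4.1(a), which are not reproduced in this topic.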

Example 4.1(a)

Find the mean of books on weekly sales.

Solution

The mean = μ = 3296/50 = 65.92 ≈ 66 books.

4.2.1 The Mean of Repeated Numbers

Suppose we have k different numbers with frequencies of repetition as given in the following table:

Numbers x1 x2 … xk-1 xk

Frequency f1 f2 … fk-1 fk

Then their mean is given by the following Formula 4.1(a).

$$\mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{f_1 + f_2 + \cdots + f_{k-1} + f_k} = \frac{\sum f_i x_i}{\sum f_i}, \quad \text{where } i = 1, 2, \ldots, k$$

Formula 4.1(a)

Here the total frequency is

$$f_1 + f_2 + f_3 + \cdots + f_{k-1} + f_k = \sum f_i = n,$$

the total number of observations.

Example 4.2

Obtain the mean of the following set of data:

2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4 , 6, 2, 9

Solution

(a) Arrange the data in ascending order,

2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9

(b) Form the frequency table of each individual number

Number(x) 2 3 4 5 6 7 8 9

Frequency (f) 3 3 4 6 3 4 2 2

(c) By using Formula 4.1(a), the mean is

$$\mu = \frac{3(2) + 3(3) + 4(4) + 6(5) + 3(6) + 4(7) + 2(8) + 2(9)}{3 + 3 + 4 + 6 + 3 + 4 + 2 + 2} = \frac{141}{27} \approx 5.2$$
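The three steps of Example 4.2 (count the repetitions, multiply each number by its frequency, divide by the total frequency) can be sketched as follows. `collections.Counter` is used here only as a convenient way to build the frequency table; it is not prescribed by the module.

```python
from collections import Counter

# Raw data of Example 4.2
data = [2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9,
        5, 7, 3, 5, 8, 4, 6, 2, 9]

freq = Counter(data)                               # number -> frequency f
total_fx = sum(f * x for x, f in freq.items())     # numerator of Formula 4.1(a)
n = sum(freq.values())                             # total frequency = number of observations
weighted_mean = total_fx / n

print(sorted(freq.items()))                        # the frequency table, in ascending order of x
print(total_fx, n, round(weighted_mean, 1))        # 141 27 5.2
```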

As we can see, the numbers are scattered around the mean value 5.2. The numbers are also of the same order as their mean value, as shown in Figure 4.2.

Figure 4.2: Centre of data

For small data sets, a scatter plot as shown in Figure 4.2 can be used to clarify the concept of the mean, which plays the role of the centre of the distribution. However, for large data sets, a histogram of the frequency distribution is more appropriate.

Formulas 4.1 and 4.1(a) show that the calculation of the mean involves all the data, from the smallest to the largest value in the data set. Thus an extremely large value, an extremely small value, or both will affect the value of the mean.

Example 4.2(a)

Here is the set of annual incomes of four employees of a company: RM4,000; RM5,000; RM5,500; and RM30,000.

(a) Obtain the mean of the annual income.

(b) Give your comment on the values of the income.

(c) Give your comment on the value of the mean obtained. Can the value of mean play the role as centre of the given data set?

Solution

(a) The mean is given by

μ = (RM4,000 + RM5,000 + RM5,500 + RM30,000) / 4 = RM44,500 / 4 = RM11,125

(b) The income of RM30,000 is extremely large compared to the other three incomes. Apparently this value does not belong to the same group as the first three incomes.

(c) The extreme value RM30,000 shifts the position of the mean to the right. Since the majority of the incomes are less than RM6,000, the figure RM11,125 is not appropriate as the centre of the data. It would be better to remove the fourth employee from the group. The mean of the first three incomes then becomes RM4,833, which is more appropriate to represent the centre of the majority of the incomes, i.e. the first three.
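The effect of the extreme value in Example 4.2(a) can be checked numerically. The sketch below simply recomputes the mean with and without the fourth income; the variable names are illustrative.

```python
incomes = [4000, 5000, 5500, 30000]          # annual incomes in RM, Example 4.2(a)

mean_all = sum(incomes) / len(incomes)       # mean including the extreme value
first_three = incomes[:-1]                   # drop the RM30,000 observation
mean_first_three = sum(first_three) / len(first_three)

print(mean_all)                              # 11125.0
print(round(mean_first_three))               # 4833
```

A single extreme observation more than doubles the mean here, which is why the mean alone can misrepresent the centre of such a data set.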

The Mean of Grouped Data

When we have a large number of data, they should be grouped into k classes. Each class will be represented by its class mid-point. Let the k class mid-points be x1, x2, ..., xk and their respective frequencies be f1, f2, ..., fk, as given in the following table:

Class mid-points x1 x2 … xk-1 xk

Frequency f1 f2 … fk-1 fk

Then the mean of the data is estimated by the mean of the above mid-points, given by the following Formula 4.1(b).

$$\mu \approx \frac{f_1 x_1 + f_2 x_2 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{f_1 + f_2 + \cdots + f_{k-1} + f_k} = \frac{\sum f_i x_i}{\sum f_i}, \quad \text{where } i = 1, 2, \ldots, k$$

Formula 4.1(b)

Recall that all observations in each class have been "forgotten" and replaced by the class mid-point. As such, the mean obtained through Formula 4.1(b) is just an approximation to the actual mean of the data.

Example 4.3

Let us refer back to the frequency of books on weekly sales given in Table 2.6 of Topic 2 and obtain the approximate mean number of books on weekly sales. The table is copied to Table 4.1 below together with mid-points of each class.

Table 4.1: The Frequency Distribution of Books on Weekly Sales

Class       Class Mid-point (x)   Frequency (f)   f × x
34 - 43     38.5                  2               77
44 - 53     48.5                  5               242.5
54 - 63     58.5                  12              702
64 - 73     68.5                  18              1233
74 - 83     78.5                  10              785
84 - 93     88.5                  2               177
94 - 103    98.5                  1               98.5
Sum                               50              3315

Solution

Finding the mean using Formula 4.1(b):

$$\mu \approx \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{3315}{50} = 66.3 \approx 66 \text{ books}$$

(As a comparison, the actual mean is 65.92 ≈ 66 books.)
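The calculation in Table 4.1 can be reproduced with a short sketch. Each class is written as a hypothetical tuple `(lower limit, upper limit, frequency)`; the mid-point is taken as the average of the two class limits, which matches the mid-points listed in the table.

```python
# (lower limit, upper limit, frequency) for each class of Table 4.1
classes = [(34, 43, 2), (44, 53, 5), (54, 63, 12), (64, 73, 18),
           (74, 83, 10), (84, 93, 2), (94, 103, 1)]

midpoints = [(low + high) / 2 for low, high, _ in classes]      # 38.5, 48.5, ...
frequencies = [f for _, _, f in classes]

total_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of f times x
n = sum(frequencies)                                            # total frequency
grouped_mean = total_fx / n                                     # Formula 4.1(b)

print(total_fx, n, grouped_mean)   # 3315.0 50 66.3
```

The result, 66.3, differs slightly from the actual mean of 65.92 because every observation in a class has been replaced by the class mid-point.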


The Mean of Combined Sets of Data

Suppose we have two sets of data with the following characteristics:

Set 1: the size of the data is n1, and either the sum of the data, H1, or the mean, μ1, is known;

Set 2: the size of the data is n2, and either the sum of the data, H2, or the mean, μ2, is known.

Then for Set 1 we have the relationship

$$\mu_1 = \frac{H_1}{n_1} \;\Longrightarrow\; H_1 = n_1 \mu_1;$$

and for Set 2 we have the relationship

$$\mu_2 = \frac{H_2}{n_2} \;\Longrightarrow\; H_2 = n_2 \mu_2.$$

Now the combined Set 1 and Set 2 will have a total size of n1 + n2, and the combined mean is given by:

$$\mu = \frac{H_1 + H_2}{n_1 + n_2}$$

Formula 4.2(a)

Or,

$$\mu = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2}$$

Formula 4.2(b)

The Formulas 4.2(a) and 4.2(b) can easily be extended for any number of data sets.


Example 4.4

There are five Tutorial Groups of students taking first year statistics. Their respective number of students are 40, 41, 42, 38, and 39. They have taken final examination in a given semester and their respective mean score are 62, 67, 58, 70, and 65. Obtain the overall mean score of all students in the above examination.

Solution

In this problem we are given five groups or classes. For each class, the question provides the total number of students and class mean score. Therefore, with some extension, we can use Formula 4.2(b) and the overall mean is:

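$$\mu = \frac{n_1 \mu_1 + n_2 \mu_2 + n_3 \mu_3 + n_4 \mu_4 + n_5 \mu_5}{n_1 + n_2 + n_3 + n_4 + n_5}$$

$$= \frac{40(62) + 41(67) + 42(58) + 38(70) + 39(65)}{40 + 41 + 42 + 38 + 39} = \frac{12858}{200} = 64.29$$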
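Formula 4.2(b), extended to five groups as in Example 4.4, amounts to a mean weighted by group size. A minimal sketch, with illustrative variable names:

```python
sizes = [40, 41, 42, 38, 39]     # number of students in each tutorial group
means = [62, 67, 58, 70, 65]     # mean score of each group

# Each product n_i * mu_i recovers the group total H_i, as in Formula 4.2(a).
total_score = sum(n * mu for n, mu in zip(sizes, means))
total_students = sum(sizes)
overall_mean = total_score / total_students

print(total_score, total_students, round(overall_mean, 2))   # 12858 200 64.29
```

Note that simply averaging the five group means would give 64.4, which is not the overall mean because the groups have different sizes.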

Which of the following calculation methods is the easiest: calculation of the mean of repeated numbers; calculation of the mean of grouped data; or calculation of the mean of a combined data set?

4.3 MEDIAN

The median is another measure of central tendency which can be used to describe the distribution of data: about 50% of the data have values less than the median, and the other 50% have values larger than the median. Since the median depends only on the observation(s) at the middle position and not on the actual values of the other observations, it is not affected by extreme values in the data.

Definition of Median

When all observations are arranged in ascending (or descending) order, the median is defined as the observation at the middle position (for an odd number of observations), or as the average of the two observations at the middle (for an even number of observations).

Clearly, for an odd number of observations there will always be an observation at the middle position, whereas for an even number of observations there will be no observation at the middle position. Instead, there are two observations at the middle, and the average of these two middle observations becomes the median. Let n be the number of observations; then the median is at the position (n + 1)/2.

Calculating Median of Ungrouped Data

For ungrouped data, the median is calculated directly from its definition with the following steps:

Step 1 : Arrange the given data in ascending order.

Step 2 : Get the position of the median.

Step 3 : Identify the median, or calculate the average of the two middle observations when the number of observations is even.

Example 4.5

Obtain the median of the following sets of data.

(a) 2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9 (Data as given in Example 2.1.2)

(b) 3, 4, 7, 5, 8, 9, 10, 11, 2, 12

Solution

Set (a)

Step 1 : 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9

Step 2 : In this set there are n = 27 observations. Thus the position of the median is (27 + 1)/2 = 14, i.e. the 14th.

Step 3 : The median is the number 5, i.e. the fourth number 5.

Set (b)

Step 1 : 2, 3, 4, 5, 7, 8, 9, 10, 11, 12

Step 2 : In this set there are n = 10 observations. Thus the position of the median is at (10 + 1)/2 = 5.5. This position is midway between the 5th position and the 6th position. The observation at the 5th position is the number 7, and the observation at the 6th position is the number 8.

Step 3 : Thus the median is the average (7 + 8)/2 = 7.5, which is at position 5.5.
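The three steps for ungrouped data can be sketched as a small function. The name `median_by_position` is hypothetical; the logic follows the position rule (n + 1)/2 described above, translated to Python's zero-based indexing.

```python
def median_by_position(values):
    """Median via the position (n + 1)/2 in the sorted data."""
    ordered = sorted(values)                 # Step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2                             # Step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]                  # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle ones

set_a = [2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9,
         5, 7, 3, 5, 8, 4, 6, 2, 9]
set_b = [3, 4, 7, 5, 8, 9, 10, 11, 2, 12]

print(median_by_position(set_a))   # 5
print(median_by_position(set_b))   # 7.5
```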

Calculating Median of a Grouped Data

When the data size is large, it is common to group the data into several classes. The methods of grouping data have been explained in Topic 2. Suppose we have k class mid-points x1, x2, ..., xk with respective frequencies f1, f2, ..., fk. The median is obtained by the following steps:

Step 1 : Get the position of the median. The position of the median is at (n + 1)/2. Let us call this number M0.

Step 2 : Identify the median class. The median class is the class in which the median is located. It is identified as follows:

(a) Accumulate the frequencies until the SUM exceeds M0.

(b) The last frequency added, which makes the condition in (a) happen, is the frequency of the median class.

(c) Then record the following:

(i) LB, the lower boundary of the median class,

(ii) fm, the class frequency of the median class,

(iii) C, the class interval or class width of the median class, and

(iv) FB, the SUM of the frequencies before the condition in (a) happens.

Step 3 : Calculate the median using the following formula:

The median, x̃, is

$$\tilde{x} = L_B + C \left( \frac{\frac{n+1}{2} - F_B}{f_m} \right)$$

Formula 4.3

For illustration, we will use the frequency distribution of books on weekly sales in Table 2.6 of Topic 2.

Step 1 : The position of the median is at (n + 1)/2 = (50 + 1)/2 = 25.5 = M0.

Step 2 : Getting the SUM,

(a) SUM = f1 + f2 + f3 = 19 (< M0 = 25.5); and

f1 + f2 + f3 + f4 = 19 + 18 = 37 (> M0 = 25.5).

(b) The fourth frequency makes the SUM greater than M0; therefore the fourth class is the median class.

(c) The median class is 64 – 73, with the following records:

fm = 18, LB = 63.5, C = 10, FB = 19.

Step 3 : The calculation using Formula 4.3:

The median is

$$\tilde{x} = 63.5 + 10 \left( \frac{25.5 - 19}{18} \right) = 67.11 \approx 67 \text{ books.}$$
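The steps for the grouped median can be sketched as below. The function name `grouped_median` and its argument layout are assumptions made for illustration; the lower class boundaries (33.5, 43.5, ...) are taken as 0.5 below the lower class limits, consistent with LB = 63.5 for the class 64 – 73.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data using Formula 4.3 with position (n + 1)/2."""
    n = sum(frequencies)
    position = (n + 1) / 2                     # Step 1: M0
    cumulative = 0                             # F_B: SUM before the median class
    for lb, f in zip(lower_boundaries, frequencies):
        if cumulative + f > position:          # Step 2: the SUM now exceeds M0
            return lb + width * (position - cumulative) / f   # Step 3
        cumulative += f
    raise ValueError("position lies beyond the last class")

# Weekly book sales: classes 34-43, 44-53, ..., 94-103
lower_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
frequencies = [2, 5, 12, 18, 10, 2, 1]

median = grouped_median(lower_boundaries, frequencies, width=10)
print(round(median, 2))   # 67.11
```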

4.4 MODE

The mode of a set of data is the observation (or the number) which has the largest frequency. A set of data having only one mode is called unimodal. A set of data may have two modes, in which case it is called bimodal. In the case of more than two modes, the set is called multimodal.

4.4.1 Mode of Ungrouped Data

For a set of data with a moderate number of observations, the mode can be obtained directly from its definition. The data should first be arranged in ascending (or descending) order. Then the mode is the observation (or observations) which occurs most frequently.

Example 4.6

Obtain the mode of the following data set:

(i) 2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4 ,6 (This data is taken from Example 2.1.5)

(ii) 2, 3, 4, 7

(iii) 2, 3, 4, 4, 4,4,4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12

Solution

(i) 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9

Since the number 5 occurs six times (the highest frequency), the mode is 5.

(ii) 2, 3, 4, 7

Every number occurs only once, so there is no mode for this data set.

(iii) 2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12

The numbers 4 and 9 each occur five times, so this set is bimodal and the modes are 4 and 9.
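Finding the mode of ungrouped data is a matter of counting. The sketch below uses a hypothetical helper `modes` that returns every value sharing the highest frequency. Following part (ii) of Example 4.6, it assumes that a data set in which no value repeats has no mode and returns an empty list.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if nothing repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []      # assumption taken from Example 4.6(ii): no repeated value, no mode
    return sorted(v for v, c in counts.items() if c == highest)

set_i = [2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9,
         5, 7, 3, 5, 8, 4, 6]
set_ii = [2, 3, 4, 7]
set_iii = [2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12]

print(modes(set_i))     # [5]
print(modes(set_ii))    # []
print(modes(set_iii))   # [4, 9]
```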

4.4.2 Mode of Grouped Data

When the number of data is large, it is common to group them into several classes. The class mode is then the class that has the highest frequency, i.e. the class in which the mode of the distribution is located. The mode can be obtained by using the following formula:

The mode,

$$\hat{x} = L_B + C \left( \frac{\Delta_B}{\Delta_B + \Delta_A} \right),$$

Formula 4.4

where

LB is the lower boundary of the class mode;

ΔB is the difference between the frequency of the class mode and the frequency of the class immediately before the class mode;

ΔA is the difference between the frequency of the class mode and the frequency of the class immediately after the class mode;

C is the class width of the class mode.

The following example demonstrates how to use the above Formula 4.4

Example 4.7

Find the mode of frequency distribution of books on weekly sales given in Table 2.6 of Topic 2.

Solution

By referring to frequency Table 2.6, we get the following figures:

The class mode is 64 - 73; its lower boundary is LB = 63.5; the class width is C = 10; and the differences from the frequencies of the classes immediately before and after the class mode are

ΔB = 18 - 12 = 6; ΔA = 18 - 10 = 8.

Then from Formula 4.4, we have the mode as

$$\hat{x} = L_B + C \left( \frac{\Delta_B}{\Delta_B + \Delta_A} \right) = 63.5 + 10 \left( \frac{6}{6 + 8} \right) = 63.5 + \frac{60}{14} = 67.79 \approx 68 \text{ books.}$$
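Formula 4.4 can be sketched as a small function. The name `grouped_mode` is hypothetical. Two choices in it are assumptions not stated in the module: when the class mode is the first or last class, the missing neighbouring frequency is treated as 0, and when several classes tie for the highest frequency, the first one is used.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data using Formula 4.4."""
    m = frequencies.index(max(frequencies))          # first class with the highest frequency
    before = frequencies[m - 1] if m > 0 else 0      # assumed 0 when there is no class before
    after = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    delta_b = frequencies[m] - before                # difference from the class before
    delta_a = frequencies[m] - after                 # difference from the class after
    # Note: if both neighbours have the same frequency as the class mode,
    # delta_b + delta_a is 0 and the formula cannot be applied.
    return lower_boundaries[m] + width * delta_b / (delta_b + delta_a)

lower_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
frequencies = [2, 5, 12, 18, 10, 2, 1]

mode = grouped_mode(lower_boundaries, frequencies, width=10)
print(round(mode, 2))   # 67.79
```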

4.4.3 The Relationship between Mean, Mode and Median

Sometimes, for a unimodal distribution, we may have two types of relationships: a location relationship and an empirical relationship.

(a) Location Relationships

There are three different cases that can occur, as follows:

Symmetrical Distribution

The graph of this type of distribution is shown in Figure 4.3, Case (a). In this case, the three measurements have the same location on the horizontal axis. Thus, we have the relationship

Mean = Mode = Median; i.e. x̄ = x̂ = x̃;

Left Skewed Distribution

The graph of this type of distribution is shown in Figure 4.3, Case (b). In this case, the three measurements have different locations on the horizontal axis, with the following relationship:

Mean < Median < Mode; i.e. x̄ < x̃ < x̂;

Right Skewed Distribution

The graph of this type of distribution is shown in Figure 4.3, Case (c). In this case, the three measurements have different locations on the horizontal axis, with the following relationship:

Mean > Median > Mode; i.e. x̄ > x̃ > x̂;

(b) Empirical Relationship

For a unimodal distribution which is moderately skewed (fairly close to symmetry), we have the following empirical relationship between the mean, mode and median:

(Mean – Mode) ≈ 3(Mean – Median), or

(x̄ – x̂) ≈ 3(x̄ – x̃).

Formula 4.5

where x̄ = mean; x̂ = mode; and x̃ = median.

This means that if Formula 4.5 is approximately fulfilled, then we say that the given distribution is moderately skewed. Look at the following cases showing the positions of the mean, mode and median.

Case (a): The mean, mode and median are located at almost the same position

This case happens when the three quantities have approximately equal values, i.e. x̄ ≈ x̂ ≈ x̃.

Case (b): The Mean is smaller than the median, and in turn the median is smaller than the mode

Case (c): The Mean is larger than the median, and in turn the median is larger than the mode

Figure 4.3: The positions of the Mean (x̄), the Mode (x̂) and the Median (x̃)

Briefly discuss with your friends in MyLMS the advantages and disadvantages of using the mean, mode and median.

4.5 DECILES AND PERCENTILES

Quartiles, deciles and percentiles are three types of quantiles used to summarise a frequency distribution. Each of them divides the whole distribution into a certain number of equal proportions, which are normally expressed as percentages. For example, deciles, from the root word 'decimal', divide the whole distribution into 10 equal parts.

Figure 4.4 below shows how the nine deciles divide the whole frequency distribution into ten equal parts each of 10% portion.

Figure 4.4: Deciles divide the whole frequency distribution into ten equal parts

(a) Deciles

Deciles are from the root word 'decimal', which means tenths. This indicates that deciles consist of 9 ordered numbers D1, D2, …, D8 and D9 which divide the whole frequency distribution into 10 (or 9 + 1) equal parts. Again, each part is expressed as a percentage. Thus about 10% of the observations have values less than or equal to D1, about 20% of the observations have values less than or equal to D2, and so forth. The last 10% of the observations have values greater than D9. We call D1, D2, …, D8 and D9 the first, second, …, eighth and ninth deciles. Notice that D5 is actually equal to Q2. See Figure 4.4 above.

(b) Percentiles

Percentiles are from the root word 'percent', which means hundredths. This indicates that percentiles consist of 99 ordered numbers P1, P2, …, P98 and P99 which divide the whole frequency distribution into 100 (or 99 + 1) equal parts. Again, the parts are expressed as percentages of 1% each. Thus about 1% of the observations have values less than or equal to P1, about 20% of the observations have values less than or equal to P20, and so forth; the last 1% of the observations have values greater than P99. We call P1, P2, …, P98 and P99 the first, second, …, ninety-eighth and ninety-ninth percentiles. Notice that P10 is equal to D1, P25 is equal to Q1, and so on.

To test whether you understand the concepts, let us think about the following problems.

Why do the following occur?

(a) Q2, D5 and P50 are the same number.

(b) D1 = P10; D2 = P20; D9 = P90; and Q3 = P75.

For illustration purposes, we focus on the calculation of quartiles, as deciles and percentiles can be calculated in a similar way. We will not discuss deciles and percentiles further; you can refer to any textbook for further detail. Students are advised to refer to Statistik Perihalan dan Kebarangkalian written by Mohd. Kidin Shahran, DBP, 2002 (reprint).

4.5.1 Quartiles of Ungrouped Data

When the data size is only moderately large, it is not necessary to group the data into classes. The quartiles may be obtained with the following steps:

Step 1 : Identify the required quartile and find its position/location.

Let Qr be the required quartile; then its position is given by

$$\frac{r}{4}(n + 1),$$

Formula 4.6

where r = 1 for the first quartile, r = 2 for the second quartile, and r = 3 for the third quartile.

Step 2 : Arrange the data in ascending order.

Step 3 : Obtain the quartile.

Example 4.8

Obtain the quartiles of the following set of data.

12, 13, 12, 14, 14, 24, 24, 25, 16, 17, 18, 19, 10, 13, 11, 20, 20, 22

Solution

Step 1 : The position of the quartiles

The data size is n = 18.

First quartile, r = 1.

$$\text{Position} = \frac{r}{4}(n + 1) = \frac{1}{4}(18 + 1) = 4.75 = 4 + 0.75,$$

Q1 is at the position between fourth and fifth, and it is 0.75 above the fourth position.

Second quartile, r = 2.

$$\text{Position} = \frac{r}{4}(n + 1) = \frac{2}{4}(18 + 1) = 9.50 = 9 + 0.5,$$

Q2 is at the position between ninth and tenth, and it is 0.5 above the ninth position.

Third quartile, r = 3.

$$\text{Position} = \frac{r}{4}(n + 1) = \frac{3}{4}(18 + 1) = 14.25 = 14 + 0.25,$$

Q3 is at the position between fourteenth and fifteenth, and it is 0.25 above the fourteenth position.

Step 2 : Arrange the data in ascending order

10, 11, 12, 12, 13, 13, 14, 14, 16, 17, 18, 19, 20, 20, 22, 24, 24, 25

Step 3 : Q1 is at the position between fourth and fifth, and it is 0.75 above fourth.

Number at the fourth position = 12; Number at the fifth position = 13;

Q1 = 12 + (0.75) (13 – 12) = 12.75.

Q2 is at the position between ninth and tenth, and it is 0.5 above ninth. Number at the ninth position = 16; Number at the tenth position = 17;

Q2 = 16 + (0.5) (17 – 16) = 16.5.

Q3 is at the position between fourteenth and fifteenth, and it is 0.25 above fourteenth.

Number at the fourteenth position = 20; Number at the fifteenth position = 22;

Q3 = 20 + (0.25) (22 – 20) = 20.5.
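The position rule of Formula 4.6 and the interpolation used in Step 3 can be sketched as follows. The function name `quartile` is hypothetical, and the data are the sorted values listed in Step 2. The guard for positions outside the data is an added assumption for very small data sets, which the module does not discuss.

```python
def quartile(sorted_values, r):
    """r-th quartile (r = 1, 2, 3) at position r(n + 1)/4, with linear interpolation."""
    n = len(sorted_values)
    position = r * (n + 1) / 4             # Formula 4.6, counted from 1
    whole = int(position)                  # e.g. 4 for position 4.75
    fraction = position - whole            # e.g. 0.75
    if whole < 1 or whole > n or (fraction > 0 and whole == n):
        raise ValueError("data set too small for this quartile position")
    lower = sorted_values[whole - 1]       # observation at the whole-numbered position
    if fraction == 0:
        return lower
    upper = sorted_values[whole]           # observation at the next position
    return lower + fraction * (upper - lower)

data = [10, 11, 12, 12, 13, 13, 14, 14, 16, 17, 18, 19, 20, 20, 22, 24, 24, 25]

q1, q2, q3 = (quartile(data, r) for r in (1, 2, 3))
print(q1, q2, q3)   # 12.75 16.5 20.5
```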

4.5.2 Quartiles of Grouped Data

When the data size is large, it is common to group the data into several classes. The methods of grouping data have been explained in Topic 2. Suppose we have k class mid-points x1, x2, ..., xk with respective frequencies f1, f2, ..., fk. The quartiles are obtained by the following steps:

Step 1 : From Formula 4.6, the position of the first quartile is given by

$$\frac{r}{4}(n + 1), \quad \text{with } r = 1.$$

Let us call this number Q01.

Step 2 : Identify the class of the first quartile. The first quartile class is the class in which the first quartile is located. It is identified as follows:

(a) Accumulate the frequencies until the SUM exceeds Q01.

(b) The last frequency added, which makes the condition in (a) happen, is the frequency of the first quartile class.

(c) Then record the following:

(i) LB, the lower boundary of the first quartile class;

(ii) fQ, the class frequency of the first quartile class;

(iii) C, the class interval or class width of the first quartile class; and

(iv) FB, the SUM of the frequencies before the condition in (a) happens.

Step 3 : Calculate the first quartile using the following formula:

The first quartile, Q1, is

$$Q_1 = L_B + C \left( \frac{\frac{n+1}{4} - F_B}{f_Q} \right)$$

Formula 4.6(a)

For illustration, we will be using frequency table on weekly book sales given in Table 2.6, in Topic 2.

Step1 : The position of Q1 is at (n + 1)/4 = (50 + 1)/4 = 12.75 = Q01.

Step 2 : Getting SUM,

(a) SUM = f1 + f2 = 7 (< Q01 = 12.75); and

f1 + f2 + f3 = 19 (> Q01 = 12.75).

(b) The third frequency makes the SUM greater than Q01; therefore the third class will be the class of the first quartile.

(c) The Q1 class is 54 – 63, with the following records:

fQ = 12, LB = 53.5, C = 10, FB = 7,

Step 3 : The calculation using Formula 4.6(a):

The Q1 is,

$$Q_1 = 53.5 + 10 \left( \frac{12.75 - 7}{12} \right) = 58.29 \approx 58 \text{ books.}$$

Repeat the steps for calculating Q3.

The position of Q3 is at 3(n + 1)/4 = 3(50 + 1)/4 = 38.25 = Q03.

Class Q3 is 74 – 83,

fQ = 10, LB = 73.5, C = 10, FB = 37,

$$Q_3 = 73.5 + 10 \left( \frac{38.25 - 37}{10} \right) = 74.75 \approx 75 \text{ books.}$$

Thus, we can conclude that about 25% of the weekly sales is less than or equal to 58 books; and that about 75% of the weekly sales is less than or equal to 75 books.
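The quartile calculation for grouped data follows the same pattern for any r, so it can be sketched with one function. The name `grouped_quartile` is hypothetical. The last line also computes the inter-quartile range mentioned in the learning outcomes, taken here as Q3 – Q1, which is the usual definition.

```python
def grouped_quartile(lower_boundaries, frequencies, width, r):
    """r-th quartile of grouped data, following Formula 4.6(a) with position r(n + 1)/4."""
    n = sum(frequencies)
    position = r * (n + 1) / 4                 # Q01 for r = 1, Q03 for r = 3
    cumulative = 0                             # F_B: SUM before the quartile class
    for lb, f in zip(lower_boundaries, frequencies):
        if cumulative + f > position:          # the SUM now exceeds the position
            return lb + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("position lies beyond the last class")

# Weekly book sales: classes 34-43, 44-53, ..., 94-103
lower_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
frequencies = [2, 5, 12, 18, 10, 2, 1]

q1 = grouped_quartile(lower_boundaries, frequencies, 10, r=1)
q3 = grouped_quartile(lower_boundaries, frequencies, 10, r=3)
iqr = q3 - q1                                  # inter-quartile range, assumed as Q3 - Q1

print(round(q1, 2), round(q3, 2), round(iqr, 2))   # 58.29 74.75 16.46
```

With r = 2 the same function returns the grouped median of Formula 4.3, since 2(n + 1)/4 = (n + 1)/2.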

In this topic, we have learnt about the mean, mode and median, as well as the quartiles, deciles and percentiles. The mean, which is affected by extreme values, plays the role of the centre of the distribution. Thus, given the value of the mean, we can say that most observations are located around it. The mode describes the most frequent observation in the data.

We can further interpret that, for any two different distributions, their respective means indicate that they are at two different locations. As such, the mean is sometimes called a location parameter. The median is used if we want to summarise the distribution in two equal parts of 50% each. If we want to break the distribution down further into proportions of 25% each, then we should use quartiles. We can also use percentiles to describe the distribution using proportions (in percentages). However, to describe any distribution completely, we also need to describe its shape and the data coverage (the range). The variance or standard deviation, which can describe the shape, will be discussed in the next topic.

ACTIVITY 4.1

1. Calculate the mean of each of the following data set:

(a) Student’s Mathematics marks for five different examinations are: 85, 90, 70, 65, 75.

(b) Diameter (mm) of ten beakers in science laboratory:

38.5, 40.6, 39.2, 39.5, 40.4, 39.6, 40.3, 39.1, 40.1, 39.8.

(c) Monthly income (in RM) of six factory employees is: 650, 1500, 1600, 1800, 1900, 2200. Give brief comments on your answer.

2. There are five groups of students whose sizes are respectively 14, 15, 16, 18, and 20. Their respective average heights (in meter) are: 1.6, 1.45, 1.50, 1.42, and 1.65. Obtain the average heights of all students.

3. For the following frequency table, obtain the mean, mode, median, first, and third quartiles.

Weights (kg)        90-94   95-99   100-104   105-109   110-114   115-119   120-124   125-129
Number of Parcels   2       5       12        17        14        6         3         1