Topic 2: Tabular Presentation


Upload: husna269

Post on 26-Oct-2014


INTRODUCTION

You were introduced to various types of data in Topic 1. In this topic, we will learn how to present data in tabular form so that we can study the properties of a data distribution further. The three tabular forms covered are the Frequency Distribution Table, the Relative Frequency Distribution and the Cumulative Frequency Distribution. Tabular presentation is suitable for all types of data. A table is much easier to understand than a list of raw values and, for a qualitative variable, it allows a quick comparison between categorical values. Another advantage is that the information lost when the table is formed can be kept small.


LEARNING OUTCOMES

By the end of this topic, you should be able to:

1. Develop a frequency distribution table;

2. Formulate a relative frequency distribution table; and

3. Prepare a cumulative frequency distribution table.


2.1 FREQUENCY DISTRIBUTION TABLE

Table 2.1 below is an example of a Frequency Distribution Table of a qualitative variable (ethnicity). The first row shows the categorical values of the variable and the second row is the frequency of each categorical value. The second row tells us how a total of 550 students are distributed among the respective categorical values. We can see that 245 students are Malay, 182 students are Chinese, and so on.

Table 2.1: Frequency Distribution of Students by Ethnicity in School J

Ethnic Background (x)   Malay   Chinese   Indian   Others   Total
Frequency (f)             245       182       84       39     550

Quantitative data involving a large number of observations may be divided into several non-overlapping classes or intervals. The frequency of each class is obtained by counting the data that fall in that class. Table 2.2 shows the Frequency Distribution of the monthly family income of students at School J. The first row shows the income classes, and the second row is the frequency, that is, the number of students whose monthly family income falls in each class. The second row again tells us how the 550 students are distributed among the classes. There are 98 of the 550 students whose families have a monthly income between RM0 and RM1,000, 152 whose family income lies in the interval RM1,001 to RM2,000, and so on.

Table 2.2: Frequency Distribution of Family Income of Students at School J

Monthly Income (RM x)   0 - 1000   1001 - 2000   2001 - 3000   3001 - 4000   4001 - 5000   Total
Frequency (f)                 98           152           100           180            20     550

(a) Developing Frequency Distribution Table of Quantitative Data

Let us again examine Table 2.2. Each class consists of a lower limit and an upper limit separated by a hyphen '-'. For example, the second class has a lower limit of RM1,001 and an upper limit of RM2,000, whereas the fifth class has a lower limit of RM4,001 and an upper limit of RM5,000. By looking at the upper limit of a class and the lower limit of the following class, it is clear that no two adjacent classes overlap.

This property is very important in developing a frequency table, because it avoids double counting of any data when the frequency of each class is obtained. Another property is that any two adjacent classes are separated by a middle point called the class boundary. Thus, each class also has a lower and an upper boundary. Let us now develop the Frequency Distribution Table of the books sold weekly by a book store, given in Table 2.3 below.

Table 2.3: Number of Books Sold Weekly for 50 Weeks by a Book Store

35  75  65  62  68  55  66  60  62  80
65  70  66  60  72  95  85  66  70  68
65  62  78  80  47  70  68  90  40  72
70  50  70  72  55  55  60  56  48  75
74  62  45  52  55  68  82  80  75  75

(i) The Number of Classes

The following are some guides to determine the number of classes:

• The total number of classes in a distribution table should be neither too small nor too large; otherwise the table will distort the original shape of the data distribution. Usually one can choose any number between 5 and 15 classes.

• Depending on the size of the data, the distribution sometimes becomes too flat if one chooses more than 15 classes, or too peaked if one chooses fewer than 5 classes.

• However, the following empirical formula (2.1) can be used to determine the approximate number of classes (K) for a given number of observations n:

$$K \approx 1 + 3.3\log_{10}(n) \tag{2.1}$$

ACTIVITY 2.1

Refer to Table 2.3. Are these data discrete or not? Justify your answer.


For the books on weekly sales, we have

$$K \approx 1 + 3.3\log_{10}(50) \approx 6.6$$

• As this is an approximation, one can choose any integer close to the above value. In this example we choose the integer 6 as the approximate number of classes.
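The calculation with formula (2.1) can be sketched in a few lines of Python. This is only an illustration: the function name `approx_number_of_classes` is hypothetical, and the logarithm is taken to base 10, which is the reading that reproduces the value 6.6 quoted in the text.

```python
import math


def approx_number_of_classes(n: int) -> float:
    """Approximate number of classes K for n observations, formula (2.1).

    Assumes a base-10 logarithm (this reproduces the 6.6 in the text).
    """
    return 1 + 3.3 * math.log10(n)


k_exact = approx_number_of_classes(50)  # 50 weeks of book sales (Table 2.3)
k_chosen = math.floor(k_exact)          # the text picks the nearby integer 6

print(f"K is approximately {k_exact:.1f}; chosen K = {k_chosen}")
```

Rounding down is just one way of picking "a close integer"; rounding up to 7 would be an equally defensible choice under the same guideline.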

(ii) Class Width and Class Limits

• Class width can differ from one class to another. Usually, however, the same class width for all classes is recommended when developing a frequency distribution table.

• The following empirical formula (2.2) can be used to determine the approximate class width:

$$\text{Class Width} = \frac{\text{Data Range}}{\text{Number of classes }(K)} \tag{2.2}$$

(iii) Data Range

• Range is the difference between the largest and the smallest observation values.

• For the books on weekly sales shown in Table 2.3, the class width will be:

$$\text{Class Width} = \frac{\text{largest number} - \text{smallest number}}{K} = \frac{95 - 35}{6} = 10 \text{ books}$$

• Since the data are discrete, it is wise to choose a round figure fairly close to the approximate value (if necessary).

• For the above data, we choose 10 books as the class width or class interval.
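As a quick check of formula (2.2), the range and class width for Table 2.3 can be computed as in the sketch below. The variable names are illustrative, and K = 6 is the number of classes chosen in the text.

```python
# Weekly book sales for 50 weeks (Table 2.3).
data = [
    35, 75, 65, 62, 68, 55, 66, 60, 62, 80,
    65, 70, 66, 60, 72, 95, 85, 66, 70, 68,
    65, 62, 78, 80, 47, 70, 68, 90, 40, 72,
    70, 50, 70, 72, 55, 55, 60, 56, 48, 75,
    74, 62, 45, 52, 55, 68, 82, 80, 75, 75,
]

k = 6                                # number of classes chosen from formula (2.1)
data_range = max(data) - min(data)   # largest value minus smallest value
width = data_range / k               # formula (2.2)

print(f"range = {data_range}, class width = {width:g} books")
```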


(iv) Limits of Each Class

The simple rules below should be noted when one seeks the class limits for each class interval:

• Identify the smallest as well as the largest data value.

• All data must be enclosed between the lower limit of the first class and the upper limit of the final class.

• The smallest data value should be within the first class. Thus the lower limit of the first class can be any number less than or equal to the smallest data value.

• When all classes have the same width, the lower limit of a class is equal to the lower limit of the previous class plus the class width. We can proceed in this way to build up all the classes until every data value is covered.

• A tallying process is normally used to count the data that fall in each class; this count becomes the frequency of the class.

For the data on weekly book sales, let 34 be the lower limit of the first class. The lower limit of the second class is then 44 (i.e. 34 + 10: the lower limit of the first class is incremented by the class width to obtain the lower limit of the second class), the lower limit of the third class is 54, and so on until we get the lower limit of the final class as 94 (i.e. 84 + 10).

On the other hand, the upper limit of the first class is 43 (just 1 unit less than the lower limit of the second class). We can build the upper limits of all classes in the same manner. Eventually, we will have the classes 34-43, 44-53, 54-63, 64-73, 74-83, 84-93 and 94-103. We notice that the actual number of classes developed is 7, which is greater than the integer 6 chosen from the calculated value of K. The extra class is needed because the first class starts at 34, below the smallest value 35, so six classes of width 10 end at 93 and do not cover the largest value 95.
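The rule for building class limits can be sketched as a short loop. The helper name `build_class_limits` is hypothetical, and the sketch assumes integer data, so that each upper limit is exactly 1 unit below the next lower limit.

```python
def build_class_limits(first_lower, width, largest):
    """Return (lower limit, upper limit) pairs that cover values up to `largest`.

    Assumes integer data and equal class widths: each lower limit is the
    previous lower limit plus the width, and each upper limit is 1 unit
    below the next lower limit.
    """
    classes = []
    lower = first_lower
    while lower <= largest:
        classes.append((lower, lower + width - 1))
        lower += width
    return classes


# First lower limit 34, class width 10, largest observation 95 (Table 2.3).
classes = build_class_limits(34, 10, 95)

for lower, upper in classes:
    print(f"{lower} - {upper}")
print("number of classes:", len(classes))
```

Starting the loop at 35 instead of 34 would give six classes ending at 94, which would miss the value 95; this is why the number of classes actually built can differ from the value suggested by formula (2.1).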

(v) Frequency of Each Class

The following process is recommended to determine the frequency of each class:

• The tally counting method is the easiest way to determine the frequency of each class from the given set of data.

• Begin with the first number in the data set, find the class into which the number falls, then mark one vertical bar or stroke for that particular class. If the second number falls into the same class, we mark a second stroke for that class, and so on.

• Once we have four strokes for a class, the fifth stroke is drawn as a back-stroke across the first four to tie them into one 'bundle'. So one 'bundle' comprises 5 strokes altogether.

• The process of finding the class for each data value is continued until all data are covered.

• As one stroke represents one data value, a bundle represents 5 data values falling into the class.

• Counting by bundles makes the counting process much easier. There may be several 'bundles' and/or single strokes for a class.

• The total number of strokes is the frequency for that class.

• The total frequency for all classes is then equal to the total number of data in the sample.

• The counting process for the books on weekly sales is given in Table 2.4 below:

Table 2.4: Frequency Distribution of Books on Weekly Sales

Class       Counting Tally             Frequency (f)
34 - 43     ||                               2
44 - 53     ||||/                            5
54 - 63     ||||/ ||||/ ||                  12
64 - 73     ||||/ ||||/ ||||/ |||           18
74 - 83     ||||/ ||||/                     10
84 - 93     ||                               2
94 - 103    |                                1
Sum                                    Σf = 50
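The tally process can be mimicked in code by visiting each observation once and adding one "stroke" to the class that contains it. The following is a minimal sketch using the data of Table 2.3 and the seven classes built above; the variable names are illustrative.

```python
# Weekly book sales for 50 weeks (Table 2.3).
data = [
    35, 75, 65, 62, 68, 55, 66, 60, 62, 80,
    65, 70, 66, 60, 72, 95, 85, 66, 70, 68,
    65, 62, 78, 80, 47, 70, 68, 90, 40, 72,
    70, 50, 70, 72, 55, 55, 60, 56, 48, 75,
    74, 62, 45, 52, 55, 68, 82, 80, 75, 75,
]

# (lower limit, upper limit) of each class.
classes = [(34, 43), (44, 53), (54, 63), (64, 73), (74, 83), (84, 93), (94, 103)]

frequencies = [0] * len(classes)
for value in data:
    for i, (lower, upper) in enumerate(classes):
        if lower <= value <= upper:
            frequencies[i] += 1   # one stroke for this class
            break                 # classes do not overlap, so stop searching

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower:>3} - {upper:<3}  {f:>2}")
print("Sum of f =", sum(frequencies))
```

Because the classes do not overlap, each observation is counted exactly once, so the frequencies add up to the 50 observations.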

(vi) Class Boundaries and Class Mid-points

• Any two adjacent classes are separated by a middle point called the class boundary. It is the mid-point between the lower limit of a class and the upper limit of the previous class.

• This separation ensures that no two adjacent classes overlap.

• Thus, each class has a lower boundary and an upper boundary.

• The lower boundary of a given class is also the upper boundary of the previous class, as demonstrated by Figure 2.1.

• Class boundaries can be obtained as follows:

$$\text{Lower boundary of a class} = \frac{\text{upper limit of the previous class} + \text{lower limit of that class}}{2}$$

$$\text{Upper boundary of a class} = \frac{\text{upper limit of that class} + \text{lower limit of the next class}}{2}$$

• The class mid-point is located at the middle of each class and is obtained by:

$$\text{Class mid-point} = \frac{\text{lower boundary of the class} + \text{upper boundary of that class}}{2}$$

• The class mid-point is a very important number, as it represents all data that fall in that particular class irrespective of their actual raw values.

• By virtue of this role, for the data on weekly book sales we are actually reducing the data to K class mid-points. These K class mid-points will then be used in further calculations of descriptive statistics such as the mean, mode and median of the data distribution.

Figure 2.1: The property of any class


Table 2.5 shows the properties of the classes of the frequency table.

Table 2.5: The Lower Class-boundary, Class Mid-point and Upper Class-boundary of the Frequency Table of Books

Class       Lower Boundary   Class Mid-point (x)   Upper Boundary   Frequency (f)
34 - 43          33.5               38.5                43.5              2
44 - 53          43.5               48.5                53.5              5
54 - 63          53.5               58.5                63.5             12
64 - 73          63.5               68.5                73.5             18
74 - 83          73.5               78.5                83.5             10
84 - 93          83.5               88.5                93.5              2
94 - 103         93.5               98.5               103.5              1
                                                                     Σf = 50
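The boundaries and mid-points in Table 2.5 can be reproduced with the short sketch below. The function name `class_properties` is hypothetical. Since the first class has no previous class and the last has no next class, the sketch assumes the same 1-unit gap between neighbouring class limits at both ends.

```python
def class_properties(classes):
    """Return (lower boundary, mid-point, upper boundary) for each class.

    Assumes equal gaps between adjacent class limits, so the boundary sits
    half a gap below each lower limit and half a gap above each upper limit.
    """
    gap = classes[1][0] - classes[0][1]   # e.g. 44 - 43 = 1 for integer data
    half = gap / 2
    rows = []
    for lower_limit, upper_limit in classes:
        lower_boundary = lower_limit - half
        upper_boundary = upper_limit + half
        mid_point = (lower_boundary + upper_boundary) / 2
        rows.append((lower_boundary, mid_point, upper_boundary))
    return rows


classes = [(34, 43), (44, 53), (54, 63), (64, 73), (74, 83), (84, 93), (94, 103)]
properties = class_properties(classes)

for (lo, hi), (lb, mid, ub) in zip(classes, properties):
    print(f"{lo:>3} - {hi:<3}  {lb:>6}  {mid:>6}  {ub:>6}")
```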

(b) The Actual Frequency Table

The actual frequency table is the one without the tally counting column, as follows:

Table 2.6: Frequency Distribution Table on Weekly Book Sales

Class           34 - 43   44 - 53   54 - 63   64 - 73   74 - 83   84 - 93   94 - 103
Frequency (f)         2         5        12        18        10         2          1

ACTIVITY 2.2

A data set comprising non-repeating individual numbers or observations can be grouped into several classes before a frequency table is developed. Do you agree with this idea? Give your opinion.

You should attempt the following exercise to test your understanding of the concepts discussed.

EXERCISE 2.1

1. The following are the marks in the Statistics subject obtained by 40 students in a final examination. Develop a frequency table, using 4 as the lower limit of the first class.

60  20  10  25   5  35  30  65  15  40
45   5  30  55  60  45  50   8  10  40
20  30  34   4  25  56  48   9  16  44
70  24   7   9  36  30  30  40  65  50

(a) State the lower and upper limits of the second class and its frequency.

(b) Obtain the lower and upper boundaries and the class mid-point of the fifth class.

2.2 RELATIVE FREQUENCY DISTRIBUTION

The relative frequency of a class is the ratio of its frequency to the total frequency. Each relative frequency has a value between 0 and 1, and the total of all relative frequencies is equal to 1. Relative frequency can also be expressed as a percentage by multiplying each relative frequency by 100%, in which case the total is 100%. By referring to Table 2.6, the Relative Frequency Distribution for the books on weekly sales can be developed. This is given in Table 2.7 below. From Table 2.7, one can easily tell the proportion or percentage of all data that fall in a particular class. For example, about 0.04 or 4% of the data are between 34 and 43 books on weekly sales. By doing some additions, we can also tell that about 0.80 or 80% (i.e. 24% + 36% + 20%) of the data are between 54 and 83 books, and that only 6% are above 83 books on weekly sales.


Table 2.7: Relative Frequency Distribution for the Books on Weekly Sales

Class                    34 - 43   44 - 53   54 - 63   64 - 73   74 - 83   84 - 93   94 - 103    Sum
Frequency (f)                  2         5        12        18        10         2          1     50
Relative Frequency          0.04      0.10      0.24      0.36      0.20      0.04       0.02   1.00
Relative Frequency (%)         4        10        24        36        20         4          2    100
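The relative frequencies can be obtained by dividing each class frequency by the total, as in this brief sketch based on the frequencies of Table 2.6. The variable names are illustrative.

```python
classes = ["34 - 43", "44 - 53", "54 - 63", "64 - 73", "74 - 83", "84 - 93", "94 - 103"]
frequencies = [2, 5, 12, 18, 10, 2, 1]

total = sum(frequencies)                       # total frequency, 50 weeks
relative = [f / total for f in frequencies]    # proportions between 0 and 1
percent = [100 * f / total for f in frequencies]

for label, f, r, p in zip(classes, frequencies, relative, percent):
    print(f"{label:<9} {f:>3} {r:>6.2f} {p:>5.0f}%")

# Share of weeks with sales between 54 and 83 books (third to fifth class).
share_54_to_83 = sum(relative[2:5])
print(f"54 to 83 books: {share_54_to_83:.2f}")
```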

2.3 CUMULATIVE FREQUENCY DISTRIBUTION

The total frequency of all values less than the upper class boundary of a given class is called the cumulative frequency up to and including the upper limit of that class. For example, the cumulative frequency up to and including the class 54-63 in Table 2.7 is 2 + 5 + 12 = 19, signifying that in 19 of the 50 weeks the number of books sold was less than 63.5, that is, at most 63 books. A table presenting such cumulative frequencies is called a cumulative frequency distribution table, a cumulative frequency table or, briefly, a cumulative distribution. There are two types of cumulative distributions:

(a) Cumulative distribution "Less-than or Equal", using upper boundaries as partition;

(b) Cumulative distribution "More-than", using lower boundaries as partition.

In this course we will concentrate only on the first type. Table 2.8 presents the cumulative distribution of the type "Less-than or Equal" for the books on weekly sales. For this type, we need to add a class with 'zero frequency' before the first class of Table 2.6, and use 33.5 books as its upper boundary.


Table 2.8: Developing Cumulative Distribution Type "Less-than or Equal" for the Books on Weekly Sales

Class       Frequency (f)   Upper Boundary   Cumulating Process   Cumulative Frequency
24 - 33           0            ≤ 33.5              0                       0
34 - 43           2            ≤ 43.5            0 + 2                     2
44 - 53           5            ≤ 53.5            2 + 5                     7
54 - 63          12            ≤ 63.5            7 + 12                   19
64 - 73          18            ≤ 73.5           19 + 18                   37
74 - 83          10            ≤ 83.5           37 + 10                   47
84 - 93           2            ≤ 93.5           47 + 2                    49
94 - 103          1            ≤ 103.5          49 + 1                    50
Sum           Σf = 50
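The cumulating process is a running sum of the class frequencies, which the Python standard library provides as `itertools.accumulate`. The sketch below, with illustrative variable names, includes the added zero-frequency class and also computes each cumulative frequency as a percentage of the total.

```python
from itertools import accumulate

# Upper boundaries, starting with the added zero-frequency class 24 - 33.
upper_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5, 103.5]
frequencies = [0, 2, 5, 12, 18, 10, 2, 1]

cumulative = list(accumulate(frequencies))   # running total of frequencies
total = cumulative[-1]                       # last value equals the total, 50
cumulative_pct = [100 * c / total for c in cumulative]

for boundary, c, pct in zip(upper_boundaries, cumulative, cumulative_pct):
    print(f"<= {boundary:>5}  {c:>3}  {pct:>4.0f}%")
```

The last cumulative frequency always equals the total frequency, which is a convenient check on the arithmetic.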

The actual cumulative distribution table is given in Table 2.9 below. The column for cumulative frequency in percentage (%) is optional.

Table 2.9: The "Less-than or Equal" Cumulative Distribution for the Books on Weekly Sales

Upper Boundary   Cumulative Frequency   Cumulative Frequency (%)
≤ 33.5                    0                        0
≤ 43.5                    2                        4
≤ 53.5                    7                       14
≤ 63.5                   19                       38
≤ 73.5                   37                       74
≤ 83.5                   47                       94
≤ 93.5                   49                       98
≤ 103.5                  50                      100

Do attempt the following exercises to test your understanding.


EXERCISE 2.2

1. The following questions are based on the given frequency table:

Marks                    10 - 19   20 - 29   30 - 39   40 - 49   50 - 59
Number of students (f)        10        25        35        20        10

(a) Give the number of students who obtained not more than 29 marks.

(b) Give the number of students who obtained 30 or more marks.

2. Refer to the frequency table given in Question 1.

(a) Obtain the class mid-points of all classes.

(b) Obtain the table of relative frequencies.

(c) Obtain the cumulative frequency "less than or equal".

3. There are 1,000 students staying on a university campus. All of them are respondents in a survey on the degree of comfort of the residential area. The following Likert scale is given to them to gauge their perception:

1                  2             3                    4               5
Very comfortable   Comfortable   Fairly comfortable   Uncomfortable   Very uncomfortable

The research findings show that 120 students chose category '1', 180 students chose category '2', 360 students chose category '3', 240 students chose category '4' and 100 students chose category '5'. Display the research findings in the form of a frequency distribution table, together with the relative frequency distribution in terms of proportions and percentages.

4. A teacher wants to know the effectiveness of a new teaching method for mathematics at a primary school. The method has been delivered to a class of 20 pupils. A test is given to the pupils at the end of the semester. The test marks are given below:

77  91  62  54  72  66  84  38  76  70
84  59  82  78  74  96  44  76  85  66

Develop a frequency distribution table. Let 35 marks be the lower limit of the first class.

• The frequency distribution table, relative frequency distribution and cumulative distribution are tabular presentations of the original raw data in a form that is easier to interpret.

• The tabular presentation is also very useful when a graphical presentation is needed later on.