International Academic Journal of Education & Literature
ISSN Print : 2708-5112 | ISSN Online : 2708-5120 Frequency : Bi-Monthly Language : English Origin : Kenya Website : https://www.iarconsortium.org/journal-info/IAJEL
Construction and Standardization of Mathematics Achievement Test for Senior Secondary 1 Students
Abstract: This study focused on the construction and standardization of a Mathematics achievement test for senior secondary 1 (SS1) students. A total of 80 objective test items were initially generated from the SS1 mathematics curriculum using a table of specification. The items were validated and passed through a process of item analysis after being administered to a sample of 300 SS1 students. Forty items emerged from the item analysis and were further administered to 1,000 SS1 students in Onitsha Education Zone of Anambra State within a time limit of 1 hour. A mean score of approximately 22 was calculated, and that became the norm for the 40-item objective test. The study equally revealed a statistically significant, mildly positive relationship between the difficulty index and the discrimination index.
Keywords: achievement tests, standardized achievement tests, table of
specification, item analysis.
INTRODUCTION
Testing is an integral part of the teaching-learning process. Tests are ways of ascertaining the existence of some desired attributes or characteristics in the testee. Osegbo (2011) defined a test as "a set of standard questions presented to an individual or group of individuals to answer or respond to" (p. 1). Tests are classified into different categories based on the purposes which they serve. For instance, there are performance tests, short-answer tests, essay tests, multiple-choice tests, true-false tests, matching tests, placement tests, diagnostic tests, progress/achievement tests, final progress/achievement tests, proficiency tests, aptitude tests, norm-referenced tests, criterion-referenced tests, summative tests, formative tests, group tests, individual tests, speed tests, power tests, verbal tests, non-verbal tests, culture-biased tests, culture-fair tests, standardized tests, non-standardized tests, etc. (Osegbo, 2011; Davis, 2013; Ramadan, 2014). Roediger, Putnam and Smith (2011) identified the benefits of testing to include:
1. Aiding retention as a result of retrieval.
2. Identifying knowledge gaps.
3. Learning more from the next study episode.
4. Helping students to organize their knowledge.
5. Improving transfer of knowledge to new contexts.
6. Aiding retrieval of untested material.
7. Improving metacognitive monitoring.
8. Providing feedback to instructors.
9. Improving students' study habits.
According to Sheeba (2017), tests provide diagnostic feedback, help in setting standards, evaluate students' progress and motivate performance. They also help teachers appraise how successfully they are presenting material, and they provide students with indicators of what topics or skills they have not yet mastered and should concentrate on, thus reinforcing learning (Davis, 2013).
Research Article
Article History
Received: 11.05.2021
Revision: 20.05.2021
Accepted: 28.05.2021
Published: 08.06.2021
Author Details: Ezeugo Nneka Chinyere (Ph.D.)1, Metu Ifeoma Clementina (Ph.D.)1 and Ikwelle Anthonia Chika2
Authors' Affiliations: 1Department of Educational Foundations, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria; 2Department of Early Childhood Care and Education, Nwafor Orizu College of Education, Nsugbe, Anambra State, Nigeria
Corresponding Author* Ezeugo Nneka Chinyere (Ph.D.)
How to Cite the Article: Ezeugo Nneka Chinyere, Metu Ifeoma
Clementina & Ikwelle Anthonia Chika.
(2021); Construction and Standardization of
Mathematics Achievement Test for Senior Secondary. Int Aca J Edu Lte. 2(3); 37-45. Copyright @ 2021: This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium for non-commercial use (NonCommercial, or CC-BY-NC), provided the original author and source are credited.
Ezeugo Nneka Chinyere et al., Int Aca J Edu Lte; Vol-2, Iss-3 (May-Jun, 2021): 37-45
The present paper focuses on achievement tests. Wu (2018) sees achievement tests as "tests designed to measure the knowledge, skills and abilities attained by a test taker in a field, in a subject area or in a content domain in which the test taker has received training or instructions" (p. 148). An achievement test measures present proficiency, mastery and understanding of general and specific areas of knowledge (Diksha, 2020). Kendra (2020) equally stated that achievement tests measure an individual's level of skill, accomplishment or knowledge in a specific area. In essence, achievement tests are tied to specific content domains, subject areas or fields. They ascertain a student's current level of knowledge and skill acquisition and reveal the student's present level of attainment after training or instruction. As already pointed out about tests, achievement tests can be of different types and can equally be based on different subject matters. However, in order to provide a valid basis for comparing an individual's or group's relative performance in different areas, there is a need to go beyond classroom teacher-made tests. There is the need to provide norms. Thus the present study is focused on the construction and standardization of a mathematics achievement test.
STANDARDIZED ACHIEVEMENT TESTS: MEANING, USES AND TYPES
Standardized tests are "evaluative devices developed to ascertain a sample of behaviour from an individual in a domain of interest in which the test administration and scoring process is uniform across individuals and both reliability and validity evidence exists such that inferences regarding the person's trait can be made from the test score" (Morrison & Embretson, 2018, p. 3680). Thorndike (2014) specified that the term standardized test implies the availability of normative data, although the term basically points to the adoption of uniform administration procedures (p. 339). Basically, standardized achievement tests are designed and prepared by, or with the assistance of, measurement experts for large numbers of students. They are accompanied by a manual, administered under uniform procedures, and scored and interpreted in a standard and consistent manner such that comparison of individuals or groups of students becomes realizable (Mehrens & Lehmann, 1991; Okoye, 1996; Ifeakor, 2011; Great Schools Partnership, 2014).
According to Thorndike (2014), standardized achievement tests are used for diagnostic and remedial decisions, placement decisions, guidance and counseling decisions, selection decisions, curricular decisions between alternative programs, and public policy decisions on how well a school is doing. The Great Schools Partnership (2014) equally believed that, to bring reform to schools and improve students' achievement, standardized tests serve some of these purposes:
- They make schools and educators accountable for educational results and students' performance.
- They determine whether students achieved instructional objectives.
- They discover gaps in students' learning and academic progress.
- They expose achievement gaps among different student groups.
- They determine whether educational policies are working as intended.
Mehrens and Lehmann (1991) classified standardized achievement tests into diagnostic tests, single-subject-matter tests and survey batteries. Ifeakor (2011) outlined four categories, which include standardized achievement survey test batteries, standardized achievement survey tests in specific subjects, diagnostic tests and prognostic tests. On the other hand, Thorndike (2014) mentioned some categories of standardized achievement tests: group standardized achievement tests, individually administered achievement tests, secondary school and college level achievement tests, diagnostic achievement tests and criterion-referenced standardized achievement tests. Suffice it to say that there are many more classifications, but the present paper concerns a single-subject-matter achievement test focused on mathematics.
Steps in Constructing Standardized Achievement Tests
Construction of standardized achievement tests follows certain developmental procedures. Most standardized achievement tests are developed by professional test publishing organizations; however, the laid-down procedures can be adopted by anyone constructing such a test.
The test developer should decide on the type of test needed. Other areas of decision include: the content and skills to be covered; the relative emphasis needed; the length of the test; the item format; the number of subjects needed, etc. Although the steps in construction will vary depending on the nature of the test, a typical sequence would include: planning the test, writing the items, pre-testing the items, preparing the final form, collecting reliability and validity evidence, and developing norms and criteria for interpretation (Denga, 1987). Okoye (1996) summarized the steps in test standardization as follows: generating items (which entails selecting relevant objectives, developing a table of specification and finally generating items), editing items, trial testing of items, item analysis, administering the test to a standardization sample, obtaining test norms, preparing the test manual, and printing the test and other relevant materials. Similarly, Sharm and Poonam (2017) outlined the following steps in the construction and standardization of an achievement test in English grammar: planning the test, preparation of the test, administration of the test, item analysis, standardization
of the test in terms of reliability and validity. The construction of the present mathematics achievement test proceeded through the steps discussed below.
Method of the Study
In standardizing this achievement test, the following steps were adopted.

Preliminary Steps:
a) The Purpose of the Test: This test serves the purpose of measuring students' achievement in the specified mathematics content areas.
b) The test developers analyzed the content areas.

Test Blueprint:
A table of specification at the six levels of the cognitive domain was developed to specify the particular content areas covered and the number of questions for each content area and objective.

Item Writing:
Based on the table of specification, 80 objective items with five options ranging from A to E were developed.
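The allocation in such a blueprint amounts to applying percentage weights to the total item count. A minimal sketch, assuming the weights reported in Table 1 of this study and a simple rounding rule (the authors' exact rounding/adjustment procedure is not stated, so some cells may differ slightly from the published table):

```python
# Sketch: allocating items in a table of specification by multiplying the
# total item count by content-area and cognitive-level weights.
# Weights are taken from Table 1 of this study; the round() rule is an
# assumption for illustration.
content_weights = {
    "Sets": 0.09,
    "Introduction to formal geometry": 0.17,
    "Statistics": 0.25,
    "Indices & logarithms": 0.16,
    "Fractions, decimals & percentages": 0.15,
    "Numbers & numeration": 0.08,
    "Quadratic equations": 0.10,
}
level_weights = {
    "Knowledge": 0.40, "Comprehension": 0.20, "Application": 0.20,
    "Analysis": 0.10, "Synthesis": 0.05, "Evaluation": 0.05,
}
TOTAL_ITEMS = 80

blueprint = {
    topic: {level: round(TOTAL_ITEMS * cw * lw)
            for level, lw in level_weights.items()}
    for topic, cw in content_weights.items()
}

# e.g. Statistics x Knowledge: 80 * 0.25 * 0.40 = 8 items, as in Table 1.
print(blueprint["Statistics"])
```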
Validation
Face and content validity of the test items were determined by giving the test items, the table of specification and the scheme of work to subject specialists and test experts for scrutiny and vetting. Their input was applied in drafting the final test items used for trial testing.
Trial Testing:
The items were administered to a sample of 300 SS1 students drawn from three secondary schools in Anambra State and then scored.
Item Analysis:
In carrying out the item analysis, the difficulty index and discrimination index were determined by applying the relevant formulas. The distracter effectiveness was also checked by recording the number of examinees who chose each option among both high achievers and low achievers, so as to find out whether some options were not chosen at all. After analysis, items with a discrimination index above 0.20 and a difficulty index between 0.3 and 0.7 were considered good items (Robert, 1979; Okoye, 1996). However, items with a discrimination index between 0.20 and 0.29 were considered marginal items, usually needing improvement, and for this study they were dropped. Using the SPSS and R statistical analysis packages, charts such as simple bar charts, scatter plots and lollipop plots were applied in analyzing the data.
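For illustration, the two indices can be computed with the conventional upper-group/lower-group formulas. The 27% extreme-group convention and the example counts below are assumptions for the sketch, since the study does not report its exact grouping rule:

```python
# Sketch of the difficulty index (P) and discrimination index (D) using
# conventional upper-group/lower-group formulas. The 27% split and the
# example counts are assumptions, not figures from the study.
def item_indices(upper_correct, lower_correct, group_size):
    """upper_correct / lower_correct: examinees in the top and bottom
    scoring groups who answered the item correctly;
    group_size: number of examinees in each group."""
    p = (upper_correct + lower_correct) / (2 * group_size)  # difficulty index
    d = (upper_correct - lower_correct) / group_size        # discrimination index
    return p, d

# Hypothetical item: 81 examinees per group (27% of the 300 trial testees),
# 60 correct in the upper group, 21 correct in the lower group.
p, d = item_indices(60, 21, 81)
print(round(p, 2), round(d, 2))  # 0.5 0.48
```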
Reliability
The split-half method was adopted in determining the reliability of the test items. The items were administered to a sample of 32 SS1 students from Anambra East Local Government Area. Even-numbered and odd-numbered items were scored separately to obtain two sets of scores for each student. A product-moment correlation coefficient of 0.76 was derived for the two half-tests. The Spearman-Brown prophecy formula was then applied to obtain a reliability of 0.86.
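The step from the half-test correlation of 0.76 to the full-test reliability of 0.86 can be checked with the Spearman-Brown prophecy formula:

```python
# Spearman-Brown prophecy formula: projects reliability when test length
# changes by a given factor (2 when a split-half coefficient is stepped
# up to full length).
def spearman_brown(r_half, length_factor=2):
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)

# Half-test product-moment correlation reported above: 0.76
full_reliability = spearman_brown(0.76)
print(round(full_reliability, 2))  # 0.86, matching the reported reliability
```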
Determination of Test Norm:
The test norm points to the performance of a typical individual and gives the rationale for interpreting the score of any person who takes the test later. Since the test was developed for SS1 students, with no particular age specification, a class norm (mean) was determined for the scores. To determine the class norm, the items which survived item analysis were administered to a sample of 1,000 SS1 students drawn from schools in Onitsha Education Zone, Anambra State. The answers were scored, summed and a class mean calculated. This became the test norm.

PURPOSE OF THE STUDY:
Generally, this study aimed at generating an 80-item objective test and standardizing it. Specifically, it determined:
1. The number of items allotted to the content areas in SS1 mathematics based on the six levels of the cognitive domain.
2. The difficulty index of the 80 objective mathematics items.
3. The discrimination index of the 80 objective mathematics items.
4. The number of items that were good and exceeded the marginal level of acceptance.
5. The class norm for the items exceeding the marginal level of acceptance.

Research Questions:
The following research questions were answered.
1. What number of items is allotted to the seven content areas in the SS1 objective mathematics test based on the six levels of the cognitive domain?
2. What are the discrimination indexes of each of the 80 objective mathematics items?
3. What are the difficulty indexes of each of the 80 objective mathematics items?
4. How many of the 80 objective test items had a good difficulty index and exceeded the marginal level for acceptance?
5. What is the class norm for the items exceeding the marginal level of acceptance?
HYPOTHESIS:
There is no statistically significant relationship between the difficulty indexes of the test items and their discrimination indexes at the 0.05 level of significance.
RESULTS
The results are presented according to the research questions.

Research Question 1
What number of items is allotted to the seven content areas in SS1 objective mathematics based on the six levels of the cognitive domain?
Table 1: Table of specification showing the number of items allotted to the seven content areas of the SS1 objective mathematics test

| Content area | Knowledge (40%) | Comprehension (20%) | Application (20%) | Analysis (10%) | Synthesis (5%) | Evaluation (5%) | Total |
|---|---|---|---|---|---|---|---|
| Sets (9%) | 3 | 1 | 1 | 0 | 0 | 0 | 7 |
| Introduction to formal geometry (17%) | 6 | 3 | 3 | 1 | 1 | 1 | 14 |
| Statistics (25%) | 8 | 4 | 4 | 2 | 1 | 1 | 20 |
| Indices & logarithms (16%) | 5 | 3 | 2 | 1 | 1 | 1 | 13 |
| Fractions, decimals & percentages (15%) | 5 | 2 | 2 | 1 | 1 | 1 | 12 |
| Numbers & numeration (8%) | 2 | 1 | 1 | 1 | 0 | 0 | 6 |
| Quadratic equations (10%) | 3 | 2 | 2 | 1 | 0 | 0 | 8 |
| Total | 32 | 16 | 16 | 8 | 4 | 4 | 80 |
Table 1 reveals that, out of the 80 objective test items, 7 questions were allotted to sets, 14 to introduction to formal geometry, 20 to statistics, 13 to indices and logarithms, 12 to fractions, decimals and percentages, 6 to numbers and numeration, and 8 to quadratic equations.
For the six cognitive levels, the breakdown was 32 knowledge questions, 16 comprehension questions, 16 application questions, 8 analysis questions, 4 synthesis questions and 4 evaluation questions. Each of the content areas had a higher number of knowledge questions, followed by comprehension, application, etc.
Research Question 2
What are the discrimination indexes of the 80 objective mathematics items?
Table 2: Item analysis for the 80-item mathematics objective test for Senior Secondary 1 students, showing the difficulty (P) and discrimination (D) indexes
S/n 1 2 3 4 5 6 7 8 9 10 11 12
P 0.68 0.27 0.37 0.67 0.63 0.57 0.70 0.60 0.48 0.44 0.4 0.28
D 0.12 0.25 0.10 0.36 0.32 0.51 0.36 0.22 0.27 0.37 0.15 0.26
S/n 13 14 15 16 17 18 19 20 21 22 23 24
P 0.42 0.21 0.59 0.31 0.34 0.30 0.39 0.48 0.46 0.33 0.11 0.59
D 0.40 0.20 0.41 0.33 0.56 0.36 0.26 0.17 0.52 0.23 0.025 0.58
S/n 25 26 27 28 29 30 31 32 33 34 35 36
P 0.018 0.60 0.48 0.43 0.30 0.40 0.33 0.25 0.061 0.086 0.056 0.27
D 0.037 0.58 0.51 0.52 0.35 0.56 0.32 0.33 0.049 0.086 0.11 0.33
S/n 37 38 39 40 41 42 43 44 45 46 47 48
P 0.30 0.41 0.34 0.20 0.15 0.31 0.24 0.38 0.27 0.20 0.60 0.105
D 0.31 0.37 0.51 0.36 0.062 0.53 0.33 0.46 0.22 0.35 0.52 0.21
S/n 49 50 51 52 53 54 55 56 57 58 59 60
P 0.47 0.52 0.33 0.31 0.60 0.40 0.53 0.40 0.15 0.20 0.043 0.36
D 0.64 0.52 0.41 0.53 0.63 0.44 0.67 0.56 0.25 0.17 0.086 0.60
S/n 61 62 63 64 65 66 67 68 69 70 71 72
P 0.20 0.35 0.30 0.30 0.19 0.52 0.28 0.32 0.30 0.25 0.19 0.30
D 0.27 0.32 0.20 0.21 0.30 0.75 0.16 0.47 0.53 0.41 0.22 0.41
S/n 73 74 75 76 77 78 79 80
P 0.27 0.25 0.32 0.25 0.17 0.31 0.23 0.20
D 0.41 0.40 0.49 0.35 0.27 0.47 0.42 0.33
Figure 1: A simple bar chart representing the categories of discrimination index of the test items
Items with a discrimination index below 0.2 were graded as having poor discrimination, 0.2 - 0.29 as marginal discrimination and 0.3 and above as good discrimination.
Figure 2: Item Matrix showing the discrimination index of the 80 test items
Table 2 and Figure 1 showed that 13 (16.25%) of the test items had poor discrimination, 15 (18.75%) had marginal discrimination and more than half of the test items, 52 (65%), had good discrimination.
The lollipop chart (Figure 2) went further to identify the individual items' discrimination indexes and arranged them in order from poor, through marginal, to good discrimination. The items with poor discrimination were 23, 25, 33, 41, 34, 59, 3, 35, 1, 11, 67, 20 and 58. Those with marginal discrimination were 14, 63, 48, 64, 8, 45, 71, 2, 22, 57, 12, 19, 9, 61 and 77. The rest had good discrimination.
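The grading rule behind Figure 1 can be sketched as a simple threshold function; here it is applied, for illustration, to the D values of items 1-12 from Table 2 (treating values below 0.2 as poor is our assumed boundary convention):

```python
# Sketch of the discrimination grading used in Figure 1:
# D < 0.2 -> poor, 0.2 <= D < 0.3 -> marginal, D >= 0.3 -> good.
def grade_discrimination(d):
    if d < 0.20:
        return "poor"
    if d < 0.30:
        return "marginal"
    return "good"

# D values for items 1-12, copied from Table 2.
d_values = [0.12, 0.25, 0.10, 0.36, 0.32, 0.51,
            0.36, 0.22, 0.27, 0.37, 0.15, 0.26]
grades = [grade_discrimination(d) for d in d_values]
print(grades.count("poor"), grades.count("marginal"), grades.count("good"))
# 3 4 5 among these twelve items
```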
Research Question 3
What are the difficulty indexes of the 80 objective mathematics items?
Figure 3: A simple bar chart representing the categories of difficulty index of test items
Items with a difficulty index below 0.3 were graded as hard items, 0.3 - 0.7 as good items and above 0.7 as easy items (0%, hence not represented on the bar chart).
Table 2 and Figure 3 showed that 30 (37.50%) of the test items were categorized as hard items because their indexes fell below 0.3, while 50 (62.50%) were categorized as good items because their indexes fell between 0.3 and 0.7. No item had an index above 0.7; therefore, there were no easy items.
Figure 4: Item matrix showing the individual performance of the 80 test questions.
The lollipop chart in Figure 4 presented the individual items' difficulty indexes, arranging them in order of magnitude from hard items to good items.
Research Question 4
How many of the 80 objective test items had a good difficulty index and exceeded the marginal level of the discrimination index?
Table 3: Cross tabulation between discrimination index and difficulty index (item counts)

| Difficulty category | Poor discrimination (below 0.2) | Marginal discrimination (0.2 to 0.29) | Good discrimination (0.3 and above) |
|---|---|---|---|
| Hard items | 9 | 9 | 12 |
| Good items | 4 | 6 | 40 |
| Easy items | 0 | 0 | 0 |
Figure 5: A clustered bar chart representing the cross tabulation between discrimination and difficulty index.
Figure 5 showed that there were 40 (50%) test items that met the condition of being good items and having a discrimination index of 0.3 and above. Table 3 revealed that 9 items were hard with poor discrimination, 9 were hard with marginal discrimination, and 12 were hard with good discrimination. Of the items with a good difficulty index, 4 had poor discrimination, 6 had marginal discrimination and 40 had good discrimination. None of the items were considered easy items. This information is clearly presented in Figure 5, the clustered bar chart representing the cross tabulation between discrimination index and difficulty index.
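The cross tabulation behind Table 3 can be sketched as counting items by their (difficulty category, discrimination category) cell, with only items graded good on both dimensions surviving to the final instrument. The (P, D) pairs below are items 1-12 from Table 2, used here only for illustration:

```python
# Sketch of the cross tabulation behind Table 3. Each item is placed in a
# (difficulty category, discrimination category) cell; items that are
# "good" on both counts are the ones selected for the final form.
# The (P, D) pairs are items 1-12 from Table 2.
from collections import Counter

items = [(0.68, 0.12), (0.27, 0.25), (0.37, 0.10), (0.67, 0.36),
         (0.63, 0.32), (0.57, 0.51), (0.70, 0.36), (0.60, 0.22),
         (0.48, 0.27), (0.44, 0.37), (0.40, 0.15), (0.28, 0.26)]

def difficulty_cat(p):
    return "hard" if p < 0.3 else ("good" if p <= 0.7 else "easy")

def discrimination_cat(d):
    return "poor" if d < 0.2 else ("marginal" if d < 0.3 else "good")

cells = Counter((difficulty_cat(p), discrimination_cat(d)) for p, d in items)
selected = cells[("good", "good")]  # kept for the final instrument
print(dict(cells))
print("selected:", selected)  # selected: 5 of these twelve items
```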
HYPOTHESIS 1: There is no statistically significant linear relationship between the difficulty indexes of the test items and their discrimination indexes.
Figure 6: Pearson correlation between difficulty index and discrimination index: a scatter plot assessing the linear relationship between the two indexes
The chart showed that there was a statistically significant, mildly positive linear relationship between the difficulty indexes of the test items and their discrimination indexes (r = 0.524, p < 0.001). The R², a measure of effect size for the correlation, is 0.274, indicating that the difficulty index accounts for about 27.4% of the variance in the discrimination index.
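The reported correlation can be reproduced in principle with the standard Pearson formula. The sketch below uses only the (P, D) pairs of items 1-6 from Table 2, so its r will differ from the full-sample value of 0.524 computed over all 80 items:

```python
# Minimal Pearson correlation from first principles; r squared is the
# effect size discussed above. Computed here on items 1-6 of Table 2
# only, for illustration.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

p_values = [0.68, 0.27, 0.37, 0.67, 0.63, 0.57]  # difficulty, items 1-6
d_values = [0.12, 0.25, 0.10, 0.36, 0.32, 0.51]  # discrimination, items 1-6
r = pearson_r(p_values, d_values)
print(round(r, 3), round(r * r, 3))  # r and its effect size r^2
```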
Research Question 5: What is the class norm for the items exceeding the marginal level of acceptance?
Table 4: Frequency table of the scores of 1000 students in the MAT

| Scores | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 |
|---|---|---|---|---|---|---|---|
| Frequency | 21 | 108 | 233 | 236 | 225 | 149 | 28 |

The calculated mean is 22.475 ≈ 22.
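The norm of approximately 22 follows directly from the grouped frequency distribution in Table 4, using class midpoints:

```python
# Class norm (mean) computed from the grouped scores in Table 4, using
# class midpoints weighted by frequency.
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [21, 108, 233, 236, 225, 149, 28]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]
n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
print(n, mean)  # 1000 22.475 -- the reported norm of approximately 22
```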
DISCUSSION AND CONCLUSION
In this study, 80 objective items were developed and passed through several stages for the purpose of standardization. The item analysis done from the trial testing showed that 50 (62.5%) of the items had a good difficulty level, while 52 (65%) possessed good discrimination. However, a cross tabulation showed that only 40 (50%) of the items had both good difficulty and discrimination indexes, and these were therefore selected for the final instrument.
In determining the level of relationship between the difficulty index and discrimination index, a Pearson r of 0.524 with an effect size R² of 0.274 was calculated. Cohen gave rules of thumb for interpreting this effect size, which is meant to tell us how large the relationship between the two variables is. According to him, an r of 0.1 represents a small effect size, 0.3 a medium effect size and 0.5 a large effect size. By implication, the difficulty indexes were to a mild extent good predictors of the discrimination index, thus reinforcing the positive, statistically significant linear relationship between the difficulty and discrimination indexes.
A class mean of 22 was calculated, as shown in the tables above. This implies that when this test is administered to SS1 students within a time limit of 1 hour, any student who scores above 22 is above average, while those who score below 22 are below average in mathematics for that class.
REFERENCES
1. Davis, B.G. (2013). Types of tests: excerpt from quizzes, tests, and exams. http://commons.trincoll.edu
2. Denga, D.I. (1987). Educational measurement, continuous assessment and psychological testing. Calabar: Rapid Educational Publishers.
3. Diksha, K. (2020). Achievement test: meaning and types explained. Yourarticlelibrary.com/education/guidance-techniques/achievement-test-meaning-and-types-explained/63684
4. Great Schools Partnership. (2014). Standardized test. edglossary.org/standardized-test/#:~:text=A%20standardized%20test
5. Ifeakor, A.C. (2011). Standardized. In I.E. Osegbo & A.C. Ifeakor (Eds.), Psychological measurement and evaluation in education. Onitsha: Fomech Printing & Pub. Co. Ltd.
6. Kendra, C. (2020). How achievement tests measure what people have learned. Verywellmind.com/what-is-an-achievement-test-2794805
7. Mehrens, W.A., & Lehmann, I.J. (1991). Measurement and evaluation in education and psychology. Belmont: Holt, Rinehart and Winston.
8. Morrison, K.M., & Embretson, S.E. (2018). Standardized tests. In B.B. Frey (Ed.), The Sage encyclopedia of educational research, measurement and evaluation (p. 3680). California.
9. Okoye, R.O. (1996). Educational and psychological measurement and evaluation. Lagos: ED-Solid Foundation.
10. Osegbo, I.E. (2011). Meaning of test, measurement and evaluation. In I.E. Osegbo & A.C. Ifeakor (Eds.), Psychological measurement and evaluation in education. Onitsha: Fomech Printing & Pub. Co. Ltd.
11. Ramadan, M. (2014). 8 kinds of testing and 6 types of tests. Elttguide.com/-kinds-of-testing-6-types-of-tests/
12. Roediger, H.L., Putnam, A.L., & Smith, M.A. (2011). Ten benefits of testing and their applications to educational practice. Psychology of Learning and Motivation, 55, 1-36. Psychology.wustl.edu/memory/wp-content/uploads/2018/04/BC_Roediger_et_al_2011_PLM.pdf
13. Sheeba, S. (2017). Importance of testing and evaluation in teaching and learning. In E. Ahmad, Importance of testing in teaching and learning. International Journal of Society and Humanities, II(1), 1-9. Researchgate.net/publication/328355159-importance-of-testing-in-teaching-and-learning
14. Sharm, H.L., & Poonam. (2017). Construction and standardization of an achievement test in English grammar. International Journal of Advanced Educational Research, 2(5), 230-235.
15. Thorndike, R. (2014). Measurement and evaluation in psychology and education (8th ed.). Essex: Pearson Education Limited.
16. Wu, Y.F. (2018). Achievement tests. In B.B. Frey (Ed.), The Sage encyclopedia of educational research, measurement and evaluation (p. 3680). California.