International Academic Journal of Education & Literature
ISSN Print : 2708-5112 | ISSN Online : 2708-5120 Frequency : Bi-Monthly Language : English Origin : Kenya Website : https://www.iarconsortium.org/journal-info/IAJEL
Construction and Standardization of Mathematics Achievement Test for Senior Secondary 1 Students
Abstract: This study focused on the construction and standardization of a Mathematics achievement test for senior secondary 1 (SS1) students. A total of 80 objective test items were initially generated from the SS1 mathematics curriculum using a table of specification. The items were validated and passed through a process of item analysis after being administered to a sample of 300 SS1 students. Forty items emerged from the item analysis and were further administered to 1,000 SS1 students in Onitsha Education Zone of Anambra State within a time limit of 1 hour. A mean score of approximately 22 was calculated, and that became the norm for the 40-item objective test. The study equally revealed a statistically significant, mildly positive relationship between the difficulty index and the discrimination index.
Keywords: achievement tests, standardized achievement tests, table of
specification, item analysis.
INTRODUCTION
Testing is an integral part of the teaching-learning process. Tests are ways of ascertaining the existence of some desired attributes or characteristics in the testee. Osegbo (2011) defined a test as "a set of standard questions presented to an individual or group of individuals to answer or respond to" (p. 1). Tests are classified into different categories based on the purposes which they serve. For instance, there are performance tests, short-answer tests, essay tests, multiple-choice tests, true-false tests, matching tests, placement tests, diagnostic tests, progress/achievement tests, final progress/achievement tests, proficiency tests, aptitude tests, norm-referenced tests, criterion-referenced tests, summative tests, formative tests, group tests, individual tests, speed tests, power tests, verbal tests, non-verbal tests, culture-biased tests, culture-fair tests, standardized tests, non-standardized tests, etc. (Osegbo, 2011; Davis, 2013; Ramadan, 2014). Roediger, Putnam and Smith (2011) identified the benefits of testing to include:
1. Aiding retention as a result of retrieval.
2. Identifying knowledge gaps.
3. Learning more from the next study episode.
4. Helping students to organize their knowledge.
5. Improving transfer of knowledge to new contexts.
6. Aiding retrieval of untested material.
7. Improving metacognitive monitoring.
8. Providing feedback to instructors.
9. Improving students' study habits.
According to Sheeba (2017), tests provide diagnostic feedback, help in setting standards, evaluate students' progress and motivate performance. They also help teachers appraise how successfully they are presenting material, and they provide students with indicators of what topics or skills they have not yet mastered and should concentrate on, thus reinforcing learning (Davis, 2013).
Research Article
Article History
Received: 11.05.2021
Revision: 20.05.2021
Accepted: 28.05.2021
Published: 08.06.2021
Author Details: Ezeugo Nneka Chinyere (Ph.D.)1, Metu Ifeoma Clementina (Ph.D.)1 and Ikwelle Anthonia Chika2
Authors' Affiliations: 1Department of Educational Foundations, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria; 2Department of Early Childhood Care and Education, Nwafor Orizu College of Education, Nsugbe, Anambra State, Nigeria
Corresponding Author* Ezeugo Nneka Chinyere (Ph.D.)
How to Cite the Article: Ezeugo Nneka Chinyere, Metu Ifeoma
Clementina & Ikwelle Anthonia Chika.
(2021); Construction and Standardization of
Mathematics Achievement Test for Senior Secondary. Int Aca J Edu Lte. 2(3); 37-45. Copyright @ 2021: This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium for non-commercial use (NonCommercial, or CC-BY-NC), provided the original author and source are credited.
Ezeugo Nneka Chinyere et al., Int Aca J Edu Lte; Vol-2, Iss-3 (May-Jun, 2021): 37-45
The present paper focuses on achievement tests. Wu (2018) sees achievement tests as "tests designed to measure the knowledge, skills and abilities attained by a test taker in a field, in a subject area or in a content domain in which the test taker has received training or instructions" (p. 148). An achievement test measures present proficiency, mastery and understanding of general and specific areas of knowledge (Diksha, 2020). Kendra (2020) equally stated that achievement tests measure an individual's level of skill, accomplishment or knowledge in a specific area. In essence, achievement tests are tied to specific content domains, subject areas or fields. They ascertain a student's current level of knowledge and skill acquisition and reveal the student's present level of attainment after training or instruction. As already pointed out about tests, achievement tests can be of different types and can equally be based on different subject matters. However, in order to provide a valid basis for comparing an individual's or group's relative performance in different areas, there is a need to go beyond classroom teacher-made tests. There is the need to provide norms. Thus the present study is focused on the construction and standardization of a mathematics achievement test.
STANDARDIZED ACHIEVEMENT TESTS: MEANING, USES AND TYPES
Standardized tests are "evaluative devices developed to ascertain a sample of behaviour from an individual in a domain of interest in which the test administration and scoring process is uniform across individuals and both reliability and validity evidence exists such that inferences regarding the person's trait can be made from the test score" (Morrison & Embretson, 2018, p. 3680). Thorndike (2014) specified that the term standardized test implies the availability of normative data, although the term basically points to the adoption of uniform administration procedures (p. 339). Basically, standardized achievement tests are designed and prepared by, or with the assistance of, measurement experts for large numbers of students. They are accompanied by a manual, administered under uniform procedures, and scored and interpreted in a standard and consistent manner such that comparison of individuals or groups of students becomes realizable (Mehrens & Lehmann, 1991; Okoye, 1996; Ifeakor, 2011; Great Schools Partnership, 2014).
According to Thorndike (2014), standardized achievement tests are used for diagnostic and remedial decisions, placement decisions, guidance and counseling decisions, selection decisions, curricular decisions between alternative programs, and public policy decisions on how well a school is doing. The Great Schools Partnership (2014) equally believed that, to bring reform to schools and improve students' achievement, standardized tests serve some of these purposes:
- They make schools and educators accountable for educational results and students' performance.
- They determine whether students achieved instructional objectives.
- They discover gaps in students' learning and academic progress.
- They expose achievement gaps among different student groups.
- They determine whether educational policies are working as intended.
Mehrens and Lehmann (1991) classified standardized achievement tests into diagnostic tests, single-subject-matter tests and survey batteries. Ifeakor (2011) outlined four categories, which include standardized achievement survey test batteries, standardized achievement survey tests in specific subjects, diagnostic tests and prognostic tests. On the other hand, Thorndike (2014) mentioned some categories of standardized achievement tests: group standardized achievement tests, individually administered achievement tests, secondary school and college level achievement tests, diagnostic achievement tests and criterion-referenced standardized achievement tests. Suffice it to say that there are many more classifications, but the present paper concerns a single-subject-matter achievement test focused on mathematics.
Steps in Constructing Standardized Achievement Tests
Construction of standardized achievement tests follows certain developmental procedures. Most standardized achievement tests are developed by professional test publishing organizations; however, the laid-down procedures can be adopted by anyone constructing such a test.
The test developer should decide on the type of test needed. Other areas of decision include: the content and skills to be covered; the relative emphasis needed; the length of the test; the item format; the number of subjects needed, etc. Although the steps in construction will vary depending on the nature of the test, a typical sequence would include: planning the test, writing the items, pre-testing the items, preparing the final form, collecting reliability and validity evidence, and developing norms and criteria for interpretation (Denga, 1987). Okoye (1996) summarized the steps in test standardization as follows: generating items (which entails selecting relevant objectives, developing a table of specification and finally generating items), editing items, trial testing of items, item analysis, administering the test to a standardization sample, obtaining test norms, preparing the test manual, and printing the test and other relevant materials. Similarly, Sharm and Poonam (2017) outlined the following steps in the construction and standardization of an achievement test in English grammar: planning the test, preparation of the test, administration of the test, item analysis, standardization
of the test in terms of reliability and validity. The construction of the present mathematics achievement test proceeded through the steps discussed below.
Method of the Study
In standardizing this achievement test, the following steps were adopted.

Preliminary Steps:
a) The Purpose of the Test: This test serves the purpose of measuring students' achievement in the specified mathematics content areas.
b) The test developers analyzed the content areas.

Test Blueprint:
A table of specification at the six levels of the cognitive domain was developed to specify the particular content areas covered and the number of questions for each content area and objective.

Item Writing:
Based on the table of specification, 80 objective items with five options ranging from A to E were developed.
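The allocation in such a blueprint amounts to applying percentage weights to the total item count. A minimal sketch, assuming the weights reported in Table 1 of this study and a simple rounding rule (the authors' exact rounding/adjustment procedure is not stated, so some cells may differ slightly from the published table):

```python
# Sketch: allocating items in a table of specification by multiplying the
# total item count by content-area and cognitive-level weights.
# Weights are taken from Table 1 of this study; the round() rule is an
# assumption for illustration.
content_weights = {
    "Sets": 0.09,
    "Introduction to formal geometry": 0.17,
    "Statistics": 0.25,
    "Indices & logarithms": 0.16,
    "Fractions, decimals & percentages": 0.15,
    "Numbers & numeration": 0.08,
    "Quadratic equations": 0.10,
}
level_weights = {
    "Knowledge": 0.40, "Comprehension": 0.20, "Application": 0.20,
    "Analysis": 0.10, "Synthesis": 0.05, "Evaluation": 0.05,
}
TOTAL_ITEMS = 80

blueprint = {
    topic: {level: round(TOTAL_ITEMS * cw * lw)
            for level, lw in level_weights.items()}
    for topic, cw in content_weights.items()
}

# e.g. Statistics x Knowledge: 80 * 0.25 * 0.40 = 8 items, as in Table 1.
print(blueprint["Statistics"])
```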
Validation
Face and content validity of the test items were determined by giving the test items, the table of specification and the scheme of work to subject specialists and test experts for scrutiny and vetting. Their input was applied in drafting the final test items used for trial testing.
Trial Testing:
The items were administered to a sample of 300 SS1 students drawn from three secondary schools in Anambra State and then scored.
Item Analysis:
In carrying out the item analysis, the difficulty index and discrimination index were determined by applying the relevant formulas. The distracter effectiveness was also checked by recording the number of examinees who chose each option among both high achievers and low achievers, so as to find out whether some options were not chosen at all. After analysis, items with a discrimination index above 0.20 and a difficulty index between 0.3 and 0.7 were considered good items (Robert, 1979; Okoye, 1996). However, items with a discrimination index between 0.20 and 0.29 were considered marginal items, usually needing improvement, and for this study they were dropped. Using the SPSS and R statistical analysis packages, charts such as simple bar charts, scatter plots and lollipop plots were applied in analyzing the data.
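For illustration, the two indices can be computed with the conventional upper-group/lower-group formulas. The 27% extreme-group convention and the example counts below are assumptions for the sketch, since the study does not report its exact grouping rule:

```python
# Sketch of the difficulty index (P) and discrimination index (D) using
# conventional upper-group/lower-group formulas. The 27% split and the
# example counts are assumptions, not figures from the study.
def item_indices(upper_correct, lower_correct, group_size):
    """upper_correct / lower_correct: examinees in the top and bottom
    scoring groups who answered the item correctly;
    group_size: number of examinees in each group."""
    p = (upper_correct + lower_correct) / (2 * group_size)  # difficulty index
    d = (upper_correct - lower_correct) / group_size        # discrimination index
    return p, d

# Hypothetical item: 81 examinees per group (27% of the 300 trial testees),
# 60 correct in the upper group, 21 correct in the lower group.
p, d = item_indices(60, 21, 81)
print(round(p, 2), round(d, 2))  # 0.5 0.48
```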
Reliability
The split-half method was adopted in determining the reliability of the test items. The items were administered to a sample of 32 SS1 students from Anambra East Local Government Area. Even-numbered and odd-numbered items were scored separately to obtain two sets of scores for each student. A product-moment correlation coefficient of 0.76 was derived for the two half-tests. The Spearman-Brown prophecy formula was then applied to obtain a reliability of 0.86.
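The step from the half-test correlation of 0.76 to the full-test reliability of 0.86 can be checked with the Spearman-Brown prophecy formula:

```python
# Spearman-Brown prophecy formula: projects reliability when test length
# changes by a given factor (2 when a split-half coefficient is stepped
# up to full length).
def spearman_brown(r_half, length_factor=2):
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)

# Half-test product-moment correlation reported above: 0.76
full_reliability = spearman_brown(0.76)
print(round(full_reliability, 2))  # 0.86, matching the reported reliability
```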
Determination of Test Norm:
The test norm points to the performance of a typical individual and gives the rationale for interpreting the score of any person who takes the test later. Since the test was developed for SS1 students, with no particular age specification, a class norm (mean) was determined for the scores. To determine the class norm, the items which survived item analysis were administered to a sample of 1,000 SS1 students drawn from schools in Onitsha Education Zone, Anambra State. The answers were scored, summed and a class mean calculated. This became the test norm.

PURPOSE OF THE STUDY:
Generally, this study aimed at generating an 80-item objective test and standardizing it. Specifically, it determined:
1. The number of items allotted to the content areas in SS1 mathematics based on the six levels of the cognitive domain.
2. The difficulty index of the 80 objective mathematics items.
3. The discrimination index of the 80 objective mathematics items.
4. The number of items that were good and exceeded the marginal level of acceptance.
5. The class norm for the items exceeding the marginal level of acceptance.

Research Questions:
The following research questions were answered.
1. What number of items is allotted to the seven content areas in the SS1 objective mathematics test based on the six levels of the cognitive domain?
2. What are the discrimination indexes of each of the 80 objective mathematics items?
3. What are the difficulty indexes of each of the 80 objective mathematics items?
4. How many of the 80 objective test items had a good difficulty index and exceeded the marginal level for acceptance?
5. What is the class norm for the items exceeding the marginal level of acceptance?
HYPOTHESIS:
There is no statistically significant relationship between the difficulty indexes of the test items and their discrimination indexes at the 0.05 level of significance.
RESULTS
The results are presented according to the research questions.

Research Question 1
What number of items is allotted to the seven content areas in SS1 objective mathematics based on the six levels of the cognitive domain?
Table 1: Table of specification showing the number of items allotted to the seven content areas of the SS1 objective mathematics test

| Content area | Knowledge (40%) | Comprehension (20%) | Application (20%) | Analysis (10%) | Synthesis (5%) | Evaluation (5%) | Total |
|---|---|---|---|---|---|---|---|
| Sets (9%) | 3 | 1 | 1 | 0 | 0 | 0 | 7 |
| Introduction to formal geometry (17%) | 6 | 3 | 3 | 1 | 1 | 1 | 14 |
| Statistics (25%) | 8 | 4 | 4 | 2 | 1 | 1 | 20 |
| Indices & logarithms (16%) | 5 | 3 | 2 | 1 | 1 | 1 | 13 |
| Fractions, decimals & percentages (15%) | 5 | 2 | 2 | 1 | 1 | 1 | 12 |
| Numbers & numeration (8%) | 2 | 1 | 1 | 1 | 0 | 0 | 6 |
| Quadratic equations (10%) | 3 | 2 | 2 | 1 | 0 | 0 | 8 |
| Total | 32 | 16 | 16 | 8 | 4 | 4 | 80 |
Table 1 reveals that, out of the 80 objective test items, 7 questions were allotted to sets, 14 to introduction to formal geometry, 20 to statistics, 13 to indices and logarithms, 12 to fractions, decimals and percentages, 6 to numbers and numeration, and 8 to quadratic equations.
For the six cognitive levels, the breakdown was 32 knowledge questions, 16 comprehension questions, 16 application questions, 8 analysis questions, 4 synthesis questions and 4 evaluation questions. Each of the content areas had a higher number of knowledge questions, followed by comprehension, application, etc.
Research Question 2
What are the discrimination indexes of the 80 objective mathematics items?
Table 2: Item analysis for the 80-item mathematics objective test for Senior Secondary 1 students, showing the difficulty (P) and discrimination (D) indexes
S/n 1 2 3 4 5 6 7 8 9 10 11 12
P 0.68 0.27 0.37 0.67 0.63 0.57 0.70 0.60 0.48 0.44 0.4 0.28
D 0.12 0.25 0.10 0.36 0.32 0.51 0.36 0.22 0.27 0.37 0.15 0.26
S/n 13 14 15 16 17 18 19 20 21 22 23 24
P 0.42 0.21 0.59 0.31 0.34 0.30 0.39 0.48 0.46 0.33 0.11 0.59
D 0.40 0.20 0.41 0.33 0.56 0.36 0.26 0.17 0.52 0.23 0.025 0.58
S/n 25 26 27 28 29 30 31 32 33 34 35 36
P 0.018 0.60 0.48 0.43 0.30 0.40 0.33 0.25 0.061 0.086 0.056 0.27
D 0.037 0.58 0.51 0.52 0.35 0.56 0.32 0.33 0.049 0.086 0.11 0.33
S/n 37 38 39 40 41 42 43 44 45 46 47 48
P 0.30 0.41 0.34 0.20 0.15 0.31 0.24 0.38 0.27 0.20 0.60 0.105
D 0.31 0.37 0.51 0.36 0.062 0.53 0.33 0.46 0.22 0.35 0.52 0.21
S/n 49 50 51 52 53 54 55 56 57 58 59 60
P 0.47 0.52 0.33 0.31 0.60 0.40 0.53 0.40 0.15 0.20 0.043 0.36
D 0.64 0.52 0.41 0.53 0.63 0.44 0.67 0.56 0.25 0.17 0.086 0.60
S/n 61 62 63 64 65 66 67 68 69 70 71 72
P 0.20 0.35 0.30 0.30 0.19 0.52 0.28 0.32 0.30 0.25 0.19 0.30
D 0.27 0.32 0.20 0.21 0.30 0.75 0.16 0.47 0.53 0.41 0.22 0.41
S/n 73 74 75 76 77 78 79 80
P 0.27 0.25 0.32 0.25 0.17 0.31 0.23 0.20
D 0.41 0.40 0.49 0.35 0.27 0.47 0.42 0.33
Figure 1: A simple bar chart representing the categories of discrimination index of the test items
Items with a discrimination index below 0.2 were graded as having poor discrimination, 0.2 - 0.29 as marginal discrimination and 0.3 and above as good discrimination.
Figure 2: Item Matrix showing the discrimination index of the 80 test items
Table 2 and Figure 1 showed that 13 (16.25%) of the test items had poor discrimination, 15 (18.75%) had marginal discrimination and more than half of the test items, 52 (65%), had good discrimination.
The lollipop chart (Figure 2) went further to identify the individual items' discrimination indexes and arranged them in order from poor, through marginal, to good discrimination. The items with poor discrimination were 23, 25, 33, 41, 34, 59, 3, 35, 1, 11, 67, 20 and 58. Those with marginal discrimination were 14, 63, 48, 64, 8, 45, 71, 2, 22, 57, 12, 19, 9, 61 and 77. The rest had good discrimination.
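The grading rule behind Figure 1 can be sketched as a simple threshold function; here it is applied, for illustration, to the D values of items 1-12 from Table 2 (treating values below 0.2 as poor is our assumed boundary convention):

```python
# Sketch of the discrimination grading used in Figure 1:
# D < 0.2 -> poor, 0.2 <= D < 0.3 -> marginal, D >= 0.3 -> good.
def grade_discrimination(d):
    if d < 0.20:
        return "poor"
    if d < 0.30:
        return "marginal"
    return "good"

# D values for items 1-12, copied from Table 2.
d_values = [0.12, 0.25, 0.10, 0.36, 0.32, 0.51,
            0.36, 0.22, 0.27, 0.37, 0.15, 0.26]
grades = [grade_discrimination(d) for d in d_values]
print(grades.count("poor"), grades.count("marginal"), grades.count("good"))
# 3 4 5 among these twelve items
```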
Research Question 3
What are the difficulty indexes of the 80 objective mathematics items?
Figure 3: A simple bar chart representing the categories of difficulty index of test items
Items with a difficulty index below 0.3 were graded as hard items, 0.3 - 0.7 as good items and above 0.7 as easy items (0%, hence not represented on the bar chart).
Table 2 and Figure 3 showed that 30 (37.50%) of the test items were categorized as hard items because their indexes fell below 0.3, while 50 (62.50%) were categorized as good items because their indexes fell between 0.3 and 0.7. No item had an index above 0.7; therefore, there were no easy items.
Figure 4: Item matrix showing the individual performance of the 80 test questions.
The lollipop chart in Figure 4 presented the individual items' difficulty indexes, arranging them in order of magnitude from hard items to good items.
Research Question 4
How many of the 80 objective test items had a good difficulty index and exceeded the marginal level of the discrimination index?
Table 3: Cross tabulation between discrimination index and difficulty index (item counts)

| Difficulty category | Poor discrimination (below 0.2) | Marginal discrimination (0.2 to 0.29) | Good discrimination (0.3 and above) |
|---|---|---|---|
| Hard items | 9 | 9 | 12 |
| Good items | 4 | 6 | 40 |
| Easy items | 0 | 0 | 0 |
Figure 5: A clustered bar chart representing the cross tabulation between discrimination and difficulty index.
Figure 5 showed that there were 40 (50%) test items that met the condition of being good items and having a discrimination index of 0.3 and above. Table 3 revealed that 9 items were hard with poor discrimination, 9 were hard with marginal discrimination, and 12 were hard with good discrimination. Of the items with a good difficulty index, 4 had poor discrimination, 6 had marginal discrimination and 40 had good discrimination. None of the items were considered easy items. This information is clearly presented in Figure 5, the clustered bar chart representing the cross tabulation between discrimination index and difficulty index.
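The cross tabulation behind Table 3 can be sketched as counting items by their (difficulty category, discrimination category) cell, with only items graded good on both dimensions surviving to the final instrument. The (P, D) pairs below are items 1-12 from Table 2, used here only for illustration:

```python
# Sketch of the cross tabulation behind Table 3. Each item is placed in a
# (difficulty category, discrimination category) cell; items that are
# "good" on both counts are the ones selected for the final form.
# The (P, D) pairs are items 1-12 from Table 2.
from collections import Counter

items = [(0.68, 0.12), (0.27, 0.25), (0.37, 0.10), (0.67, 0.36),
         (0.63, 0.32), (0.57, 0.51), (0.70, 0.36), (0.60, 0.22),
         (0.48, 0.27), (0.44, 0.37), (0.40, 0.15), (0.28, 0.26)]

def difficulty_cat(p):
    return "hard" if p < 0.3 else ("good" if p <= 0.7 else "easy")

def discrimination_cat(d):
    return "poor" if d < 0.2 else ("marginal" if d < 0.3 else "good")

cells = Counter((difficulty_cat(p), discrimination_cat(d)) for p, d in items)
selected = cells[("good", "good")]  # kept for the final instrument
print(dict(cells))
print("selected:", selected)  # selected: 5 of these twelve items
```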
HYPOTHESIS 1: There is no statistically significant linear relationship between the difficulty indexes of the test items and their discrimination indexes.
Figure 6: Pearson correlation between difficulty index and discrimination index: a scatter plot assessing the linear relationship between the two indexes
The chart showed that there was a statistically significant, mildly positive linear relationship between the difficulty indexes of the test items and their discrimination indexes (r = 0.524, p < 0.001). The R², a measure of effect size for the correlation, is 0.274, indicating that the difficulty index accounts for about 27.4% of the variance in the discrimination index.
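The reported correlation can be reproduced in principle with the standard Pearson formula. The sketch below uses only the (P, D) pairs of items 1-6 from Table 2, so its r will differ from the full-sample value of 0.524 computed over all 80 items:

```python
# Minimal Pearson correlation from first principles; r squared is the
# effect size discussed above. Computed here on items 1-6 of Table 2
# only, for illustration.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

p_values = [0.68, 0.27, 0.37, 0.67, 0.63, 0.57]  # difficulty, items 1-6
d_values = [0.12, 0.25, 0.10, 0.36, 0.32, 0.51]  # discrimination, items 1-6
r = pearson_r(p_values, d_values)
print(round(r, 3), round(r * r, 3))  # r and its effect size r^2
```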
Research Question 5: What is the class norm for the items exceeding the marginal level of acceptance?
Table 4: Frequency table of the scores of 1000 students in the MAT

| Scores | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 |
|---|---|---|---|---|---|---|---|
| Frequency | 21 | 108 | 233 | 236 | 225 | 149 | 28 |

The calculated mean is 22.475 ≈ 22.
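The norm of approximately 22 follows directly from the grouped frequency distribution in Table 4, using class midpoints:

```python
# Class norm (mean) computed from the grouped scores in Table 4, using
# class midpoints weighted by frequency.
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [21, 108, 233, 236, 225, 149, 28]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]
n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
print(n, mean)  # 1000 22.475 -- the reported norm of approximately 22
```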
DISCUSSION AND CONCLUSION
In this study, 80 objective items were developed and passed through several stages for the purpose of standardization. The item analysis done from the trial testing showed that 50 (62.5%) of the items had a good difficulty level, while 52 (65%) possessed good discrimination. However, a cross tabulation showed that only 40 (50%) of the items had both good difficulty and discrimination indexes, and these were therefore selected for the final instrument.
In determining the level of relationship between the difficulty index and discrimination index, a Pearson r of 0.524 with an effect size R² of 0.274 was calculated. Cohen gave rules of thumb for interpreting this effect size, which is meant to tell us how large the relationship between the two variables is. According to him, an r of 0.1 represents a small effect size, 0.3 a medium effect size and 0.5 a large effect size. By implication, the difficulty indexes were to a mild extent good predictors of the discrimination index, thus reinforcing the positive, statistically significant linear relationship between the difficulty and discrimination indexes.
A class mean of 22 was calculated, as shown in the tables above. This implies that when this test is administered to SS1 students within a time limit of 1 hour, any student who scores above 22 is above average, while those who score below 22 are below average in mathematics for that class.
REFERENCES
1. Davis, B.G. (2013). Types of tests: excerpt from quizzes, tests, and exams. http://commons.trincoll.edu
2. Denga, D.I. (1987). Educational measurement, continuous assessment and psychological testing. Calabar: Rapid Educational Publishers.
3. Diksha, K. (2020). Achievement test: meaning and types explained. Yourarticlelibrary.com/education/guidance-techniques/achievement-test-meaning-and-types-explained/63684
4. Great Schools Partnership. (2014). Standardized test. edglossary.org/standardized-test/#:~:text=A%20standardized%20test
5. Ifeakor, A.C. (2011). Standardized. In I.E. Osegbo & A.C. Ifeakor (Eds.), Psychological measurement and evaluation in education. Onitsha: Fomech Printing & Pub. Co. Ltd.
6. Kendra, C. (2020). How achievement tests measure what people have learned. Verywellmind.com/what-is-an-achievement-test-2794805
7. Mehrens, W.A., & Lehmann, I.J. (1991). Measurement and evaluation in education and psychology. Belmont: Holt, Rinehart and Winston.
8. Morrison, K.M., & Embretson, S.E. (2018). Standardized tests. In B.B. Frey (Ed.), The Sage encyclopedia of educational research, measurement and evaluation (p. 3680). California.
9. Okoye, R.O. (1996). Educational and psychological measurement and evaluation. Lagos: ED-Solid Foundation.
10. Osegbo, I.E. (2011). Meaning of test, measurement and evaluation. In I.E. Osegbo & A.C. Ifeakor (Eds.), Psychological measurement and evaluation in education. Onitsha: Fomech Printing & Pub. Co. Ltd.
11. Ramadan, M. (2014). 8 kinds of testing and 6 types of tests. Elttguide.com/-kinds-of-testing-6-types-of-tests/
12. Roediger, H.L., Putnam, A.L., & Smith, M.A. (2011). Ten benefits of testing and their applications to educational practice. Psychology of Learning and Motivation, 55, 1-36. Psychology.wustl.edu/memory/wp-content/uploads/2018/04/BC_Roediger_et_al_2011_PLM.pdf
13. Sheeba, S. (2017). Importance of testing and evaluation in teaching and learning. In E. Ahmad, Importance of testing in teaching and learning. International Journal of Society and Humanities, II(1), 1-9. Researchgate.net/publication/328355159-importance-of-testing-in-teaching-and-learning
14. Sharm, H.L., & Poonam. (2017). Construction and standardization of an achievement test in English grammar. International Journal of Advanced Educational Research, 2(5), 230-235.
15. Thorndike, R. (2014). Measurement and evaluation in psychology and education (8th ed.). Essex: Pearson Education Limited.
16. Wu, Y.F. (2018). Achievement tests. In B.B. Frey (Ed.), The Sage encyclopedia of educational research, measurement and evaluation (p. 3680). California.