A Tool to Measure English Ability of Diverse
Learners : Comparison of a Cloze Test
and Two Types of a Grammar Test
Yuko SHIMIZU
This study examines the feasibility of three English tests as tools to
measure diverse learners' ability of English at Shiga University Junior
College Course of Economics. The three tests used are as follows : (1) a
multiple-choice cloze test, (2) a syntax-based grammar test which is a
classical discrete-point test of English grammar and (3) a meaning-based
grammar test which is more like a reading test. This study concludes that
the multiple-choice cloze test discriminates most clearly between high- and
low-ability learners, and the syntax-based grammar test the least.
I. INTRODUCTION
`English' in the educational context of Japan is one of the school subjects.
In secondary school education, although various teaching approaches
have been introduced and tried out in classes, teachers still tend
to use the grammar-translation or the quasi-grammar-translation method
as the grade level goes up (Eigo Kyouiku Nenkan, p. 42). In addition, both
the purpose of study and the content of the curriculum are regulated by the
course of study issued by the Ministry of Education. Therefore learners of
English who finish Japanese high school can be more homogeneous than
learners in ESL situations in other countries in terms of their English
abilities and their language learning experiences.
In contrast to secondary school, the contents and goals of instruction in
higher education in Japan are left entirely in the hands of instructors
and/or institutions. That means there are great possibilities to give
creative and effective learning experiences to such homogeneous learners.
However, if the instructor does not have clear goals and does not know
his/her learners well, the lessons will become one-way and self-complacent.
One way to get to know diverse learners is to conduct English tests for the
purpose of roughly identifying the learners' levels and the teaching aims.
The need for such tests is particularly keen at Shiga University Junior
College Course of Economics, which accepts various types of individuals.
Compared with other institutions of higher learning, students of the
Junior College Course have wider diversity which comes from differences in
age, types of high school graduated from, learning experiences and social
experiences. Those factors affect the teaching-learning situation of English.
Some students are new graduates from high school and have been studying
English continually. Others may have finished high school education many
years ago and have had a long interval without studying English. Some came
from vocational high schools, where English was less frequently taught, while
others studied English enthusiastically to get through college entrance
examinations. Our Nyuugakusha Senbatsuhouhou Kenkyuu Iinkai (1989) reported
that these differences among the students made English class management
extremely difficult.
In this study, with practicability, test difficulty and test acceptability
taken into account, two types of grammar test and a cloze test were
prepared to investigate the appropriateness of these tests as a tool or tools
to examine the English abilities of the diverse learners.
II. CLOZE TEST AND GRAMMAR TEST
Historically speaking, the emphasis of language testing has shifted
from discrete-point to integrative tests. Discrete-point tests
measure knowledge or performance in very restricted areas of the target
language, such as English grammar. Integrative tests, on the
other hand, measure knowledge of a variety of language features, modes, or
skills simultaneously (Henning, 1987). The shift from the former to the
latter has been largely theoretical, however. For practical reasons, many
large-scale language proficiency tests are still discrete-point oriented. In
this study attention is centered on a cloze test as an integrative test and
two types of grammar test as discrete-point tests.
A. CLOZE TEST
A cloze procedure, which is constructed by deleting words systematically
and requires the examinees to reconstruct the passage utilizing the
remaining words, was investigated by Taylor (1953) and was originally
used as an effective tool for measuring the readability of text for native
speakers of English. Later it came to be used as a measure of ESL proficiency
and became the focus of research at the same time.
Most studies on cloze tests deal with the reliability and validity of the
test, manipulating the following variables: scoring methods; number of
items; deletion rates and patterns; and types of passages. In addition,
correlational studies between cloze tests and other tests have been
done by many researchers. However, the status of the cloze test in the area
of language testing is in continuous flux. Some researchers have found cloze
tests to be unreliable (e.g. Alderson, 1979; Klein-Braley, 1983). Brown
(1988) points out that different cloze tests in different situations may vary
from weak to very strong in terms of reliability. On the other hand, many
correlational studies indicate that cloze tests are reliable measures of
overall ESL proficiency (Oller & Conrad, 1971; Hinofotis, 1980; Shimizu, 1989).
The research reported here presumes that the cloze test is a device to
measure learners' ESL/EFL ability. The scores obtained on the cloze test
were, therefore, taken as one of the indices of learners' English abilities
and compared with the results of the grammar tests.
In a standard cloze procedure, examinees are asked to write the most
appropriate word for each blank using their grammatical and lexical
knowledge, the given context and sometimes their `intuition.' That can
be very frustrating for the examinees. Haskell (1984), quoting
Anderson, indicates the criterion levels of cloze passages for ESL students:
scores below 44% are found to be at the frustration level. Therefore
test constructors must be careful in selecting passages to use. Furthermore,
it is time-consuming for instructors to mark the answers, particularly
when the acceptable scoring method is used. The acceptable scoring method is
also impractical for non-native teachers of English. Brown (1978) compared
four scoring methods (exact, acceptable, clozentropy and multiple-choice
(MC)) and reported that validity did not differ significantly
among the four. Moreover, a correlational study of standard and MC cloze
tests with Japanese high school students by Shimizu (1989) found a
correlation of 0.68 (p<.01). In this study, a multiple-choice (MC) cloze test
was used for practicality of scoring and to lower examinees'
anxiety.
B. PROBLEMS OF CLOZE TEST
Sato (1988) argues the importance of developing testing methods to
measure global English proficiency in an educational climate in Japan. He
attaches importance to the roles of cloze tests as entrance examinations and
placement tests, which is feasible to some extent. However, cloze tests
require the examinees to understand the grammatical and lexical relationships
which link the meaning of sentences in a text. If the content and the
difficulty level of the selected cloze passage are not suited to the
examinees, their attitudes to the test will be negatively affected and their
scores will be lowered regardless of their real English abilities. Moreover,
misunderstanding one item may lead to misunderstanding another. Item
difficulty is not independent but is influenced by other items. In addition,
since one cloze item is often related to other items in the given context, it
is impossible to grade individual cloze items more finely unless totally
different cloze passages are used. That makes the test construction process
more impractical and the test less valid when examinees' English levels range
far and wide.
C. GRAMMAR TEST
Grammar tests are used in many standardized tests. They focus on
grammatical points such as tense, article, sentence structure and so on. In
the first language, grammar is acquired as intuitive implicit knowledge. In
the second/foreign language, on the other hand, learners often pay attention
to grammatical signals to understand the meaning of the passage. Those
grammatical signals become testing points of grammar tests in many cases.
Let us compare the following two test items.
Choose the one word to complete the sentence.
(a) My father always ______ hard.
    1. work
    2. works
    3. working
    4. will work

(b) My father always ______ hard.
    1. stays
    2. works
    3. makes
    4. has
In both (a) and (b), the statements and the answer keys are the same, but the
distractors are different. In (a) all choices are variations of the word
`work'. The examinee is required to utilize his/her grammatical knowledge of
English inflections and word order. In (b) all verbs have the same
inflectional morpheme, the third person singular. Here the examinee must pay
attention to the meaning of the passage rather than to syntactic
relationships. Although both items can be considered discrete-point tests of
grammar, (a) is more syntax-based than (b), and (b) is more meaning-based.
In this study two types of grammar test, each with 50 items, were
constructed. One is an (a) type test, whose focus is to measure understanding
of syntactic relationships. The other is a (b) type, which is a meaning-based
grammar test; most items in this form could be categorized as
very short `reading tests.' Although as many as 50 items, or we can say 50
different sentences/passages, are given in these tests, one item does not
affect another. Each item is independent in its context, which is different
from the cloze test. Therefore the items can be easily graded and express
the examinee's ability in terms of the number right.
The purpose of this study is to find a more feasible test to grasp and
distinguish the levels of English proficiency of the diverse students of the
Junior College Course. To that end, two research questions were posed:
(1) Can the two grammar tests be used interchangeably?
(2) Which test discriminates higher and lower students more clearly,
grammar or cloze? If grammar, which one, the syntax-based or the
meaning-based grammar test?
III. METHOD
A. SUBJECTS
The students of Shiga University Junior College Course of Economics
enrolled in English A and English B in 1990 were included in this study. The
total number of subjects was 98. Of these 98, 28 had graduated from
vocational high schools. Fifty-four were new graduates of high school in 1990
and entered the junior college course directly. Twenty-four students
graduated from high school in 1989, eight students in 1988 and 12 graduated
before 1987.
The subjects were assigned to two groups, Groups A and B, depending
on the type of a grammar test they took, which will be explained in detail
in the next section. The number of the students in Group A was 48 and 50
in Group B.
B. MATERIALS
Two forms of a grammar test and one cloze test were constructed for
this study. In the grammar tests, students were given 50 items, each
consisting of a short text of one or two sentences with a gap and four
alternatives from which to choose an appropriate word or phrase to fill the
gap. The alternatives of the grammar test for Group A were syntax-based and
those for Group B were more meaning-based. In the former test, therefore,
the examinees consciously applied the governing rules, while the examinees
of the latter test needed to focus on the meaning and situation of
the passage. The students were randomly assigned to either Group A or
Group B.
As to the cloze test, every student took the same cloze test, which
consisted of two passages; one was about the four seasons and the other
about apples. The cloze passages were taken from an ESL textbook called Cloze
Connections. Every 8th word was deleted and four alternatives were given for
each deletion. Each passage had 25 deletions and the total number of
deletions was 50.
The subjects were asked to mark correct answers on separate answer
sheets.
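The fixed-ratio deletion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_cloze` and the sample passage are hypothetical, words are split on whitespace (so punctuation stays attached to a word), and the sketch does not generate the four alternatives, since the source does not say how the distractors were chosen.

```python
def make_cloze(passage, nth=8):
    """Replace every nth word with a numbered blank.

    Returns the gapped text and the answer key (the deleted words).
    Assumption: words are whitespace-separated tokens.
    """
    key = []
    gapped = []
    for position, word in enumerate(passage.split(), start=1):
        if position % nth == 0:
            key.append(word)                      # remember the deleted word
            gapped.append(f"({len(key)})____")    # numbered blank
        else:
            gapped.append(word)
    return " ".join(gapped), key


# Hypothetical passage, used only to show the deletion pattern.
sample = ("one two three four five six seven eight nine ten eleven twelve "
          "thirteen fourteen fifteen sixteen seventeen")
gapped, key = make_cloze(sample, nth=8)
print(gapped)
print(key)
```

With a deletion rate of 8, a passage needs at least 200 words to yield the 25 deletions per passage reported here.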
C. PROCEDURES
The last 30 minutes of a class period in May 1990 was used for the
grammar test. The cloze test was given one week later, using the last 30
minutes of a class period. The students took those tests as diagnostic tests.
Therefore the results were given back to them with some comments from
the instructor.
D. ANALYSES
The test scores were interpreted by means of statistical analysis. The
data were processed using a statistics software package (StatView 512+) for
convenience.
IV. RESULTS
Table 1 shows the mean scores (X), standard deviations (SD) and ranges (R)
of each test. Since the students were randomly assigned to two groups, a
mean comparison of Groups A and B on the cloze test was made beforehand,
using the t-test, to see whether their means would be equal. The t value
was .233, and the difference between the two groups was not significant
at the .01 level.
Table 1 SUMMARY OF RESULTS OF THE TESTS

                      GROUP A         GROUP B
GRAMMAR         X     27.58           27.34
                SD    8.575           6.739
                R     37 (6-43)       33 (11-44)
CLOZE           X     27.54           27.22
                SD    7.914           5.593
                R     31 (10-41)      26 (12-38)
CLOZE (WHOLE)   X     27.38
                SD    6.795
                R     31 (10-41)

Group A : n=48, those who took the Syntax-based Grammar Test
Group B : n=50, those who took the Meaning-based Grammar Test
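As a small arithmetic check on Table 1, the cloze mean for the whole sample should equal the size-weighted average of the two group means. The sketch below uses only the group sizes and cloze means reported in Table 1; the function name `pooled_mean` is a hypothetical helper, not part of the original analysis.

```python
def pooled_mean(groups):
    """Size-weighted mean of several groups.

    groups: list of (n, mean) pairs.
    """
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (n, cloze mean) for Group A and Group B, taken from Table 1.
cloze_groups = [(48, 27.54), (50, 27.22)]
whole = pooled_mean(cloze_groups)
print(round(whole, 2))  # agrees with the whole-sample cloze mean of 27.38
```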
A. MEAN COMPARISON
In order to see the degree of discrimination between higher and lower
students, the top fifteen students (high group) and the lowest fifteen
students (low group) of each group on each test were selected for analysis.
Results of the high/low distinction based on the grammar test are discussed
first, followed by those based on the cloze test. (In this study, `high
students' and `a high group' refer to the top fifteen students and `low
students' and `a low group' to the lowest fifteen students.)
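The selection of high and low groups can be sketched as a simple ranking step. The student identifiers and scores below are hypothetical, and `split_high_low` is an illustrative helper; the source does not say how tied scores at the cut-off were handled, so this sketch simply keeps the order produced by a stable sort.

```python
def split_high_low(scores, k=15):
    """Return the ids of the top-k and bottom-k scorers.

    scores: dict mapping student id -> test score.
    Assumption: ties at the boundary are resolved by sort order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical scores for six students; k=2 keeps the example short.
scores = {"s1": 41, "s2": 10, "s3": 27, "s4": 33, "s5": 18, "s6": 25}
high, low = split_high_low(scores, k=2)
print(high, low)
```

In the study itself k would be 15, applied separately to each group and to each ranking test (grammar or cloze).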
(1) BASED ON GRAMMAR RESULTS
Table 2 summarizes the means, standard deviations and ranges of the high
fifteen and low fifteen of Groups A and B based on the scores of the
grammar test. Mean differences between Groups A and B and t values are also
included in the table.
As to the results of the grammar test for the high group, the mean of
Group A (syntax-based) was higher than that of Group B (meaning-based) by
1.66. Although the mean difference was not significant, the syntax-based
grammar test tended to be slightly easier to answer for the high students.
That was not true, however, of the low group. The low fifteen of Group B
scored better than those of Group A by 2.20. The low students performed
better on the meaning-based grammar test.

Table 2 HIGHER & LOWER DISCRIMINATION BASED ON GRAMMAR RESULTS

                          Group A       Group B       XA-XB    t-test (DF=28)
GRAMMAR   Hi (n=15)  X    36.53         34.87          1.66     1.382
TEST                 SD   2.997         3.583
                     R    10 (33-43)    13 (31-44)
          Lo (n=15)  X    17.27         19.47         -2.20    -1.323
                     SD   5.298         3.662
                     R    18 (6-24)     12 (11-23)
CLOZE     Hi (n=15)  X    34.47         31.47          3.00     2.098*
TEST                 SD   4.015         3.815
                     R    15 (26-41)    13 (25-38)
          Lo (n=15)  X    20.00         22.00         -2.00    -0.856
                     SD   7.368         5.251
                     R    24 (10-34)    17 (12-29)
*p<.05
How did the same students perform on the cloze test? The mean difference
between Groups A and B on the cloze results was significant for the
high groups (t value=2.098, p<.05). Although the two
grammar tests did not show significant differences between the two groups,
the same groups were differentiated by the cloze test. This indicates,
at this point, that the cloze test somehow disclosed differences and
seemed to handle ability differences among the high students more
adequately.
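The t value quoted above can be reproduced from the summary statistics alone. The sketch below assumes an independent-samples t-test with pooled variance and assumes the reported SDs are sample standard deviations; the paper does not state either point explicitly, but the assumption is consistent with the reported DF of 28. The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic with pooled variance.

    Assumes sd_a and sd_b are sample standard deviations (n - 1 denominator).
    Returns (t, degrees of freedom).
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Cloze scores of the high groups (selected on grammar results), from Table 2.
t, df = pooled_t(34.47, 4.015, 15, 31.47, 3.815, 15)
print(round(t, 3), df)  # close to the reported t=2.098 with DF=28
```

The same function applied to the other rows of Table 2 gives values within rounding distance of the reported ones, which suggests the table entries are internally consistent.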
(2) BASED ON CLOZE RESULTS
The same type of results based on the cloze test are summarized in
Table 3. Examining the high and low groups based on cloze performance,
we found that the general tendency was the same as in the results discussed
in the previous section. On the grammar tests, the high group showed better
performance on the syntax-based test than on the meaning-based one, and the
low group was better on the meaning-based test. A significant difference
was observed for the high group (t=2.009, p<.05). On
the cloze test, both high and low groups showed significant differences
between Groups A and B (Hi: t=3.088, p<.01; Lo: t=1.799, p<.05). This
supported the tentative indication made in (1). That is, the cloze test
discriminated more clearly than the two grammar tests and was effective in
grading differences among the high students.
Table 3 HIGHER & LOWER DISCRIMINATION BASED ON CLOZE RESULTS

                          Group A       Group B       XA-XB    t-test (DF=28)
GRAMMAR   Hi (n=15)  X    34.47         30.67          3.80     2.009*
TEST                 SD   4.240         5.972
                     R    18 (24-42)    21 (23-44)
          Lo (n=15)  X    19.53         21.73         -2.20    -0.919
                     SD   7.367         5.637
                     R    25 (6-31)     19 (11-30)
CLOZE     Hi (n=15)  X    35.87         33.07          2.80     3.088**
TEST                 SD   2.446         2.520
                     R    8 (33-41)     8 (30-38)
          Lo (n=15)  X    17.87         20.80         -2.93    -1.799*
                     SD   4.642         4.280
                     R    15 (10-25)    13 (12-25)
*p<.05 **p<.01
(3) WITHIN THE SAME GROUP
Finally, mean differences between the grammar and the cloze tests
within the same group were analyzed using paired t tests (Table 4). No
significant difference was observed in Group A, who took the syntax-based
grammar test. In Group B, however, t values were significant for both high
and low groups based on the results of the meaning-based grammar test
(Hi: t=3.001, p<.01; Lo: t=2.443, p<.05). This means that the differences
between the scores on the meaning-based grammar test and the cloze test
were greater than in the other cases. Relatively speaking, high students
tended to obtain higher scores on the meaning-based grammar test and low
students on the cloze test. That is, the meaning-based grammar test still
required certain knowledge of grammar, which made it harder for the low
students to respond correctly. A further analysis could not be done in this
study since a correlation between the two grammar tests was not available.
Table 4 PAIRED T-TEST (between GRAMMAR & CLOZE)

                         Group A                     Group B
                         XGR-XCLZ   paired t value   XGR-XCLZ   paired t value
WHOLE                      .042       .051             .12        .169
based on GRAMMAR   Hi     2.067      1.646            3.4        3.001**
                   Lo    -2.733     -1.416           -2.533     -2.443*
based on CLOZE     Hi    -1.4       -1.531           -2.4       -1.799
                   Lo     1.667       .983             .933       .816
*p<.05 **p<.01
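For readers who wish to see how a paired t value of the kind reported in Table 4 is obtained, a minimal sketch follows. The individual score pairs behind Table 4 are not given in the paper, so the two score lists below are hypothetical and serve only to show the computation; `paired_t` is an illustrative helper.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for two equally long lists of scores.

    Each position holds one examinee's score on the two tests.
    Returns (t, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical grammar and cloze scores for five examinees.
grammar = [36, 34, 33, 38, 35]
cloze = [33, 30, 34, 32, 31]
t, df = paired_t(grammar, cloze)
print(round(t, 3), df)
```

Because each examinee contributes one difference, the degrees of freedom are the number of pairs minus one, which would be 14 for a group of fifteen.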
B. CORRELATION
Correlations between the grammar test and the cloze test were made
into a table (Table 5).

Table 5 CORRELATIONS

                        Group A              Group B
WHOLE                   .763                 .683
                        Hi       Lo          Hi       Lo
based on GRAMMAR                             .298     .646**
based on CLOZE          .378     .478*       .509**   .631**
*p<.05 **p<.01

Correlations between the grammar tests and the cloze test were very
high in both Groups A and B as a whole (r=.763, .683 respectively).
However, when the high and low students were examined separately,
interesting characteristics were observed. The syntax-based grammar test
(Group A) had very low correlation with the cloze test. The correlation for
the high students was only 0.061, which we interpreted as no correlation. The
only significant correlation observed in Group A was that of the low group
based on the cloze results (r=.478, p<.05).
On the other hand, the meaning-based grammar test (Group B) had a
moderate correlation with the cloze test for the high group based on the
cloze test (r=.509, p<.01) and for the low group based on both grammar and
cloze results (r=.646, p<.01; r=.631, p<.01). These results indicate that the
meaning-based grammar test and the cloze test were more closely related to
each other than the syntax-based test and the cloze test.
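The correlations discussed in this section are product-moment coefficients between grammar and cloze scores. A minimal sketch of the computation is given below; the function `pearson_r` is written out by hand for clarity, and the two score lists are hypothetical because the raw scores are not reported in the paper.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores for five examinees.
grammar = [1, 2, 3, 4, 5]
cloze = [2, 4, 5, 4, 5]
r = pearson_r(grammar, cloze)
print(round(r, 3))
```

Note that restricting the sample to a high or low group narrows the range of scores, which by itself tends to lower a correlation relative to the whole-group value.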
V. CONCLUSION/DISCUSSION
A tendency observed in this study was as follows. Although no
significant difference was revealed for the groups as a whole, high students
tended to score better on the syntax-based grammar test than on the
meaning-based grammar test, while low students tended to score better on the
meaning-based one. This result suggests that better performers probably have
better and more extensive grammatical knowledge, which helps them respond to
pure discrete-point tests of grammar. The low students presumably
lack such knowledge. Therefore they must respond to items relying
only on contextual clues. If the clues are not available or the context does
not help them answer, they will miss the items. The lack of grammatical
ability sets limits to what is achieved in the way of performing skills. What
we are interested in measuring in this study is not the learners' explicit
grammatical knowledge but their more global ability. Therefore the
syntax-based grammar test is less favorable and less valid.
As far as the t-tests show (Table 2), the two types of grammar test
seem to be usable interchangeably. However, the unstable correlations of the
syntax-based grammar test with the cloze test imply that the syntax-based
grammar test does not discriminate the examinees' levels properly.
Consequently, the meaning-based grammar test is more reliable than the
syntax-based one, and the two grammar tests cannot be treated as parallel
tests and should not be used interchangeably.
Compared with the syntax-based grammar test items, the meaning-based
ones are closer to reading items than to grammar items. Therefore it
is understandable that the meaning-based grammar test had a high and
stable correlation with the cloze test, which is a technique for measuring
reading comprehension. That is, the meaning-based grammar test can be
placed somewhere between a discrete-point and an integrative test. Now we
turn our eyes to the meaning-based grammar test and the cloze test.
Even if one fell under a high group as a result of the meaning-based
grammar test, s/he did not necessarily fall under a high group on the cloze
test. On the other hand, the results of the cloze test were highly related with
the grammar tests. Even the meaning-based grammar test, which was
proved to be more reliable than the syntax-based one, failed to adequately
measure true ability differences of the subjects. Therefore we conclude that
the cloze test is the most reliable among the tests used in this study.
In this study, a cloze test was shown to have the power to discriminate among
learners. There are several points to mention, however. First, cloze tests
can frustrate examinees, particularly low-ability examinees
whose scores are below 44%. Nineteen subjects out of 98 scored less than
44% on the MC cloze in this study. If we used the standard cloze procedure,
far more subjects would score below 44%. As is always a problem with the cloze
procedure, a test constructor must be careful in choosing a passage of an
appropriate level. Related to that is the question of content validity. Biased
items may be included in the cloze passage, and it is not easy to rewrite or
modify the test items if one is to construct `a test' which fits various
learners. Finally, providing appropriate distractors is important for making
the test valid, and it is difficult. Those negative aspects of test
construction must be considered.
Finally, I will mention that little has been discussed regarding
the appropriateness of using a cloze test as a tool to measure English
proficiency. The extent of the differences between the two distributions (the
cloze test and the grammar tests) will depend on the differences in the
difficulties and characteristics of the tests. The tests constructed for this
study were of an appropriate difficulty level for the examinees. However, how
an examinee performed on each item of each test was not examined. In order to
characterize the test scores and to make broad generalizations, detailed
statistical analyses such as item difficulty and item discrimination are
necessary.
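The item-level analysis called for above could take the following form. This is a sketch of one common classical approach, not an analysis carried out in this study: item difficulty is taken as the proportion of examinees answering correctly, and item discrimination as the difference in proportion correct between the top-k and bottom-k scorers. The response matrix is hypothetical and `item_analysis` is an illustrative helper.

```python
def item_analysis(responses, k):
    """Classical item statistics from a 0/1 response matrix.

    responses: one list of 0/1 item scores per examinee.
    k: size of the high and low groups (ranked by total score).
    Returns a list of (difficulty, discrimination) pairs, one per item.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    high, low = ranked[:k], ranked[-k:]
    results = []
    for j in range(len(responses[0])):
        # Proportion of all examinees who answered item j correctly.
        difficulty = sum(r[j] for r in responses) / len(responses)
        # Difference in proportion correct between high and low groups.
        discrimination = (sum(r[j] for r in high) - sum(r[j] for r in low)) / k
        results.append((difficulty, discrimination))
    return results


# Hypothetical responses: four examinees, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
stats = item_analysis(responses, k=2)
print(stats)
```

An item with discrimination near zero, like the third item in this toy matrix, separates high and low scorers poorly and would be a candidate for revision.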
ACKNOWLEDGEMENT
I would like to express my gratitude to Emeritus Professor Ryue Yoshida,
who retired from Shiga University Junior College Course of Economics in
March, 1991. Although the time I could share with him at the Junior College
Course was short, his continuous encouragement and support left an indelible
impression. His enthusiasm for education will never be forgotten and will be
passed on to his colleagues. I hope he will remain ever ready to assist us in
our various endeavors in the future, for which we are truly grateful.
REFERENCES
Alderson, J. C. (1979). The cloze procedure and proficiency in English as a foreign
language. TESOL Quarterly, 13, 219-223.
Boning, R. A. (1981). Cloze Connections. Baldwin, NY: Barnell Loft, Ltd.
Brown, J. D. (1978). Correlational Study of Four Methods for Scoring Cloze Tests.
Unpublished master's thesis, University of California, Los Angeles.
Brown, J. D. (1988). Tailored cloze: improved with classical item analysis techniques.
Language Testing, Volume 5, Number 1, 19-31.
Eigo Kyouiku Nenkan (1976). Tokyo: Kaitaku-sha.
Haskell, J. F. (1984). Unpublished paper reprinted from English Record, spring 1975.
Henning, Grant. (1987). A Guide to Language Testing. Cambridge, Ma.: Newbury House.
Hinofotis, F. (1980). An Alternative Cloze Testing Procedure: Multiple-Choice Format.
In Research in Language Testing. Oller, J. W. Jr. and Perkins, K. (Eds.). Ma.:
Newbury House.
Klein-Braley, C. (1983). A cloze is a cloze is a question. In J. W. Oller, Jr. (Ed.), Issues
in language testing research (pp. 218-228). Rowley, Ma.: Newbury House.
Oller, J. W. Jr. & Conrad (1971). The Cloze Technique and ESL Proficiency. Language
Learning 21 (2), pp. 183-195.
Sato, Shiro (1988). Cloze Test to Eigo Kyouiku. Tokyo: Nan-un Do.
Shiga Daigaku Keizai Tanki Daigakubu Nyuugakusha Senbatsuhouhou Kenkyuu Iinkai
(1989). Nyuugakusha Senbatsuhouhou Kenkyuu Iinkai Houkoku-sho.
Shimizu, Y. (1989). Eiken Hikki-shiken to Cloze Test ni Mirareru Soukan ni Kansuru
Kenkyuu. STEP BULLETIN Vol. 1, March 1989, pp. 103-116. Tokyo: Nihon Eigo Kentei
Kyoukai.
Taylor, W. L. (1953). Cloze Procedure: a new tool for measuring readability. Journalism
Quarterly 30, 414-38.