
A Tool to Measure English Ability of Diverse

Learners : Comparison of a Cloze Test

and Two Types of a Grammar Test

Yuko SHIMIZU

This study examines the feasibility of three English tests as tools for
measuring the English ability of diverse learners at the Shiga University
Junior College Course of Economics. The three tests are: (1) a
multiple-choice cloze test, (2) a syntax-based grammar test, which is a
classical discrete-point test of English grammar, and (3) a meaning-based
grammar test, which is more like a reading test. The study concludes that
the multiple-choice cloze test discriminates most clearly between high- and
low-ability learners, and the syntax-based grammar test least clearly.

I. INTRODUCTION

In the educational context of Japan, 'English' is one of the school subjects.
In secondary school education, although various teaching approaches have
been introduced and tried in classes, teachers still tend to use the
grammar-translation or quasi-grammar-translation method as the grade level
goes up (Eigo Kyouiku Nenkan, p. 42). In addition, both the purpose of study
and the content of the curriculum are regulated by the course of study issued
by the Ministry of Education. Learners of English who finish Japanese high
school can therefore be more homogeneous than learners in ESL situations in
other countries in terms of their English abilities and their language
learning experiences.

194 SMee.ut..XNgEgef,f>:..kAsuistk (gg270•271-Sl-)

In contrast to secondary school, the contents and goals of instruction in
higher education in Japan are left entirely to instructors and/or
institutions. This means there are great possibilities for giving creative
and effective learning experiences to such homogeneous learners. However, if
instructors do not have clear goals and do not know their learners well,
lessons become one-way and self-complacent. One way to get to know diverse
learners is to administer English tests in order to identify roughly the
learners' levels and the teaching aims. The need for such tests is
particularly keen at the Shiga University Junior College Course of
Economics, which accepts various types of individuals.

Compared with students at other institutions of higher learning, students of
the Junior College Course are more diverse in age, type of high school
attended, learning experiences and social experiences. These factors affect
the teaching-learning situation of English. Some students are new high
school graduates who have been studying English continuously. Others
finished high school many years ago and have had long intervals without
studying English. Some came from vocational high schools, where English was
taught less frequently, while others studied English enthusiastically to
pass college entrance examinations. Our Nyuugakusha Senbatsuhouhou Kenkyuu
Iinkai (1989) reported that these differences among the students made
English class management extremely difficult.

In this study, taking practicability, test difficulty and test acceptability
into account, two types of grammar test and a cloze test were prepared in
order to investigate their appropriateness as tools for examining the
English abilities of these diverse learners.


II. CLOZE TEST AND GRAMMAR TEST

Historically, the emphasis of language testing has shifted from
discrete-point to integrative tests. Discrete-point tests measure knowledge
or performance in very restricted areas of the target language, such as
English grammar. Integrative tests, on the other hand, measure knowledge of
a variety of language features, modes, or skills simultaneously (Henning,
1987). The shift from the former to the latter has been largely theoretical,
however. For practical reasons, many large-scale language proficiency tests
are still discrete-point oriented. In this study, attention is centered on a
cloze test as an integrative test and two types of grammar test as
discrete-point tests.

A. CLOZE TEST

A cloze procedure, in which words are deleted systematically from a passage
and examinees are required to restore them using the remaining words, was
investigated by Taylor (1953) and was originally used as a tool for
measuring the readability of texts for native speakers of English. Later it
came to be used as a measure of ESL proficiency and, at the same time,
became a focus of research.

Most studies on cloze tests deal with the reliability and validity of the
test, manipulating variables such as scoring methods, number of items,
deletion rates and patterns, and types of passages. In addition, many
researchers have carried out correlational studies between cloze tests and
other tests. However, the status of the cloze test in language testing is in
continuous flux. Some studies have found cloze tests to be unreliable (e.g.
Alderson, 1979; Klein-Braley, 1983). Brown (1988) points out that different
cloze tests in different situations may vary from weak to very strong in
terms of reliability. On the other hand, many correlational studies indicate
that cloze tests are reliable measures of overall ESL proficiency (Oller &
Conrad, 1971; Hinofotis, 1980; Shimizu, 1989).

The research reported here presumes that the cloze test is a device for
measuring learners' ESL/EFL ability. The scores obtained on the cloze test
were therefore taken as one index of the learners' English abilities and
compared with the results of the grammar tests.

In a standard cloze procedure, examinees are asked to write the most
appropriate word in each blank using their grammatical and lexical
knowledge, the given context and sometimes their 'intuition.' This can be
very frustrating for the examinees. Haskell (1984), quoting Anderson, gives
criterion levels of cloze passages for ESL students; scores below 44% are at
the frustration level. Test constructors must therefore be careful in
selecting passages. Furthermore, marking the answers is time-consuming for
instructors, particularly when the acceptable-word scoring method is used.
The acceptable-word scoring method is also impractical for non-native
teachers of English. Brown (1978) compared four scoring methods (exact,
acceptable, clozentropy and multiple-choice (MC)) and reported that validity
did not differ significantly among the four. Moreover, a correlational study
of standard and MC cloze tests with Japanese high school students by Shimizu
(1989) found a correlation of 0.68 (p<.01). In this study, an MC cloze test
was used for practicality of scoring and to lower examinees' anxiety.

B. PROBLEMS OF CLOZE TEST

Sato (1988) argues for the importance of developing testing methods that
measure global English proficiency in the educational climate of Japan. He
attaches importance to the role of cloze tests as entrance examinations and
placement tests, which is feasible to some extent. However, cloze tests
require examinees to understand the grammatical and lexical relationships
that link the meanings of sentences in a text. If the content and difficulty
level of the selected cloze passage are not appropriate for the examinees,
their attitudes toward the test will be negatively affected and their scores
will be lowered regardless of their real English abilities. Moreover,
misunderstanding one item may lead to misunderstanding another. Item
difficulty is not independent; it is influenced by other items. In addition,
since one cloze item is often related to other items in the given context,
it is impossible to grade the difficulty of individual cloze items more
finely without using totally different cloze passages. That makes test
construction less practical, and the test less valid, when examinees'
English levels range widely.

C. GRAMMAR TEST

Grammar tests are used in many standardized tests. They focus on grammatical
points such as tense, articles, sentence structure and so on. In the first
language, grammar is acquired as intuitive, implicit knowledge. In a second
or foreign language, on the other hand, learners often pay attention to
grammatical signals in order to understand the meaning of a passage. In many
cases those grammatical signals become the testing points of grammar tests.
Let us compare the following two test items.

Choose the one word to complete the sentence.

(a) My father always ( ) hard.
    1. work
    2. works
    3. working
    4. will work

(b) My father always ( ) hard.
    1. stays
    2. works
    3. makes
    4. has

In both (a) and (b), the statements and the answer keys are the same, but
the distractors are different. In (a) all the choices are variations of the
word 'work'. The examinee is required to use his/her grammatical knowledge
of English inflections and word order. In (b) all the verbs have the same
inflectional morpheme, the third person singular. Here the examinee must pay
attention to the meaning of the sentence rather than to syntactic
relationships. Although both items can be considered discrete-point tests of
grammar, (a) is more syntax-based than (b), and (b) is more meaning-based.

In this study two types of grammar test, each with 50 items, were
constructed. One is an (a)-type test, whose focus is to measure
understanding of syntactic relationships. The other is a (b)-type test,
which is a meaning-based grammar test; most items in this form could be
categorized as very short 'reading tests.' Although as many as 50 items,
that is, 50 different sentences or passages, are given in these tests, one
item does not affect another. Each item is independent in its context,
unlike the items of the cloze test. The items can therefore be easily graded
in difficulty, and the examinee's ability can be expressed as the number of
items answered correctly.

The purpose of this study is to find a more feasible test for grasping and
distinguishing the levels of English proficiency of the diverse students of
the Junior College Course. To that end, two research questions were posed:

(1) Can the two grammar tests be used interchangeably?

(2) Which test discriminates between higher and lower students more clearly,
grammar or cloze? If grammar, which one, the syntax-based or the
meaning-based grammar test?

III. METHOD

A. SUBJECTS

The students of the Shiga University Junior College Course of Economics
enrolled in English A and English B in 1990 were included in this study. The
total number of subjects was 98. Of the 98, 28 had graduated from vocational
high schools. Fifty-four were new high school graduates in 1990 and entered
the junior college course directly. Twenty-four students graduated from high
school in 1989, eight in 1988 and 12 before 1987.

The subjects were assigned to two groups, Groups A and B, according to the
type of grammar test they took, which is explained in detail in the next
section. There were 48 students in Group A and 50 in Group B.

B. MATERIALS

Two forms of a grammar test and one cloze test were constructed for this
study. In the grammar tests, students were given 50 items, each consisting
of a short text of one or two sentences with a gap and four alternatives
from which to choose an appropriate word or phrase to fill the gap. The
alternatives in the grammar test for Group A were syntax-based and those for
Group B were more meaning-based. In the former test, therefore, the
examinees had to apply grammatical rules consciously, while the examinees of
the latter test needed to focus on the meaning and situation of the passage.
The students were randomly assigned to either Group A or Group B.


Every student took the same cloze test, which consisted of two passages: one
about the four seasons and the other about apples. The cloze passages were
taken from an ESL textbook called Cloze Connections. Every 8th word was
deleted and four alternatives were given for each deletion. Each passage had
25 deletions, for a total of 50 deletions.

The subjects were asked to mark the correct answers on separate answer
sheets.
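The fixed-ratio deletion described above (every 8th word removed, four options per gap) can be illustrated with a minimal Python sketch. The function name `make_mc_cloze` and the `distractor_pool` argument are hypothetical; the paper does not describe how its distractors were written, so random draws from a supplied pool stand in for that step.

```python
import random


def make_mc_cloze(text, distractor_pool, nth=8, n_choices=4, seed=0):
    """Delete every nth word of `text` and attach multiple-choice options.

    `distractor_pool` is a hypothetical list of candidate wrong answers;
    the original test's distractors were presumably hand-written.
    """
    rng = random.Random(seed)
    passage, items = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % nth != 0:
            passage.append(word)
            continue
        key = word.rstrip(".,;:!?")      # answer key without trailing punctuation
        trailing = word[len(key):]       # punctuation stays in the passage
        candidates = [w for w in distractor_pool if w.lower() != key.lower()]
        options = rng.sample(candidates, n_choices - 1) + [key]
        rng.shuffle(options)
        items.append({"key": key, "options": options})
        passage.append(f"({len(items)}) ____{trailing}")
    return " ".join(passage), items
```

With `nth=8`, a passage of about 200 words yields 25 gaps, which matches the 25 deletions per passage reported here.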

C. PROCEDURES

The last 30 minutes of a class period in May 1990 were used for the grammar
test. The cloze test was given one week later, again in the last 30 minutes
of a class period. The students took the tests as diagnostic tests, so the
results were returned to them with some comments from the instructor.

D. ANALYSES

The test scores were interpreted by means of statistical analysis. For
convenience, the data were processed with a statistics software package
(StatView 512+).

IV. RESULTS

Table 1 shows the mean score (X), standard deviation (SD) and range (R) of
each test. Since the students were randomly assigned to the two groups, the
means of Groups A and B on the cloze test were first compared using a t-test
to check that they were equal. The t value was .233, and the difference
between the two groups was not significant at the .01 level.

Table 1 SUMMARY OF RESULTS OF THE TESTS

                 GROUP A        GROUP B        WHOLE
GRAMMAR    X     27.58          27.34
           SD    8.575          6.739
           R     37 (6-43)      33 (11-44)
CLOZE      X     27.54          27.22          27.38
           SD    7.914          5.593          6.795
           R     31 (10-41)     26 (12-38)     31 (10-41)

Group A: n=48, those who took the Syntax-based Grammar Test
Group B: n=50, those who took the Meaning-based Grammar Test
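The preliminary comparison of Groups A and B on the cloze test can be approximately re-derived from the Table 1 summary statistics. The sketch below assumes an independent-samples t-test with pooled variance; the paper does not say which variant StatView computed, and `pooled_t` is a name introduced here for illustration.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Cloze test figures for Group A and Group B, taken from Table 1.
t, df = pooled_t(27.54, 7.914, 48, 27.22, 5.593, 50)
print(f"t = {t:.3f}, df = {df}")
```

Under these assumptions the result agrees with the reported t value of .233 to within rounding of the published means and standard deviations.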

A. MEAN COMPARISON

In order to see how well each test discriminated between higher and lower
students, the top fifteen students (high group) and the bottom fifteen
students (low group) of each group on each test were selected for analysis.
Results of the high-low distinction based on the grammar test are discussed
first, followed by those based on the cloze test. (In this study, 'high
students' and 'a high group' refer to the top fifteen students, and 'low
students' and 'a low group' to the bottom fifteen students.)
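The selection of high and low groups described above amounts to ranking examinees on one test and taking the fifteen at each end. A minimal sketch follows; `split_high_low` is a hypothetical helper, and the tie-breaking at the cut-off is an assumption, since the paper does not say how tied scores were handled.

```python
def split_high_low(scores, k=15):
    """Return (high, low): ids of the k highest and k lowest scorers.

    `scores` maps an examinee id to a score on the test used for grouping.
    Ties at the cut-off are broken by sort order (an assumption).
    """
    if len(scores) < 2 * k:
        raise ValueError("need at least 2*k examinees for non-overlapping groups")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]
```

The same function can be applied twice per group, once with grammar scores and once with cloze scores, which gives the two groupings compared in the sections that follow.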

(1) BASED ON GRAMMAR RESULTS

Table 2 summarizes the means, standard deviations and ranges of the high
fifteen and low fifteen of Groups A and B, selected on the basis of their
grammar test scores. Mean differences between Groups A and B and t values
are also included in the table.

Table 2 HIGHER & LOWER DISCRIMINATION BASED ON GRAMMAR RESULTS

                          Group A       Group B       XA-XB    t-test (DF=28)
GRAMMAR   Hi       X      36.53         34.87         1.66     1.382
TEST      (n=15)   SD     2.997         3.583
                   R      10 (33-43)    13 (31-44)
          Lo       X      17.27         19.47         -2.20    -1.323
          (n=15)   SD     5.298         3.662
                   R      18 (6-24)     12 (11-23)
CLOZE     Hi       X      34.47         31.47         3.00     2.098*
TEST      (n=15)   SD     4.015         3.815
                   R      15 (26-41)    13 (25-38)
          Lo       X      20.00         22.00         -2.00    -0.856
          (n=15)   SD     7.368         5.251
                   R      24 (10-34)    17 (12-29)

*p<.05

On the grammar test, the mean of the high group in Group A (syntax-based)
was higher than that in Group B (meaning-based) by 1.66. Although the mean
difference was not significant, the syntax-based grammar test tended to be
slightly easier for the high students. That was not true, however, of the
low group. The low fifteen of Group B scored higher than those of Group A by
2.20; the low students performed better on the meaning-based grammar test.

How did the same students perform on the cloze test? The mean difference
between Groups A and B on the cloze test was significant for the high groups
(t=2.098, p<.05). Although the two grammar tests did not show significant
differences between the two groups, the same groups were differentiated by
the cloze test. This tentatively indicates that the cloze test disclosed
differences that the grammar tests did not, and seemed to handle ability
differences among the high students more adequately.
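As a check on the t value reported for the high groups on the cloze test, the statistic can be recomputed from the Table 2 summary figures. The worked equation below assumes an independent-samples t-test with pooled variance; with equal group sizes (n = 15 each) the pooled standard error reduces to the form shown. The paper does not state which variant was used.

```latex
t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{s_A^2 + s_B^2}{n}}}
  = \frac{34.47 - 31.47}{\sqrt{\dfrac{4.015^2 + 3.815^2}{15}}}
  \approx \frac{3.00}{1.430}
  \approx 2.098, \qquad df = 2n - 2 = 28
```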

(2) BASED ON CLOZE RESULTS


The corresponding results based on the cloze test are summarized in Table 3.
Examining the high and low groups selected on the basis of cloze
performance, we found that the general tendency was the same as in the
previous section. On the grammar tests, the high group performed better on
the syntax-based test than on the meaning-based test, and the low group
performed better on the meaning-based test. The difference was significant
for the high group (t=2.009, p<.05). On the cloze test, both the high and
low groups showed significant differences between Groups A and B (Hi:
t=3.088, p<.01; Lo: t=-1.799, p<.05). This supports the tentative indication
made in (1). That is, the cloze test discriminated more clearly than the two
grammar tests and was effective in grading differences among the high
students.

Table 3 HIGHER & LOWER DISCRIMINATION BASED ON CLOZE RESULTS

                          Group A       Group B       XA-XB    t-test (DF=28)
GRAMMAR   Hi       X      34.47         30.67         3.80     2.009*
TEST      (n=15)   SD     4.240         5.972
                   R      18 (24-42)    21 (23-44)
          Lo       X      19.53         21.73         -2.20    -0.919
          (n=15)   SD     7.367         5.637
                   R      25 (6-31)     19 (11-30)
CLOZE     Hi       X      35.87         33.07         2.80     3.088**
TEST      (n=15)   SD     2.446         2.520
                   R      8 (33-41)     8 (30-38)
          Lo       X      17.87         20.80         -2.93    -1.799*
          (n=15)   SD     4.642         4.280
                   R      15 (10-25)    13 (12-25)

*p<.05 **p<.01


(3) WITHIN THE SAME GROUP

Finally, mean differences between the grammar and cloze tests within the
same group were analyzed using paired t-tests (Table 4). No significant
difference was observed in Group A, who took the syntax-based grammar test.
In Group B, however, the t values were significant for both the high and low
groups selected on the basis of the meaning-based grammar test (Hi: t=3.001,
p<.01; Lo: t=-2.443, p<.05). This means that the differences between the
scores on the meaning-based grammar test and the cloze test were greater
than in the other cases. Relatively speaking, the high students tended to
score higher on the meaning-based grammar test and the low students on the
cloze test. That is, the meaning-based grammar test still required a certain
knowledge of grammar, which made it harder for the low students to respond
correctly. Further analysis could not be done in this study since a
correlation between the two grammar tests was not available.
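The paired t-test used for the within-group comparison works on each examinee's difference between the two tests. A minimal sketch is given below; the scores in the usage example are made up for illustration and are not the study's data, and `paired_t` is a name introduced here.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched score lists."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical grammar and cloze scores for five examinees.
grammar = [35, 33, 36, 34, 38]
cloze = [31, 32, 30, 33, 35]
t, df = paired_t(grammar, cloze)
print(f"t = {t:.3f}, df = {df}")
```

For the groups of fifteen analyzed here the degrees of freedom would be 14.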

Table 4 PAIRED T-TEST (between GRAMMAR & CLOZE)

                          Group A                       Group B
                    XGR-XCLZ   paired t value     XGR-XCLZ   paired t value
WHOLE               .042       .051               .12        .169
based on GRAMMAR
  Hi                2.067      1.646              3.4        3.001**
  Lo                -2.733     -1.416             -2.533     -2.443*
based on CLOZE
  Hi                -1.4       -1.531             -2.4       -1.799
  Lo                1.667      .983               .933       .816

*p<.05 **p<.01

B. CORRELATION

Correlations between the grammar tests and the cloze test are shown in
Table 5.

Table 5 CORRELATIONS

                       Group A                  Group B
WHOLE                  .763                     .683
                       Hi        Lo             Hi        Lo
based on GRAMMAR       .061      [illegible]    .298      .646**
based on CLOZE         .378      .478*          .509**    .631**

*p<.05 **p<.01

Correlations between the grammar tests and the cloze test were high in both
Groups A and B as a whole (r=.763 and .683 respectively). When the high and
low students were considered separately, however, interesting
characteristics were observed. The syntax-based grammar test (Group A) had a
very low correlation with the cloze test. The correlation for the high
students was only 0.061, which we interpreted as no correlation. The only
significant correlation observed in Group A was that of the low group based
on the cloze results (r=.478, p<.05).

On the other hand, the meaning-based grammar test (Group B) had moderate
correlations with the cloze test for the high group based on the cloze
results (r=.509, p<.01) and for the low group based on both the grammar and
cloze results (r=.646, p<.01; r=.631, p<.01). These results indicate that
the meaning-based grammar test and the cloze test were more closely related
to each other than the syntax-based grammar test and the cloze test.
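The correlations reported in this section are presumably Pearson product-moment coefficients, although the paper does not name the coefficient. Under that assumption, the computation for one group's paired grammar and cloze scores can be sketched as follows; `pearson_r` and the example values are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, not taken from the study.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

Restricting the calculation to the top or bottom fifteen narrows the score range, which by itself tends to lower r; this is worth keeping in mind when reading the subgroup values in Table 5.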

V. CONCLUSION/DISCUSSION

The tendency observed in this study was as follows. Although no significant
difference was revealed for the groups as a whole, high students tended to
score better on the syntax-based grammar test than on the meaning-based
grammar test, while low students tended to score better on the meaning-based
one. This result suggests that better performers have more and better
grammatical knowledge, which helps them respond to pure discrete-point tests
of grammar. The low students presumably lack such knowledge and must
therefore respond to items relying only on contextual clues. If the clues
are not available or the context does not help, they will miss the items. A
lack of grammatical ability sets limits on what can be achieved in
performing skills. What we are interested in measuring in this study is not
the learners' explicit grammatical knowledge but their more global ability.
The syntax-based grammar test is therefore less suitable and less valid for
this purpose.

As far as the t-tests show (Table 2), the two types of grammar test appear
to be interchangeable. However, the unstable correlations of the
syntax-based grammar test with the cloze test imply that the syntax-based
grammar test does not discriminate the examinees' levels properly.
Consequently, the meaning-based grammar test is more reliable than the
syntax-based one; the two grammar tests cannot be treated as parallel tests
and should not be used interchangeably.

Compared with the syntax-based grammar test items, the meaning-based ones
are closer to reading items than to grammar items. It is therefore
understandable that the meaning-based grammar test had a higher and more
stable correlation with the cloze test, which is a technique for measuring
reading comprehension. That is, the meaning-based grammar test can be placed
somewhere between a discrete-point test and an integrative test. We now turn
to the meaning-based grammar test and the cloze test.

Even if a student fell into the high group on the meaning-based grammar
test, s/he did not necessarily fall into the high group on the cloze test.
On the other hand, the results of the cloze test were highly related to
those of the grammar tests. Even the meaning-based grammar test, which
proved to be more reliable than the syntax-based one, failed to measure true
ability differences among the subjects adequately. We therefore conclude
that the cloze test is the most reliable of the tests used in this study.

In this study, the cloze test was shown to have the power to discriminate
among learners. There are several points to mention, however. First, cloze
tests will frustrate examinees, particularly low-ability examinees whose
scores are below 44%. Nineteen subjects out of 98 scored less than 44% on
the MC cloze in this study. If a standard cloze procedure were used, far
more subjects would score below 44%. As is always a problem with the cloze
procedure, the test constructor must be careful to choose a passage of an
appropriate level. Related to that is the question of content validity.
Biased items may be included in the cloze passage, and it is not easy to
rewrite or modify the test items if one is to construct a single test that
fits various learners. Finally, providing appropriate distractors is
important for making the test valid, and difficult. These negative aspects
of test construction must be considered.
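The 44% frustration criterion mentioned above translates directly into a raw-score cut-off on the 50-item cloze test. The short sketch below shows one way to flag such scores; `below_frustration_level` is a hypothetical helper, and the example scores are invented.

```python
def below_frustration_level(raw_scores, n_items=50, threshold_pct=44):
    """Return the raw scores that fall below threshold_pct percent correct."""
    # Integer arithmetic avoids floating-point edge cases at the cut-off.
    return [s for s in raw_scores if s * 100 < threshold_pct * n_items]


# Hypothetical raw scores on a 50-item test.
print(below_frustration_level([10, 21, 22, 30]))
```

On a 50-item test, 44% corresponds to 22 items, so raw scores of 21 or lower count as below the criterion.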

Finally, I should mention that little has been discussed here regarding the
appropriateness of using a cloze test as a tool for measuring English
proficiency. The extent of the differences between the two distributions,
those of the cloze test and the grammar tests, will depend on differences in
the difficulty and characteristics of the tests. The tests constructed for
this study were of an appropriate difficulty level for the examinees.
However, how each examinee performed on each item of each test was not
examined. In order to characterize the test scores and make broad
generalizations, detailed statistical analyses such as item difficulty and
item discrimination are necessary.
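The item-level analysis called for above was not carried out in this study. As an illustration only, a classical version can be sketched as follows: item difficulty as the proportion answering correctly, and item discrimination as the difference in that proportion between the top and bottom k examinees. The function name, the upper-lower index and the 0/1 response matrix are assumptions introduced here, not the author's procedure.

```python
def item_analysis(responses, k):
    """Classical item statistics from a 0/1 response matrix.

    `responses` holds one list per examinee, with 1 for a correct answer.
    Returns one (difficulty, discrimination) pair per item, where
    discrimination is the upper-lower index based on the top and bottom k.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(len(responses[0])):
        difficulty = sum(r[j] for r in responses) / len(responses)
        discrimination = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        results.append((difficulty, discrimination))
    return results
```

With the groupings used in this paper, k would be 15; items with a discrimination index near zero would be candidates for revision.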

ACKNOWLEDGEMENT

I would like to express my gratitude to Emeritus Professor Ryue Yoshida, who
retired from the Shiga University Junior College Course of Economics in
March 1991. Although the time I could share with him at the Junior College
Course was short, his continuous encouragement and support left an indelible
impression. His enthusiasm for education will never be forgotten and will be
passed on to his colleagues. I hope he will remain ready to assist us in our
various endeavors in the future, for which we are truly grateful.

REFERENCES

Alderson, J. C. (1979). The cloze procedure and proficiency in English as a foreign language. TESOL Quarterly, 13, 219-223.

Boning, R. A. (1981). Cloze Connections. Baldwin, NY: Barnell Loft, Ltd.

Brown, J. D. (1978). Correlational Study of Four Methods for Scoring Cloze Tests. Unpublished master's thesis, University of California, Los Angeles.

Brown, J. D. (1988). Tailored cloze: improved with classical item analysis techniques. Language Testing, Volume 5, Number 1, 19-31.

Eigo Kyouiku Nenkan (1976). Tokyo: Kaitaku-sha.

Haskell, J. F. (1984). Unpublished paper reprinted from English Record, spring 1975.

Henning, Grant (1987). A Guide to Language Testing. Cambridge, Ma.: Newbury House.

Hinofotis, F. (1980). An Alternative Cloze Testing Procedure: Multiple-Choice Format. In Oller, J. W. Jr. and Perkins, K. (Eds.), Research in Language Testing. Ma.: Newbury House.

Klein-Braley, C. (1983). A cloze is a cloze is a question. In J. W. Oller, Jr. (Ed.), Issues in language testing research (pp. 218-228). Rowley, Ma.: Newbury House.

Oller, J. W. Jr. & Conrad (1971). The Cloze Technique and ESL Proficiency. Language Learning, 21 (2), pp. 183-195.

Sato, Shiro (1988). Cloze Test to Eigo Kyouiku. Tokyo: Nan-un Do.

Shiga Daigaku Keizai Tanki Daigakubu Nyuugakusha Senbatsuhouhou Kenkyuu Iinkai (1989). Nyuugakusha Senbatsuhouhou Kenkyuu Iinkai Houkoku-sho.

Shimizu, Y. (1989). Eiken Hikki-shiken to Cloze Test ni Mirareru Soukan ni Kansuru Kenkyuu. STEP BULLETIN, Vol. 1, March 1989, pp. 103-116. Tokyo: Nihon Eigo Kentei Kyoukai.

Taylor, W. L. (1953). Cloze Procedure: a new tool for measuring readability. Journalism Quarterly, 30, 414-38.
