8/3/2019 1Testing Scales of References
-
Scales of Reference for Testing of Proficiency
However, in the two decades since these ratings were first suggested,
applied linguists' views about what constitutes a true 'zero' or 'perfect' point,
a 'native speaker', or
an educated person's proficiency in a language
have changed considerably (Chandee, 1997).
-
McNamara (1995)
We cannot assume that native speakers
will perform better than non-native
speakers in the tasks on our tests,
as native and non-native speakers may not
easily be distinguished in terms of the non-linguistic
performance capacities that are involved in the tasks (p. 165).
-
Face and Content Validity
-
2.4.5 Face and Content Validity
Proficiency scales have high face validity: they look as if they are testing what they claim to be testing.
This is not validity in the technical sense (Anastasi, 1976, p. 139).
Although the use of proficiency scales can help to guide teachers and learners in setting realistic goals,
they raise a number of difficult issues inherent in the nature of language proficiency
and with important implications for how it is measured
(Hyltenstam & Pienemann, 1985, p. 222).
-
Chandee 1997
What is important to note here is that language educators may lack relevant professional training (Chandee, 1997)
and may
(i) see language learning in terms of some, rather than all, aspects of language ability;
(ii) treat language ability and language proficiency as identical, believing that proficiency testing provides an accurate and reliable method of assessing communicative competence; and/or
(iii) perceive no essential difference between proficiency testing and a range of other assessment procedures (Chandee, 1997).
It is essential, therefore, to begin here by acknowledging the
importance of teachers' awareness of empirical research in this area (Chandee, 1997).
-
2.4.6 The Problem of Validity in Language Testing
To be valid, a test must measure what it sets out to measure.
For example, if listening and writing skills are to be tested, then
the test items must involve listening and writing,
which may be in the form of, as Anastasi suggests,
listening to lectures and
writing reports, and
both must contain authentic materials
(Anastasi, 1961, p. 138).
-
Accordingly, Anastasi's definition of
content validity is
the systematic examination of the test content
to determine whether it covers a
representative sample of the behaviour
domain to be measured. This representative sample of the
behaviour domain must closely reflect that
domain in performance terms
(Anastasi, 1976, pp. 134-135).
-
Many language test researchers have
noted the inadequacy of face validity,
content relevance, and
predictive utility of language tests (Alderson, 1981; Bachman, 1988; Bachman & Savignon, 1986;
Skehan, 1984; Stevenson, 1981, 1985; Upshur, 1979).
-
This poses problems for predictive validity:
as Bachman (1990) notes, for example,
an examination of predictive utility alone can
largely ignore the question of what abilities are being measured
(pp. 250-251).
-
The problem becomes evident with the
use of, for example, multiple-choice
grammar tests
to measure an individual's writing ability
or for placing the individual in a writing course
(Bachman, 1990, pp. 250-251).
-
Moreover, the conditions that determine the
meanings of a speech act are complex and,
for the test to be valid, test writers must take this into consideration (Chandee, 1997).
This is highlighted in Spolsky's (1986) comment that
we can study the pragmatic value and sociolinguistic probability of choosing...structures in different
environments...but the complexity is such that we
cannot expect ever to come up with anything like a
complete list from which sampling is possible (p. 150).
-
2.4.7 Authenticity of Communicative Language Tests
It is problematic to define the term 'authenticity'
in terms of samples of 'real-life' language use
since language use depends on different contexts,
purposes,
topics,
participants,
speech events,
and so forth
(Bachman, 1990, p. 690; Morrow, 1991, p. 114; Nunan, 1988, p. 99; Widdowson, 1990, pp. 44-47).
-
Chandee 1997
Any testing situation is, therefore, unnatural and
thus not authentic.
Language use in real life varies according to
speakers' linguistic and communicative competences,
the contexts the language is used in,
speakers' and listeners' background knowledge and
the cultural aspects both speakers and listeners bring
with them.
This makes it difficult to distinguish 'real-life'
from 'nonreal-life' language use.
-
To be authentic, a test must, inevitably, be one that reproduces a real-life
situation in order to examine the student's ability to cope with it (Doyé, 1991, p. 105)
and must measure the interaction between the language user and the discourse (Widdowson, 1978, p. 80).
Moreover, pragmatic criteria must be present.
That is, language tests...must require the learner to understand the pragmatic interrelationship of linguistic context and extralinguistic contexts (Oller, 1979, p. 33).
-
This sort of authenticity is difficult to
achieve in a test situation
where both the tester and the test taker know
that the only purpose of the interaction is to
obtain an assessment of the test taker's
language performance (Shohamy & Reves, 1985, p. 55).
-
Spolsky (1985) supports this view,
maintaining that
however hard the tester might try to
disguise his purpose,
it is not to engage in genuine conversation with the
candidate. . . but rather to find out something about the candidate in order to classify, reward, or punish
him/her (p. 36).
-
Authenticity is, therefore, almost unachievable since, according to Klein-Braley (1985),
if authenticity means real-life behaviour, then any language testing procedure is non-authentic (p. 76).
We are forced, therefore, with Spolsky, to conclude that testing is not authentic language behaviour, that examination questions are not real, however much like real-life questions they seem (p. 36).
Furthermore, an examinee needs to learn the special rules of examinations before he or she can take part in them successfully (Spolsky, 1985, p. 36).
-
Though tests are, in general, inevitably not authentic in the full sense,
it should be possible to establish criteria which will approximate authenticity (Chandee, 1997).
Testing methods need, for example, to be modified so that they do not impinge on the language use observed (Chandee, 1997),
and, as both Spolsky (1985) and Shohamy and Reves (1985) observe, the unobtrusive observation of language use in
'natural situations' is one way of achieving at least a partial solution to the question of authenticity
(Shohamy & Reves, 1985, p. 55; Spolsky, 1985, p. 39).
-
Chandee, 1997
Some theorists suggest that one authentic and direct testing situation is to observe an individual over a period of time (Jones, 1985, p. 81).
The main problem, of course, with extensive naturalistic observation of non-test language use is that it is impractical,
time-consuming, cumbersome and
expensive, and hence not feasible in most language testing situations.
-
Chandee, 1997
It is certainly impossible in a country which
does not use the target language in everyday
situations.
-
A different, but perhaps equally important
problem, pointed out by Spolsky (1989), is the serious
ethical question raised by using
information obtained surreptitiously,
without individuals' knowledge, for making
decisions about them.
-
Subjects who for various reasons do not test well (who become over-anxious, or
who are unwilling to play the special game of testing,
i.e. answering a question the answer to which is known better by the asker than the answerer)
will not be accurately measured by any kind of formal test:
there will be a large gap between their test and their real-life performance
(Spolsky, p. 74).
This lack of authenticity in the material used in a test raises issues about the generalizability of results (Spolsky, 1985, p. 39).
-
To solve the dilemma of test authenticity,
it might be possible to argue that language tests have an authenticity of their own
(Chandee, 1997).
Alderson (1981a) suggests that
"authentic tasks are in principle impossible
in a language testing situation,
and communicative language testing is in
principle impossible" (p. 48).
-
The problem of authenticity might be resolved by accepting Widdowson's (1978) definition of authenticity
as a characteristic of the relationship between the passage and the reader [that] has to do with appropriate response (p. 80).
This notion of authenticity is very similar to Oller's (1979) description of a 'pragmatic' test, that is,
any procedure or task that causes the learner to process sequences of elements in a language
that conform to the normal contextual constraints of that language, and which requires the learner to relate sequences of linguistic
elements via pragmatic mapping to extralinguistic context (p. 38).
-
2.4.8 Constructing
Language Proficiency Tests
Pimporn Chandee
1997
-
2.4.8 Constructing Language Proficiency Tests
When all of the problems of test authenticity are taken into account, it is clear that it is very difficult to construct a test that will be authentic (Chandee, 1997).
Even if the focus is on only one or a few components of language ability in a given testing context, Bachman (1990) notes that
there is a need to be aware of the full range of language abilities when designing,
developing and interpreting language test scores (p. 682),
and that design must be informed by a broader view of language ability (p. 682).
-
This view mirrors that of Spolsky (1989), who suggests that
test authenticity may be achieved if all the distinguishing characteristics or features
within a finite open set, consisting of a potentially infinite number of instances, are
used in test construction (p. 74).
However, this may be impractical (Chandee, 1997).
-
Chandee, 1997
Problems in creating good tests of
language ability are unavoidable
since language tests can be used only as an
indirect way of making inferences about a test
taker's language ability.
-
Since language use involves the
integration of multiple components and processes,
it is unlikely that there will ever be a language
test that will measure all the components of
language ability (Chandee), or even a test that will, in
Bachman's (1990) terms, elicit
language test performance that is characteristic of language performance
in non-test situations (p. 19).
-
To be similar to 'normal', 'real-life' and
'nontest' language use, test tasks essentially must include the following elements:
'pragmatic'
(Oller, 1979, pp. 16-19, p. 27 and p. 33; 1991, p. 32; Spolsky, 1986, p. 150),
'functional'
(Bachman, 1990, p. 301),
'communicative'
(Bachman, 1990, p. 301; Canale & Swain, 1980, p. 31),
'performance'
(Bachman, 1990, p. 301) and
'authenticity'
(Bachman, 1990, p. 301; Morrow, 1991, p. 112, p. 114; Spolsky, 1989, p. 74).
-
Every instance of authentic language use involves several abilities.
For example, for taxi drivers to operate in the
international airport in Bangkok, they need to know not only the conversational discourse, such as
a request by the customer to be taken to a particular place,
an agreement by the driver to take the customer, or
a request for directions followed by an agreement, and
finally a statement of the fare by the driver, and
a polite thank you upon receipt of the fare,
but also
how to converse with the customer in situations such as
the fare as a point of bargaining, and
the fare depending on the weather, the time of day or night, the condition of the streets, traffic and so on
(Bachman, 1990, p. 312).
-
Chandee, 1997
Hence, Bachman points out, there is probably an infinite variety of conversational exchanges
that might take place between the taxi drivers and the customers (p. 312).
Furthermore, the very nature of language use is such that discourse consists of interrelated illocutionary acts expressed in
a variety of related forms.
If language test scores are to reflect several abilities, and if authentic test tasks are, by definition, interrelated,
then measurement models must be appropriate for analysing and interpreting these abilities.