
CLAReCLARe 1

How ‘good’ are our speaking test tasks: implications of recent research findings

Barry O’Sullivan

Centre for Language Assessment Research (CLARe)

Roehampton University


Focus of this talk

Outline the basic premise of the paper

Discuss the implications for task-based testing & research

Present the findings of four research studies


Focus 1 – O’Sullivan

Identified a series of variables likely to provoke ‘affective’ reactions to interlocutors in ‘direct’ test tasks

Explored impact on performance in a ‘direct’ test of a series of variables:

1. age; language level; personality; sex – of test taker and of interlocutor

2. acquaintanceship

Also explored impact of topic and gender in an ‘indirect’ test task


Focus 1 – Results

Found a significant effect in each study which focused on a single variable

Significant interactions involving all variables explored. Tendency for complex, often three-way, interactions

Significant (though small) effects found for the ‘indirect’ task where a question on a male-oriented topic was delivered by a male speaker


Focus 2 – Weir & Wu

Looked at the parallel-form equivalence of 3 alternate forms of a semi-direct oral proficiency test comprising 3 tasks

Argue that various kinds of evidence are needed to ensure true equivalence

Present quantitative and qualitative evidence of equivalence


Focus 2 – Results

Found that different forms of test tasks can be shown to be equivalent from the quantitative perspective

Demonstrated how qualitative evidence (rater judgements) can support or reject the claims made from the quantitative evidence

“The results show that without taking the necessary steps to control context variables affecting test difficulty, test quality may fluctuate over tasks in different test forms.” Weir & Wu (2006: 192)


Focus 3 – O’Sullivan, Weir & Horai

Explored the impact on task performance of three variables (planning time; planning condition; response time)

Suggest a methodology for ensuring the true equivalence of test tasks

Focused on the individual long turn task


Focus 3 – Evidence of Equivalence

Examined using checklist (based on Skehan 1996)

Identified 9 task versions

Reduced to 8 tasks

Quantitative: Reduced to 4 tasks

Qualitative: Confirmed 4 tasks

Pilot studies with learners

Trial with 54 learners


Focus 4 – Horai

Followed on from the study reported in Focus 3 to include proficiency level as an intervening variable

Found significant differences in performance and in cognitive processing for the four different tasks

Supports the argument that task difficulty rests not in the task but in the interaction between the task and the ability within the individual (i.e. Context & Cognitive Validity)


Observations

Focus 1: learners’ affective reaction to their interlocutor (peer or examiner) can systematically impact on performance

Focus 2: it is possible to generate truly equivalent speaking tests, but there may be differences at the task level

Focus 3: task equivalence can only be claimed where both quantitative and qualitative evidence is established

Focus 4: task difficulty is not a constant (as is presumed in much assessment work) but changes with the level of the test taker


Implications for Task-Based Testing

1 The results from Study 1 + the ‘negotiation of discourse’ argue against using interactive tasks in test events

2 The results of Studies 2 & 3 suggest that true alternate test and task forms are possible for monologic formats

3 The results from Study 4 imply that group-level comparisons based on task performance may be unstable

4 The results of all four studies imply that it may not be possible to develop truly equivalent versions of interactive test tasks


Implications for TBLT Research

Have researchers taken either affect or equivalence into account?

O’Sullivan (2000) & Wu (2005) present review tables that suggest the answer is NO

Should they?

YES

When in particular?

When their research is reliant on using two or more ‘similar’ tasks

When they are exploring the language of interaction


References

Horai, Tomoko. forthcoming. Intra Task Comparison in monologic tasks in L2 Speaking Testing. PhD dissertation, Roehampton University.

Lumley, Tom & O’Sullivan, Barry. 2006. The Impact of Test Taker Characteristics on Speaking Test Task Performance. Language Testing, 22 (4): 415–437.

O’Sullivan, Barry. 2000. Exploring Gender and Oral Proficiency Interview Performance. System, 28 (3): 373–386.

O’Sullivan, Barry. 2002. Learner Acquaintanceship and Oral Proficiency Test Pair-Task Performance. Language Testing, 19 (3): 277–295.

O’Sullivan, Barry. forthcoming. Modelling Performance in Oral Language Testing. Frankfurt: Peter Lang. Based on PhD dissertation from the University of Reading (2000).

O’Sullivan, Barry, Weir, Cyril & Horai, Tomoko. 2004. Exploring difficulty in speaking tasks: an intra-task perspective. ESOL/The British Council/ IDA Australia: IELTS Research Report.

Weir, Cyril & Wu, Jessica. 2006. Establishing Test Form and Individual Task Comparability – A Case Study of the GEPT Intermediate Spoken Performance Test. Language Testing, 23 (2): 167–197.

Weir, Cyril. 2004. Language Testing and Validity: an evidence-based approach. Oxford: Palgrave.

Wu, Jessica. 2005. Task difficulty in semi-direct speaking tests. Unpublished PhD dissertation. Roehampton University.


CONTACT

Dr Barry O’Sullivan
Director
Centre for Language Assessment Research (CLARe)
Digby Stuart College
Roehampton University
Roehampton Lane
London
SW15 5PU
United Kingdom

Tel: +44 (0)20 8392 3348
Fax: +44 (0)20 8392 3031

[email protected]
