Listening tests: past, present and future

John Field, CRELLA, University of Bedfordshire

Language Testing Forum 2013, Nottingham

A problematic skill

Difficult to test because it is an extremely individual operation in terms of both listener and input.

• Internalised. Takes place in the mind of the test taker.

• Highly variable signal. Variable at the levels of phoneme – word – speaker.

The value of a cognitive approach

• It sheds light on what goes on in the mind of the test taker.

• We need to know whether high-stakes tests actually test what they claim to test. Can a listening test, for example, accurately predict the ability of a test taker to study at an English-medium university?

• At local level, we need to use tests to diagnose learner problems so that the tests can feed into learning. This is especially true of listening.

Cognitive validation asks…

• Does a test elicit from test takers the kind of process that they would use in a real-world context? In the case of listening, are we testing the kinds of process that listeners would actually use?

• Or do the recordings and formats that we use lead test takers to behave differently from the way they would in real life?

Phases of listening (Field 2008, 2013)

Processes, in order: input decoding → lexical search → parsing → meaning construction → discourse construction

Representations labelled in the diagram: speech signal, words, meaning

Issues of cognitive validity

• A. To what extent do the processes elicited by a test resemble real-world processes?

• B. To what extent are the processes elicited by a test comprehensive enough to represent the range of processes that make up a skill?

• C. Are the processes finely enough calibrated to reflect what a listener is capable of at the target level?

The ghost of listening past:

1913-1974

Word identification

Tick the word you hear the examiner say:

[ ] hide [ ] heard [ ] hard [ ] hoard

Test taker hears: I heard her telling him

Test taker chooses: A heard B hurt C hot D hotel

Test taker hears: It’s hot all day long

Test taker chooses: A heard B hurt C hot D hotel

[Lower Certificate in English, 1972, quoted Weir 2013]

A cognitive perspective

• Only taps into lowest two levels of processing (phoneme recognition – lexical FORM)

• Role of the phoneme as a perceptual unit has been much questioned. Processing is now viewed as taking place at multiple levels (including top-down word level matches that overrule phoneme level information: the veshtable effect)

• And yet: we still use items based on minimal-pair phoneme perception in lower-level and young learner (YL) tests:

The porter said that the train leaves at

A 9.15 B 9.50 C 5.15 D 5.50

Dictation

• Fear seized him / in the woods. / At one moment / it seemed to him / that enemy soldiers / were watching him / from behind the trees, / crawling out of the bushes. / He ran blindly, paying no attention / to the path / until he was out of breath.

Lower Certificate in English, June 1945 (quoted Weir, 2013)

• The passage will be read three times. During the first reading the candidates will write nothing down. It will be read a second time by groups of words, as divided by bars on the printed copy. … After each group, a pause will be made to allow the candidates to write it down. All essential punctuation will be given by the examiner.

From a cognitive perspective

• A classic divided attention task (listening vs writing). Conversion from one modality (speech) to another (writing).

• Little resemblance to any real-life listening task

• Natural processing (Jarvella, 1971) entails assembling words in order to parse them, then erasing them once they have been converted into a piece of information. Dictation requires test takers to hang on to words beyond the end of the phrase / clause.

• Encourages test takers to focus attention at word level. This reinforces a tendency among listeners at B1 and below to focus on discrete words rather than chunks.

And yet…

• Dictation takes the spoken word as its point of departure, unlike today’s formats, which rely heavily on written items.

• Present-day tasks such as gap filling also entail divided attention effects.

• Dictation taps into lexical segmentation, where the listener has to detect word boundaries in connected speech.

• Conclusion. There might be value in including in lower-level tests the transcription of clips of authentic speech. Such tests would show:

• a) the ability to segment words in connected speech

• b) whether test takers can process words in chunks rather than just singly (a mark of progress towards B2 level)

The ghost of listening present:

‘comprehension’ in listening and reading

Listening test components

• Recording

• Recording as text

• Format

• Items

Recordings

Does the input impose similar listening demands to those of a real-world speaker?

Natural speech (Recording Level B2)

• To what extent do these recordings resemble authentic everyday speech?

Some conclusions on studio recordings

• Actors adapt their delivery to fit punctuation.

• They pause regularly at the ends of clauses.

• There are few hesitation pauses.

• There is no overlap between speakers.

Solution: transcribe the speech as speech

• M1: the long lunch hour has been replaced by the quick snack + according to a new survey ++ most people take just 30 minutes to eat in the middle of the day + many of us don’t even leave our desks

• F: er I’m taking an hour today + but it’s normally sort of half an hour or 20 minutes.

• M2: I pop out for about ten minutes + get something to eat + and then go back to my desk

• M1: a survey at the start of the year + found that only one per cent of people in Britain + regularly take a full 60 minute break + this is very different from forty years ago + when offices everywhere stopped work at one o’clock + people went out to lunch + and didn’t return until two.

• [loosely based on BBC Radio 4 broadcast]

Solution: Specify speaker variables for item writers

• Accent

• Speech rate: speed and consistency

• Pausing

• Level and placing of focal stress

• Number of speakers

• Pitch of voice; familiarity of voice

• Precision of articulation
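A specification of this kind could be kept as a small structured record, so that gaps in an item writer's brief are easy to spot. The sketch below is illustrative only: the class name `SpeakerSpec`, the field names, the words-per-minute unit and the example values are all assumptions, not part of the talk.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SpeakerSpec:
    """Hypothetical record of the speaker variables listed above."""
    accent: Optional[str] = None
    speech_rate_wpm: Optional[int] = None     # speed (assumed unit: words per minute)
    rate_consistency: Optional[str] = None    # e.g. "steady" or "variable"
    pausing: Optional[str] = None
    focal_stress: Optional[str] = None        # level and placing of focal stress
    number_of_speakers: Optional[int] = None
    pitch: Optional[str] = None
    voice_familiarity: Optional[str] = None
    articulation: Optional[str] = None        # precision of articulation


def unspecified(spec: SpeakerSpec) -> List[str]:
    """Return the names of the variables the brief has left open."""
    return [f.name for f in fields(spec) if getattr(spec, f.name) is None]


# Example brief with invented values; most variables are still open.
brief = SpeakerSpec(accent="Southern British", speech_rate_wpm=170,
                    number_of_speakers=2)
print(unspecified(brief))
```

Listing the open variables, rather than rejecting an incomplete brief, leaves the decision about which variables matter at a given level to the test developer.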

Recording-as-text

Is the recording content at an appropriate level for the expertise of the listener?

Format

Does the task elicit processes which resemble those that a listener would use in a real-world listening event?

Recording

You hear a man and a woman talking about going to the gym. What does the man say about going to the gym?

A. It is too expensive for him.

B. It takes too much of his time.

C. It is too physically demanding.

(FCE Handbook, 2008: 68)

Recording as text

Woman: So that didn’t last long, did it? Two weeks going to the gym and you’re already talking about giving it up…

Man: Look, if you’re saying I’m not up to it, you’re wrong. I realise it’s very effective in working every muscle, and when I get started, it’s just like other sports. I don’t even mind feeling exhausted at the end. But, listen, you sort out your kit at home, lug it to the gym, queue to pay your entrance fee, then change and queue for the machines … when you could have been for a run straight from your home and then been free to get on with your life.

Woman: Well, I think you’re wrong and you should make the effort to carry on.

Recording as text

• Test setters tend to base their tests on a written script which has not yet been recorded.

• The linguistic criteria they employ rely heavily on lexical frequency and syntactic simplicity.

• BUT in processing terms difficulty is often caused by:

• a. the density of ideas and the complexity of the links between them

• b. perceptual saliency of phrases and clauses

Recording difficulty: cognitive criteria

• How frequent is the vocabulary?

• How complex is the grammar?

• How familiar is the topic?

• How long is the recording?

• How dense are the idea units in the recording?

• How complex are the connections between idea units?

• How clearly structured is the overall line of argument?

• How concrete or abstract are the points made?

Using conventional tasks

• Provide items after a first playing of the recording and before a second. This ensures more natural listening, without preconceptions or advance information other than general context.

• Keep items short. Loading difficulty on to items (especially MCQ ones) just biases the test in favour of reading rather than listening.

• Favour tasks (e.g. multiple matching) that allow items to ignore the order of the recording and to focus on global meaning rather than local detail.

Items

Do the items target a sufficiently wide range of levels of processing?

Five phases of listening (Field 2008)

Processes, in order: decoding → word search → parsing → meaning construction → discourse construction

Representations labelled in the diagram: speech signal, words, meaning

Targets

An item in a test can target any of these levels:

• Decoding:

She caught the (a) 9.15 (b) 9.50 (c) 5.15 (d) 5.50 train.

• Lexical search:

She went to London by …….

• Factual information:

Where did she go and how?

• Meaning construction:

Was she keen on going by train?

• Discourse construction:

What two reasons did she give for going by train?
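One way to check whether a test's items cover a sufficiently wide range of levels is to tag each item with the level it targets and tally the result. The sketch below assumes the five labels used in the list above; the function names and the sample tagging are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# The five target labels used in the list above.
LEVELS = [
    "decoding",
    "lexical search",
    "factual information",
    "meaning construction",
    "discourse construction",
]


def level_coverage(item_levels: List[str]) -> Dict[str, int]:
    """Count how many items target each level."""
    counts = Counter(item_levels)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"Unknown level label(s): {sorted(unknown)}")
    return {level: counts.get(level, 0) for level in LEVELS}


def untargeted_levels(item_levels: List[str]) -> List[str]:
    """Return the levels that no item in the test targets."""
    coverage = level_coverage(item_levels)
    return [level for level in LEVELS if coverage[level] == 0]


# Invented tagging for a four-item test.
tags = ["decoding", "decoding", "lexical search", "factual information"]
print(level_coverage(tags))
print(untargeted_levels(tags))
```

A tally like this shows only which levels are untargeted; whether that gap matters depends on the proficiency level the test is aimed at.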

Targeting levels of listening

Test takers at proficiency level B1 and below focus heavily on word recognition and have problems in processing language in chunks. In these tests, it may be desirable to focus items mainly on the first three areas.

Higher-level tests should particularly target meaning representation and discourse representation.

Information handling

But they don’t.

Reason 1: Item writers tend to focus on discrete points of information. They do not target the connections between them. In real life, the listener has to build an information structure.

Reason 2: It is the item writer who decides what is/is not important in a recording. In the real world, the listener has to identify major and minor points and ignore irrelevant points.

Structure building (Gernsbacher, 1990)

• Skilled listeners construct a hierarchical representation of a recording

Structure building

• Unskilled listeners focus their attention at local level.

• They build a linear structure.

A structure building task

Three types of pollution

1. …………………….

a. Example: ………….

b. Solution: ……………

2. …………………….

a. Cause: ………………..

b. Result: Climate change

3. …………………….

a. Result: ……………………..

b. Solution: ……………………..

The inflexibility of high-stakes tests

Large-scale high-stakes tests have major constraints which prevent them from testing listening in a way that fully represents the skill.

• Reliability and ease of marking

• Highly controlled test methods, using traditional formats that the candidate knows

• Little attention possible to individual variation or alternative answers

Advantages of more local tests and tasks

Smaller-scale tests afford the possibility of testing a wider range of listening processes with:

• More open-ended questions

• More scope for testing information handling

• Marking on an individual basis

• Possible acceptance of alternative answers

Computer delivery

• Computer-delivered tests offer the possibility of:

• Controlling timing.

• Providing a first play of the recording before items become visible

• Monitoring responses to direct the test taker towards a particular level of difficulty

• Exploiting oral questions (including oral MCQ with short options)
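The third possibility, using responses to steer the test taker towards a level of difficulty, can be illustrated with a very simple rule: move up one level after a correct response and down one after an incorrect one. The talk does not specify any algorithm; the one-up/one-down rule, the function names and the four level labels below are assumptions made for illustration.

```python
from typing import List


def next_level(current: int, correct: bool, n_levels: int) -> int:
    """Move up one level after a correct response, down one after an
    incorrect response, staying within the available range."""
    step = 1 if correct else -1
    return max(0, min(n_levels - 1, current + step))


def difficulty_path(responses: List[bool], start: str,
                    levels: List[str]) -> List[str]:
    """Return the sequence of levels visited, beginning at `start`."""
    index = levels.index(start)
    path = [start]
    for correct in responses:
        index = next_level(index, correct, len(levels))
        path.append(levels[index])
    return path


# Hypothetical level labels and an invented response pattern.
levels = ["A2", "B1", "B2", "C1"]
print(difficulty_path([True, True, True, False, False], "B1", levels))
# prints ['B1', 'B2', 'C1', 'C1', 'B2', 'B1']
```

Operational adaptive tests generally rely on more elaborate statistical models; the point here is only that computer delivery makes response-driven routing possible at all.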

An important issue for the future…

• Testers need to find a means of validating listening tests by means of evidence external to the test.

• This would entail establishing the listening proficiency of a listener by subjective assessment of performance.

• Methods might include ‘Listen and speak’ activities or the separate assessment of listening performance within a speaking test.

References

• Field, J. (2008) Listening in the Language Classroom. Cambridge: Cambridge University Press.

• Field, J. (2013) Cognitive validity. In Geranpayeh, A. & Taylor, L. (eds.) Examining Listening. Cambridge: Cambridge University Press.

• Weir, C.J. (2013) Measured Constructs. Cambridge: Cambridge University Press.

Thanks for listening
