bilc conference may 2010 istanbul, turkey dr. elvira swender, actfl

A Tale of Two Tests

STANAG and CEFR
Comparing the Results of Side-by-Side Testing of Reading Proficiency

BILC Conference
May 2010
Istanbul, Turkey

Dr. Elvira Swender, ACTFL

With apologies to the author

We had a “Dickens of a time” with this study.

Overview
- Two systems: STANAG and CEFR
- Two tests of reading proficiency
  - BAT-Reading
  - Leipzig Test of Reading Proficiency (LTRP)
- The side-by-side study
- Observations
- Questions

Two Systems

Why is there a need to relate STANAG and CEFR?
- To recognize linguistic abilities of military personnel in civilian society
- To provide a framework to military institutions in nation states operating STANAG qualifications who need to equate them with CEFR for the purpose of gaining civilian recognition of military qualifications
- To provide guidance to employers, trainers, and non-language experts on how to interpret/evaluate CEFR qualifications
- To identify competence gaps and thereby determine whether an individual is capable of undertaking a job requiring a given SLP
- To allow informed decisions to be made on appropriate linguistic competence

“Birds of a Feather”

Broad Questions?
- Can the two systems be compared?
- Are the two systems related?
- Can the two systems be aligned?
- Can the two systems be equated?

Comparing CEFR and STANAG: Similarities

Feature | CEFR | STANAG
Describe language abilities on a scale from little or no ability to that of a highly articulate speaker | A1, A2, B1, B2, C1, C2 | 0+, 1, 1+, 2, 2+, 3, 3+, 4, 4+, 5
Criterion referenced | |
Address speaking, listening, reading, and writing | |
Describe tasks (functions), contexts, and expectations for accuracy | |
Contain can-do statements | |

CEFR: All criteria, some of the time
STANAG: All criteria, all of the time

A Summary of the Major Contrasts

- CEFR: The primary purpose is to check learners’ progress in developing communicative competence within a specific course of study.
- STANAG: The primary purpose is to test individuals’ general proficiency across a wide range of topics regardless of their course of study.

- CEFR: The primary users of the information are the teachers and students.
- STANAG: The primary users of the information are teachers, administrators, and employers.

- CEFR: By design, the CEFR is under-specified for testing of general, real-world proficiency.
- STANAG: By design, STANAG is under-specified for measuring step-by-step progress within a specific curriculum.

About this Study
- University of Leipzig
- April 19-23, 2010
- Proctored on-line tests in computer lab
- Goal was to involve five groups with 20 participants each
  - Levels A1, A2, B1, B2, C1 according to course enrolled
- Split test design
  - Half of the participants in each group took the BAT-R test first, the other half took the RPT-E first
- Tests taken on different days
  - 2 to 3 days apart depending on group
  - 90 minutes per test

Characteristics of Participants
- Gender
  - Female: 65%; Male: 35%
- Age
  - Average 25 (Range: 19-63)
- First language
  - German (85%)
  - Arabic, Russian, Polish, Brazilian, Chinese, Thai
- Mean # of years of English study in school
  - German students: 8.7 years
  - Foreign students: 5.1 years
- Enrolled in 1 of 5 different levels
  - English Language Institute to English teacher trainees

BAT Reading Test
- Test of English reading proficiency
- Advisory scores for calibrating national proficiency tests
- STANAG 6001 (version 3), Levels 1, 2, 3
- Internet-delivered and computer scored
- Developed by BILC Test Working Group
- Delivered by ACTFL

Format
- Criterion-referenced tests
  - Allow for direct application of the STANAG Proficiency Scale
  - Texts and tasks are aligned by level
  - Each proficiency level is tested separately
- Test takers take all items for Levels 1, 2, 3
  - 20 texts at each level
  - One item with 4 multiple choice responses per text

Scoring Criteria
- The proficiency rating is assigned based on two separate scores
  - “Floor”: sustained ability across a range of tasks and contexts specific to one level
  - “Ceiling”: non-sustained ability at the next higher proficiency level
- Must show “mastery” at a level to be assigned that level
  - Non-compensatory scoring
- Performance at the next higher level provides evidence of random, emerging, or developing proficiency at the next higher level.
  - Developing proficiency at the next higher level indicates a + rating.
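The floor-and-ceiling logic can be sketched in Python. This is an illustration only, not the BAT-R scoring algorithm: the slides do not state the cut scores, so `MASTERY`, `DEVELOPING`, and the function name `bat_r_rating` are all hypothetical. The sketch also assumes that a floor of 0 with developing Level 1 performance is reported as 0+.

```python
# Hypothetical cut scores: the slides do not give the real ones.
MASTERY = 0.70      # assumed share correct needed to "sustain" a level
DEVELOPING = 0.55   # assumed share correct that counts as "developing"

LEVELS = [1, 2, 3]  # the BAT-R tests Levels 1, 2, 3 with 20 items each


def bat_r_rating(correct_by_level, items_per_level=20):
    """Return a rating string such as '0', '1+', '2' or '3'.

    correct_by_level maps each level (1, 2, 3) to the number of items
    answered correctly at that level.
    """
    share = {lvl: correct_by_level[lvl] / items_per_level for lvl in LEVELS}

    # Floor: the highest level mastered with every lower level mastered too.
    # Non-compensatory: a strong Level 3 score cannot make up for a weak
    # Level 1 or Level 2 score, so the loop stops at the first failure.
    floor = 0
    for lvl in LEVELS:
        if share[lvl] >= MASTERY:
            floor = lvl
        else:
            break

    # Ceiling: only the next higher level is inspected for a "+" rating.
    next_lvl = floor + 1
    plus = next_lvl in share and share[next_lvl] >= DEVELOPING
    return f"{floor}+" if plus else str(floor)


print(bat_r_rating({1: 18, 2: 12, 3: 19}))
```

In the usage line, the high Level 3 score is ignored because Level 2 was not mastered, which is the point of non-compensatory scoring.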

Leipzig Test of Reading Proficiency
- Test of English reading proficiency for entering and exiting students at universities in the state of Saxony/Germany
- To determine proficiency levels from A1 to C1 according to the CEFR
- For placement and certification purposes
  - Entrance and exit requirements in all subjects
- Developed by the University of Leipzig under a grant from the state of Saxony

Format
- 5 texts with 3 questions each per level
  - 15 items per level
- Multiple choice questions
  - One correct answer and three distracters
- Entire series of tests
  - Combine 2 or 3 adjoining levels
  - A1-B1 or B1-B2 or B1-C1
- Version of the test used in this study
  - B1-C1

Level A1
- 5 texts: 60-100 words each
- Major tasks and functions
  - Topic recognition and comprehension of simple single facts
- Content
  - Basic personal and social needs
- Text type
  - Very short, simple, straightforward texts: notes, post cards, simple instructions and directions
- 3 MC questions per text
  - Global, selective, detail

Screen shot of A1 item

to come (requested from Helen)

Level C1
- 5 texts: 200-300 words each
- Major tasks and functions
  - Complex information processing including inferences, hypotheses, and nuances
- Content
  - Academic, professional, and literary material
- Text type
  - Op/ed pieces, analyses and commentaries, detailed technical reports, literary texts
- 3 MC questions per text
  - Global, detail, inference

Scoring Criteria
- Total number of points
- Rate highest levels that have a combined total of at least 18 points, with the lower level with at least 11 points (70%)
  - 18-24 points (60-80%) = lower level
  - 25-30 points (81-100%) = higher level
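One way to read the total-points rule is sketched below. This is an interpretation, not the LTRP's published algorithm. The function name `ltrp_rating` and the `None` result are hypothetical. The sketch assumes that adjoining levels are checked in pairs from the top down, and that each level is worth 15 points (one per item).

```python
def ltrp_rating(points, levels=("B1", "B2", "C1")):
    """Return the CEFR level for one test taker, or None.

    points maps each tested level to the points earned at that level
    (0-15). The default levels match the B1-C1 version used in the study.
    """
    # Walk the adjoining pairs from the highest pair down, so the first
    # pair that qualifies is the "highest levels" in the rule.
    for i in range(len(levels) - 2, -1, -1):
        lower, higher = levels[i], levels[i + 1]
        combined = points[lower] + points[higher]
        if combined >= 18 and points[lower] >= 11:
            # 18-24 combined points rate the lower level, 25-30 the higher.
            return higher if combined >= 25 else lower
    # No pair qualifies. The slides do not say what is reported here.
    return None


print(ltrp_rating({"B1": 14, "B2": 12, "C1": 6}))
```

In the usage line, the B2/C1 pair has 18 combined points with 12 at B2, so the lower level of that pair is assigned.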

Findings

LTRP rating (columns) by BAT-R rating (rows); "." marks an empty cell.

BAT-R    A1   A2   B1   B2   C1   TOTAL
0         1    .    .    .    .       1
1         2    4    1    .    .       7
1+        .    4    6    .    .      10
2         .    1   16    6    3      26
2+        .    .    .    6    1       7
3         .    .    .    5   10      15
TOTAL     3    9   23   17   14      66
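The crosstabulation can be turned into observed shares with a few lines of Python. This is a minimal sketch under two assumptions: the names `CROSSTAB` and `row_shares` are hypothetical, and the column position of each count is inferred from the row and column totals. These are raw proportions from the 66 participants, not model-based estimates.

```python
LTRP_LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Counts from the findings table: rows are BAT-R ratings, columns are
# LTRP ratings in the order of LTRP_LEVELS (0 where the cell is empty).
# Cell positions are inferred from the row and column totals.
CROSSTAB = {
    "0":  [1, 0, 0, 0, 0],
    "1":  [2, 4, 1, 0, 0],
    "1+": [0, 4, 6, 0, 0],
    "2":  [0, 1, 16, 6, 3],
    "2+": [0, 0, 0, 6, 1],
    "3":  [0, 0, 0, 5, 10],
}


def row_shares(bat_r):
    """Observed share of each LTRP rating among one BAT-R rating."""
    counts = CROSSTAB[bat_r]
    total = sum(counts)
    return {lvl: n / total for lvl, n in zip(LTRP_LEVELS, counts) if n}


# Column totals and grand total, used to check against the TOTAL row.
column_totals = [sum(col) for col in zip(*CROSSTAB.values())]
grand_total = sum(column_totals)

for lvl, share in row_shares("2").items():
    print(f"BAT-R 2 and LTRP {lvl}: {share:.0%}")
```

The recomputed column totals (3, 9, 23, 17, 14) and grand total (66) match the TOTAL row, which supports the inferred cell positions.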

Scatter Plot of Total Raw Scores

[Figure: scatter plot. Horizontal axis: LTRP Total Score. Vertical axis: BAT-R Total Score.]

(Correlation of Total Raw Scores r = .905, p < .001)

With the current data, one could say
- At the lowest and highest ends of the scales there is alignment
  - No one who was rated 1 was also rated B2 or C1
  - No one who was rated 3 was rated A1, A2, or B1
- The middle ranges are where there is the least amount of alignment
  - A BAT-R 2 can be anything from A2 to C1


With the current data, one could say

BAT-R   LTRP
0       0 or A1
1       A1 or A2 (Mostly A2)
1+      A2 or B1 (Mostly B1)
2       A2, B1, B2, or C1 (Mostly B1)
2+      B2 or C1 (Mostly B2)
3       B2 or C1 (Mostly C1)

With the current data, one could say

LTRP    BAT-R
A1      0 or 1 (Mostly 1)
A2      1, 1+ or 2 (Mostly 1)
B1      1+ or 2 (Mostly 2)
B2      2, 2+ or 3 (Mostly 2)
C1      2, 2+ or 3 (Mostly 3)

Estimated Probability

Estimated Probability of a BAT-R Rating Based on LTRP Rating

LTRP      BAT-R Rating
Rating    0       1       1+      2       2+      3
0         0.93*   0.07    .       .       .       .
A1        0.30    0.67*   0.03    .       .       .
A2        0.01    0.49*   0.40    0.09    .       .
B1        .       0.03    0.21    0.74*   0.01    0.01
B2        .       .       0.01    0.57*   0.23    0.18
C1        .       .       .       0.04    0.08    0.88*

Values marked * (shaded on the original slide) are the highest probability on the row.

What is the probability?

That a BAT-R 2 is also an LTRP:
- A2: 9%
- B1: 74%
- B2: 57%
- C1: 4%

That a BAT-R 3 is also an LTRP:
- B1: 1%
- B2: 18%
- C1: 88%

That an LTRP B1 is also a BAT-R:
- 1: 3%
- 1+: 21%
- 2: 74%
- 2+: 1%
- 3: 1%

That an LTRP B2 is also a BAT-R:
- 1+: 1%
- 2: 57%
- 2+: 23%
- 3: 18%
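The lookups behind these questions can be sketched in Python from the estimated-probability table. The names `PROB`, `given_ltrp`, `column_for`, and `most_likely_bat_r` are hypothetical, and entering the "." cells as 0.0 is an assumption. The BAT-R questions read down a column of the table, so their percentages need not add up to 100%. The LTRP questions read across a row, which does sum to about 100%.

```python
BAT_R = ["0", "1", "1+", "2", "2+", "3"]

# Estimated P(BAT-R rating | LTRP rating), copied from the table.
# Assumption: cells shown as "." are entered as 0.0.
PROB = {
    "0":  [0.93, 0.07, 0.0, 0.0, 0.0, 0.0],
    "A1": [0.30, 0.67, 0.03, 0.0, 0.0, 0.0],
    "A2": [0.01, 0.49, 0.40, 0.09, 0.0, 0.0],
    "B1": [0.0, 0.03, 0.21, 0.74, 0.01, 0.01],
    "B2": [0.0, 0.0, 0.01, 0.57, 0.23, 0.18],
    "C1": [0.0, 0.0, 0.0, 0.04, 0.08, 0.88],
}


def given_ltrp(ltrp):
    """Row read: distribution of BAT-R ratings for one LTRP rating."""
    return dict(zip(BAT_R, PROB[ltrp]))


def column_for(bat_r):
    """Column read: P(that BAT-R rating | each LTRP rating)."""
    j = BAT_R.index(bat_r)
    return {ltrp: row[j] for ltrp, row in PROB.items()}


def most_likely_bat_r(ltrp):
    """BAT-R rating with the highest estimated probability on the row."""
    row = given_ltrp(ltrp)
    return max(row, key=row.get)


print(most_likely_bat_r("B2"))
print({k: v for k, v in column_for("2").items() if v})
```

Keeping the row read and the column read as separate functions makes it explicit which conditional probability is being quoted.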

Answering the Broad Questions
- Can the two systems be compared? YES
- Are the two systems related? YES
- Can the two systems be aligned? Somewhat
- Can the two systems be equated? Probably not

“Heat Chart”

[Figure: heat chart with CEFR and STANAG 6001 as the axis labels.]

When comparing testing systems
- Ask about the purpose of the test
  - Placement, progress, prove a level, etc.
- Ask about what the test is testing
  - Is it a test of achievement, performance, proficiency?
  - Does it test spontaneous abilities or rehearsed performance?
- Ask about how the test scores are determined
  - Non-compensatory: prove a floor and ceiling
  - Total points
- Ask if research exists

Answers from a CEFR Expert
- CEFR is not one system. It is NOT intended to be used to transfer scores from one country to the next or from one language to another but rather to set a framework within which educators can build curricula.
- Not a harmonisation project
- Alignment is problematic because we do not know what we are aligning. Not a matter of alignment or equivalency but a matter of relationship.
- The scale is an origin for comparison. The scale functions as exemplars and activities. The scale is a meta-framework for learning and teaching.

Conversation with Nick Saville, Cambridge, England, April 15, 2010

In Closing

It is a far, far better thing that we do than we have ever done to know how to use test scores.

Questions?

Contact: [email protected]


Extra slides

Crosstabulation of Test Results
