A Tale of Two Tests
STANAG and CEFR
Comparing the Results of Side-by-Side Testing of Reading Proficiency
BILC Conference, May 2010
Istanbul, Turkey
Dr. Elvira Swender, ACTFL

With apologies to the author
We had a “Dickens of a time” with this study.

Overview
Two systems: STANAG and CEFR
Two tests of reading proficiency
  BAT-Reading
  Leipzig Test of Reading Proficiency (LTRP)
The side-by-side study
Observations
Questions

Two Systems
Why is there a need to relate STANAG and CEFR?
To recognize linguistic abilities of military personnel in civilian society
To provide a framework to military institutions in nation states operating STANAG qualifications who need to equate them with CEFR for the purpose of gaining civilian recognition of military qualifications
To provide guidance to employers, trainers, and non-language experts on how to interpret/evaluate CEFR qualifications
To identify competence gaps and thereby determine whether an individual is capable of undertaking a job requiring a given SLP
To allow informed decisions to be made on appropriate linguistic competence

“Birds of a Feather”

Broad Questions
Can the two systems be compared?
Are the two systems related?
Can the two systems be aligned?
Can the two systems be equated?

Comparing CEFR and STANAG: Similarities
Feature (CEFR, STANAG)
Describe language abilities on a scale from little or no ability to that of a highly articulate speaker
  CEFR: A1, A2, B1, B2, C1, C2
  STANAG: 0+, 1, 1+, 2, 2+, 3, 3+, 4, 4+, 5
Criterion referenced
Address speaking, listening, reading, and writing
Describe tasks (functions), contexts, and expectations for accuracy
Contain can-do statements
All criteria, some of the time
All criteria, all of the time

A Summary of the Major Contrasts
Primary purpose
  CEFR: The primary purpose is to check learners’ progress in developing communicative competence within a specific course of study.
  STANAG: The primary purpose is to test individuals’ general proficiency across a wide range of topics regardless of their course of study.
Primary users
  CEFR: The primary users of the information are the teachers and students.
  STANAG: The primary users of the information are teachers, administrators, and employers.
Design
  CEFR: By design, the CEFR is under-specified for testing of general, real-world proficiency.
  STANAG: By design, STANAG is under-specified for measuring step-by-step progress within a specific curriculum.

About this Study
University of Leipzig
April 19-23, 2010
Proctored on-line tests in computer lab
Goal was to involve five groups with 20 participants each
  Levels A1, A2, B1, B2, C1 according to course enrolled
Split test design
  Half of the participants in each group took the BAT-R test first, the other half took the RPT-E first
Tests taken on different days
  2 to 3 days apart depending on group
90 minutes per test

Characteristics of Participants
Gender
  Female: 65%; Male: 35%
Age
  Average 25 (Range: 19-63)
First language
  German (85%)
  Arabic, Russian, Polish, Brazilian, Chinese, Thai
Mean # of years of English study in school
  German students: 8.7 years
  Foreign students: 5.1 years
Enrolled in 1 of 5 different levels
  English Language Institute to English teacher trainees

BAT Reading Test
Test of English reading proficiency
Advisory scores for calibrating national proficiency tests
STANAG 6001 (version 3), Levels 1, 2, 3
Internet-delivered and computer scored
Developed by BILC Test Working Group
Delivered by ACTFL

Format
Criterion-referenced tests
  Allow for direct application of the STANAG Proficiency Scale
Texts and tasks are aligned by level
Each proficiency level is tested separately
Test takers take all items for Levels 1, 2, 3
  20 texts at each level
  One item with 4 multiple choice responses per text

Scoring Criteria
The proficiency rating is assigned based on two separate scores
  “Floor”: sustained ability across a range of tasks and contexts specific to one level
  “Ceiling”: non-sustained ability at the next higher proficiency level
Must show “mastery” at a level to be assigned that level
  Non-compensatory scoring
Performance at the next higher level provides evidence of random, emerging, or developing proficiency at the next higher level.
  Developing proficiency at the next higher level indicates a + rating.
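
The floor-and-ceiling logic above can be sketched in a few lines of Python. This is only an illustration of non-compensatory scoring: the slides do not state the actual BAT-R cut-offs, so the two thresholds below (MASTERY and DEVELOPING) and the function name are hypothetical.

```python
# Hypothetical cut-offs: the slides do not give the real BAT-R values.
MASTERY = 0.70      # assumed share correct needed to sustain a level (floor)
DEVELOPING = 0.50   # assumed share correct at the next level that earns a "+"


def bat_style_rating(share_correct):
    """Return a rating such as "1", "1+" or "3".

    share_correct maps each tested level (1, 2, 3) to the share of
    items answered correctly at that level.
    """
    floor = 0
    for level in (1, 2, 3):
        if share_correct[level] >= MASTERY:
            floor = level
        else:
            # Non-compensatory: a failed level cannot be offset
            # by a strong result at a higher level.
            break
    rating = str(floor)
    next_level = floor + 1
    # Ceiling: "developing" performance at the next higher level adds a plus.
    if next_level in share_correct and share_correct[next_level] >= DEVELOPING:
        rating += "+"
    return rating
```

Under these assumed thresholds, a test taker with 90% at Level 1, 55% at Level 2 and 80% at Level 3 is rated 1+: the strong Level 3 result does not compensate for missing mastery at Level 2.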

Leipzig Test of Reading Proficiency
Test of English reading proficiency for entering and exiting students at universities in the state of Saxony/Germany
To determine proficiency levels from A1 to C1 according to the CEFR
For placement and certification purposes
  Entrance and exit requirements in all subjects
Developed by the University of Leipzig under a grant from the state of Saxony

Format
5 texts with 3 questions each per level
  15 items per level
Multiple choice questions
  One correct answer and three distracters
Entire series of tests
  Combine 2 or 3 adjoining levels
  A1-B1 or B1-B2 or B1-C1
Version of the test used in this study
  B1-C1

Level A1
5 texts: 60-100 words each
Major tasks and functions
  Topic recognition and comprehension of simple single facts
Content
  Basic personal and social needs
Text type
  Very short, simple, straightforward texts: notes, postcards, simple instructions and directions
3 MC questions per text
  Global, selective, detail

Screen shot of A1 item
to come (requested from Helen)

Level C1
5 texts: 200-300 words each
Major tasks and functions
  Complex information processing including inferences, hypotheses, and nuances
Content
  Academic, professional, and literary material
Text type
  Op/ed pieces, analyses and commentaries, detailed technical reports, literary texts
3 MC questions per text
  Global, detail, inference

Scoring Criteria
Total number of points
Rate highest levels that have a combined total of at least 18 points with the lower level with at least 11 points (70%)
  18-24 points (60-80%) = lower level
  25-30 points (81-100%) = higher level
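
One possible reading of this total-points rule, sketched in Python. The function name and the handling of the case where no pair of levels qualifies (it returns None) are assumptions; the slides do not say what rating is given then.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def ltrp_style_rating(points):
    """Rate a test taker from points earned per level (15 points per level).

    points maps each tested level, e.g. "B1", to the points earned there.
    Returns the level awarded, or None if no pair of adjoining levels
    meets the rule (an assumption, not stated on the slide).
    """
    tested = [level for level in LEVELS if level in points]
    pairs = list(zip(tested, tested[1:]))  # adjoining (lower, higher) pairs
    # Start with the highest pair of adjoining levels and work downward.
    for lower, higher in reversed(pairs):
        combined = points[lower] + points[higher]
        if points[lower] >= 11 and combined >= 18:
            # 18-24 combined points: lower level; 25-30: higher level.
            return higher if combined >= 25 else lower
    return None
```

For the B1-C1 version, ltrp_style_rating({"B1": 15, "B2": 13, "C1": 12}) gives "C1", because the B2 and C1 sections together reach 25 points. Unlike floor-and-ceiling scoring, points from the higher level can add to the total that decides the rating.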

Findings
Rows: BAT-R rating. Columns: LTRP level. Empty cells are shown as "."

BAT-R    A1    A2    B1    B2    C1    TOTAL
0         1     .     .     .     .        1
1         2     4     1     .     .        7
1+        .     4     6     .     .       10
2         .     1    16     6     3       26
2+        .     .     .     6     1        7
3         .     .     .     5    10       15
TOTAL     3     9    23    17    14       66
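
The crosstabulation can be read row by row as shares. The Python sketch below enters the counts from the Findings slide and computes, for one BAT-R rating, the share of test takers at each LTRP level. The placement of counts in columns is inferred from the row and column totals (the extracted slide text does not preserve empty cells), and the names CROSSTAB, column_totals and row_shares are illustrative.

```python
LTRP = ["A1", "A2", "B1", "B2", "C1"]

# Counts per BAT-R rating (rows) and LTRP level (columns, in LTRP order).
# Empty cells are entered as 0; cell placement is inferred from the totals.
CROSSTAB = {
    "0":  [1, 0, 0, 0, 0],
    "1":  [2, 4, 1, 0, 0],
    "1+": [0, 4, 6, 0, 0],
    "2":  [0, 1, 16, 6, 3],
    "2+": [0, 0, 0, 6, 1],
    "3":  [0, 0, 0, 5, 10],
}


def column_totals(table):
    """Sum each LTRP column across all BAT-R ratings."""
    return [sum(column) for column in zip(*table.values())]


def row_shares(table, rating):
    """Share of test takers at each LTRP level for one BAT-R rating."""
    row = table[rating]
    total = sum(row)
    return {level: count / total for level, count in zip(LTRP, row) if count}
```

With this layout the column totals come out as 3, 9, 23, 17 and 14, matching the TOTAL row of the slide, and row_shares(CROSSTAB, "2") shows that 16 of the 26 test takers rated BAT-R 2 were rated B1 on the LTRP.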

Scatter Plot of Total Raw Scores
[Figure: scatter plot. Axis labels: LTRP Total Score, BAT-R Total Score]
(Correlation of Total Raw Scores r = .905, p < .001)

With the current data, one could say
At the lowest and highest ends of the scales there is alignment
  No one who was rated 1 was also rated B2 or C1
  No one who was rated 3 was rated A1, A2, or B1.
The middle ranges are where there is the least amount of alignment
  A BAT-R 2 can be anything from A2 to C1

With the current data, one could say

BAT-R    LTRP
0        0 or A1
1        A1 or A2 (Mostly A2)
1+       A2 or B1 (Mostly B1)
2        A2, B1, B2, or C1 (Mostly B1)
2+       B2 or C1 (Mostly B2)
3        B2 or C1 (Mostly C1)

With the current data, one could say

LTRP     BAT-R
A1       0 or 1 (Mostly 1)
A2       1, 1+ or 2 (Mostly 1)
B1       1+ or 2 (Mostly 2)
B2       2, 2+ or 3 (Mostly 2)
C1       2, 2+ or 3 (Mostly 3)

Estimated Probability of a BAT-R Rating Based on LTRP Rating
Rows: LTRP rating. Columns: BAT-R rating.

LTRP     0       1       1+      2       2+      3
0        0.93*   0.07    .       .       .       .
A1       0.30    0.67*   0.03    .       .       .
A2       0.01    0.49*   0.40    0.09    .       .
B1       .       0.03    0.21    0.74*   0.01    0.01
B2       .       .       0.01    0.57*   0.23    0.18
C1       .       .       .       0.04    0.08    0.88*

Values marked * (shaded on the original slide) are the highest probability on the row.
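
The shaded cells can be recovered mechanically by taking the largest value in each row. The Python sketch below enters the probabilities from the table and looks up the most likely BAT-R rating for a given LTRP rating. Treating the "." cells as 0.0 is an assumption, and the names PROB and most_likely_bat_r are illustrative.

```python
BAT_R = ["0", "1", "1+", "2", "2+", "3"]

# Rows: LTRP rating. Values follow the BAT_R column order.
# Cells shown as "." on the slide are entered as 0.0 (an assumption).
PROB = {
    "0":  [0.93, 0.07, 0.0, 0.0, 0.0, 0.0],
    "A1": [0.30, 0.67, 0.03, 0.0, 0.0, 0.0],
    "A2": [0.01, 0.49, 0.40, 0.09, 0.0, 0.0],
    "B1": [0.0, 0.03, 0.21, 0.74, 0.01, 0.01],
    "B2": [0.0, 0.0, 0.01, 0.57, 0.23, 0.18],
    "C1": [0.0, 0.0, 0.0, 0.04, 0.08, 0.88],
}


def most_likely_bat_r(ltrp):
    """Return (BAT-R rating, probability) for the highest value on the row."""
    row = PROB[ltrp]
    best = max(range(len(row)), key=row.__getitem__)
    return BAT_R[best], row[best]
```

For example, most_likely_bat_r("B2") returns ("2", 0.57): an LTRP B2 is most likely a BAT-R 2, but the remaining probability is spread over 2+ and 3, which is why the middle of the scale aligns least well.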

What is the probability?
That a BAT-R 2 is also an LTRP:
  A2   9%
  B1   74%
  B2   57%
  C1   5%

What is the probability?
That a BAT-R 3 is also an LTRP:
  B1   9%
  B2   18%
  C1   88%

What is the probability?
That an LTRP B1 is also a BAT-R:
  1    3%
  1+   21%
  2    74%
  2+   1%
  3    1%

What is the probability?
That an LTRP B2 is also a BAT-R:
  1+   1%
  2    57%
  2+   23%
  3    18%

Answering the Broad Questions
Can the two systems be compared?
  YES
Are the two systems related?
  YES
Can the two systems be aligned?
  Somewhat
Can the two systems be equated?
  Probably not

“Heat Chart”
[Figure: heat chart. Labels: CEFR, STANAG 6001]

When comparing testing systems
Ask about the purpose of the test
  Placement, progress, prove a level, etc.
Ask about what the test is testing
  Is it a test of achievement, performance, proficiency?
  Does it test spontaneous abilities or rehearsed performance?
Ask about how the test scores are determined
  Non-compensatory: prove a floor and ceiling
  Total points
Ask if research exists

Answers from a CEFR Expert
CEFR is not one system. It is NOT intended to be used to transfer scores from one country to the next or from one language to another but rather to set a framework within which educators can build curricula.
Not a harmonisation project
Alignment is problematic because we do not know what we are aligning. Not a matter of alignment or equivalency but a matter of relationship.
The scale is an origin for comparison. The scale functions as exemplars and activities. The scale is a meta-framework for learning and teaching.
Conversation with Nick Saville, Cambridge, England, April 15, 2010

In Closing
It is a far, far better thing that we do than we have ever done to know how to use test scores.

Extra slides
Crosstabulation of Test Results