TRANSCRIPT
IATEFL TEASIG Webinar
Monday 26th February 2018, 17:00 UK local time / 18:00 CET
The Dual Function of Assessment –
Enhancing Learning and Equity
Gudrun Erickson, University of Gothenburg, Dept of Education and Special Education
Overview
• Attitudes to assessment
• Definitions and functions of assessment
• Why, What, How/When, Who & And…?
• Concepts – in theory and practice
• Collaboration
• Discussions – perceptions – suggestions
• Good Practice
European 17-year-old:
“I think that a good language test/assessment should get students chance to show what they
know and devolope their skills. It should contain of listening and reading comprehations, as well as
writing and grammar part. But the most important thing is to speak in that language, so I think there
should be also oral exams. Speaking is the most important, because without it, we wouldn’t
communicate. Because of that students should also learn pronunciation.”
Public Interest in Assessment
from
Ice cold
to
Scorching hot
to, maybe,
…somewhat burnt…?
The Dual Function of Assessment
Based on common principles:
• CLARITY, VALIDITY, RELIABILITY, RESPECT
• Enhancing LEARNING, reflection and [thus] development
‘Where are you/am I now? Where do you/I have to go?
How can you/I get there?’
AND
• Strengthening quality and [thereby] enhancing fairness and EQUITY
‘How do you demonstrate your full/true competence in
the best possible way?’
Everything isn’t assessment, is it…?
Assessment as learning…
“Don’t worry! I don’t do testing; I assess all the time…”
* * * * *
To some extent, there is a need to clarify when assessment takes place,
in order not to “turn life into one long never-ending test for the learner and a
bureaucratic nightmare for the teacher” (Common European Framework of Reference for Languages: Learning, teaching, assessment, p. 185).
Fundamental questions about Learning, Teaching and Assessment
WHY?
WHAT?
HOW? / WHEN?
WHO?
AND…?
Fundamental concepts
VALIDITY
The ”RIGHT” things, conclusions, actions, consequences…
Major threats: Construct under-representation; Construct-irrelevant variance
RELIABILITY
FAIRNESS/”RIGHTEOUSNESS”: consistency, comparability, agreement, equity…
To minimize the effects of chance: Inter-rater consistency – Intra-rater consistency
Need for: Successive validation, Multiple sources of evidence, Collaborative approaches, Ethical considerations
Everyday validity and reliability – Examples of questions
• How well does my/our assessment reflect the curriculum, syllabus or ’course plan’ – national or local?
• How well do I/we cover the whole curriculum?
• What is the balance between different aspects of knowing and forms of assessment?
• How much do I know about individual students’ profiles (strengths and weaknesses) within the subject in focus?
• In what way(s) – and how – are students active in assessment?
• What message does my/our assessment convey to the students?
• How are results presented to the students, and how useful are they – to what extent do they feed forward in students’ learning process?
• How and to what extent do we collaborate in assessment?
The Swedish context
• Highly decentralized school system, but with national curricula, subject syllabuses and national tests
• Goal- and criterion referenced system; teachers responsible for grading
• Grades used to a very large extent for admission to higher education; hence high-stakes – and requiring assessment literacy
• Extensive national testing and assessment program (formative + summative aims)
• Advisory function; no central marking: teachers mark their own students’ tests and award final grades; no mandatory co-rating – changes suggested, underway
• Summative tests compulsory, and high stakes – however, results to be combined with teachers’ continuous assessments in the final grading
• Universities commissioned by the Swedish National Agency for Education to develop all materials: University of Gothenburg, Foreign Languages
National language assessment materials in Sweden
www.nafs.gu.se
(Reflection of national syllabuses: Receptive, Productive and Interactive competences (oral and written) // Strategies, Adaptation, Culture…)
Stage (approx. CEFR pass level) – English – Other modern languages
1 (A1.1–A2.1): Diagnostic materials (gr 1–6); Formative materials – Fr, Ger, Sp*
2 (A2.1): Subject test (gr 6); Nat. Assessment materials – Fr, Ger, Sp
3 (A2.1–B1.1): Diagnostic materials (gr 7–9); Nat. Assessment materials – Fr, Ger, Sp
4 (B1.1): Subject test (gr 9); Nat. Assessment materials – Fr, Ger, Sp
5 (B1.2): Course test En 5
6 (B2.1): Course test En 6
7 (B2.2): National assessment materials En 7
* Models for self- & peer assessment; Tasks with educational comments
COLLABORATION – procedures
www.nafs.gu.se/english/information
• Analyses of relevant literature and research
• Development of test specifications – internal and external
• Continuous work in broad groups of stakeholders/experts: task development
• Small-scale piloting > Adjustments – an iterative process
• Large-scale pre-testing in randomly selected groups of students from the whole population of future test takers (n ≈ 400); Anchor items used
• Analyses of results and of students’ and teachers’ perceptions and suggestions (qualitative and quantitative methods)
• Compilation of tests (ref. groups); Extensive guidelines (implicit ‘rater training’)
• Standard setting and benchmarking in broad groups
• Analyses / Research / Reporting (publicly available)
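The piloting and pre-testing steps above rest on classical item statistics. As a minimal illustrative sketch, not the project’s actual tooling (the function name and all response data are invented), facility and discrimination values for dichotomously scored pre-test items might be computed like this:

```python
# Classical item analysis for 0/1-scored test items (illustrative only).
# responses: one list per student, one 0/1 score per item.

def item_analysis(responses):
    """Return facility (proportion correct) and point-biserial
    discrimination (item score vs. rest-of-test score) per item."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        facility = sum(item) / n_students  # the classical p-value
        # Correlate the item with the rest of the test (total minus this item)
        rest = [totals[s] - item[s] for s in range(n_students)]
        mean_i = sum(item) / n_students
        mean_r = sum(rest) / n_students
        cov = sum((item[s] - mean_i) * (rest[s] - mean_r)
                  for s in range(n_students)) / n_students
        var_i = sum((x - mean_i) ** 2 for x in item) / n_students
        var_r = sum((x - mean_r) ** 2 for x in rest) / n_students
        disc = cov / ((var_i * var_r) ** 0.5) if var_i and var_r else 0.0
        stats.append({"item": i + 1,
                      "facility": round(facility, 2),
                      "discrimination": round(disc, 2)})
    return stats
```

In pre-testing, items with very extreme facility or low (or negative) discrimination would be flagged for revision in the iterative adjustment cycle described above.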
COLLABORATION – partners
National institutions
International institutions
Different speakers of the target language
Researchers within different fields
Large groups of teachers, incl. special education
Teacher educators
STUDENTS / test-takers
COLLABORATION – rationale
VALIDITY REASONS
Ethical / Democratic reasons
Empowerment
Pedagogical reasons
Impact reasons
OBVIOUS REASONS
STUDENTS as partners
Information collected through interviews and questionnaires:
• To improve information and guidelines
• To better understand how items and tasks function / do not function
• To optimize task and item quality, e.g. by detecting possible bias
• To detect and eliminate possible obscurity and ambiguity
• To choose topics, texts and tasks
• To compose tests and to optimize sequencing of tasks
What is a difficult task?
– What experts define as difficult, based on analyses of the construct…?
– What is shown empirically, in pre-testing/testing…?
– What students perceive as difficult…?
• To provide additional information for standard setting
We need YOUR help to make really good tests of English!
Please react to the following statements about Back to Nature and then write your comments.
Yes, absolutely No, absolutely not
1 BtN was a good test –––– –––– –––– –––– ––––
2 It was difficult –––– –––– –––– –––– ––––
3 The text was interesting to read –––– –––– –––– –––– ––––
4 I learnt something from doing this type of test –––– –––– –––– –––– ––––
5 There were many words that I didn’t understand –––– –––– –––– –––– ––––
6 BtN tested something that is important –––– –––– –––– –––– ––––
7 I have done tests like BtN before –––– –––– –––– –––– ––––
8 I think I did well on BtN –––– –––– –––– –––– ––––
Comments about Back to Nature (you can write in English or in Swedish)
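Responses to a five-step scale like the one above can be tallied very simply. The sketch below is purely illustrative: the 1-to-5 scoring convention (1 = “No, absolutely not” … 5 = “Yes, absolutely”), the function name, and the data are all assumptions, not part of the actual feedback analysis.

```python
# Tally Likert-scale questionnaire responses per statement (illustrative).
from collections import Counter

def summarise(responses):
    """responses: {statement: list of scores, 1 (No, absolutely not)
    to 5 (Yes, absolutely)}. Returns n, mean and score distribution."""
    out = {}
    for stmt, scores in responses.items():
        out[stmt] = {
            "n": len(scores),
            "mean": round(sum(scores) / len(scores), 2),
            "distribution": dict(sorted(Counter(scores).items())),
        }
    return out
```

Per-statement means and distributions of this kind are what make patterns such as the gender differences and task preferences reported below visible.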
TEST-TAKER FEEDBACK – Some results (year 9)
• High response rates: Likert scales c. 95%, comments c. 45%
• c. 35 to 65 % comments; listening most, writing the least
• c. 45 % of the comments are in English
• Girls tend to write comments more often than boys
• Girls frequently underestimate their performances (retrospectively)
• In general, students seem to prefer productive and interactive tasks
• Positive features: variation, ‘usefulness’, learning potential, clarity, challenge… (corresponds with ENLTA study, 2005)
• Negative features: the opposite to the ones above – esp. ‘narrow’ tests
Cf. construct under-representation & construct-irrelevant variance
SOME STUDENT COMMENTS
• It was little bit difficult but it was very fun! (10)
• It was a very easy test. I can very much english from my games. (11)
• It was fun but a little hard sometimes to find the right word. But that is good I think. Then you learn to describe a word with other words. (12)
• I don’t think you learn anything from this. Useless! (15)
• ”It’s difficult explaint i think. Writhe on swedish so we understand What to do.” (15)
• “Write in swedish!? You must be kidding...” (15)
• It’s always difficult to understand when ”real people” (not teachers) speak. People speak fast and not often so clearly. Good practice! (18)
TEACHERS as partners
• Continuous discussions in reference groups
• Task development / Item writing (university based)
• Piloting – Pre-testing: administering, observing, analysing, discussing, reporting…
• Selection, composition, sequencing, standard setting
• Rating and benchmarking
• Reporting and responding after administration of tests
TEACHER FEEDBACK – Some results
• > 90 % positive to the actual tests (function, content, level of difficulty, extensive and educative guidelines, individual proficiency profiles…)
However…:
• Complaints about the system level:
– Workload – no designated time for co-rating
– Unclear status of the national tests (what does ‘advisory function’ mean?)
– Treatment of uneven profiles in national tests and in rules for final grading
• Considerable controversy re. teachers’ rating of their own students’ tests; the National Schools Inspectorate claims considerable leniency + low inter-rater consistency; academic studies do not fully confirm this – sometimes contradict it.
Study of inter-rater consistency
• Swedish national test of English for grade 9 (+ Swedish, Maths)
• 100 randomly sampled, teacher-rated tests
• Re-rating by two (rec.) or three (wr.) independent raters of subtests focusing on Receptive skills (reading & listening) and Writing
RESULTS FOR ENGLISH
• High degree of consistency, in particular for receptive skills:
Receptive skills (r > .99); Writing (r = .86–.93)
• However, individual rater profiles differ
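The r-values reported for the study are Pearson correlations between raters scoring the same scripts. As a minimal sketch of what that statistic quantifies (the function and the marks below are invented for illustration and are not the study’s data):

```python
# Inter-rater consistency as a Pearson correlation (illustrative only).
import math

def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same scripts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

teacher = [12, 9, 15, 7, 11, 14]   # invented writing scores by the teacher
rerater = [11, 9, 14, 8, 12, 14]   # invented scores by an independent re-rater
# Close agreement in both rank and level gives r near 1
```

Note that a high r shows consistent ranking only; two raters can rank scripts identically while one is systematically more lenient, which is why individual rater profiles still matter.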
Important considerations
• An integrated approach – assessment for and of learning
• To feed back and to feed forward
• To communicate and collaborate with students – and colleagues
• Breadth and variation (content, tasks, methods…); Progression
• Finding out what students can do – not primarily the opposite
• Making what is most important measurable – not the opposite
• Commenting on strengths before weaknesses
• Distinguishing between errors that (might) disturb and errors that actually “destroy” communication (impeding errors)
• Presenting results in profiles – not (only) in lump sums
CLARITY – VALIDITY – RELIABILITY – RESPECT
Examples of Codes and Guidelines for Good Practice
AERA Standards for Educational and Psychological Testing
ILTA Code of Ethics + Guidelines for Practice
ALTE Code of Practice
EALTA Guidelines for Good Practice in Language Testing and Assessment
…..
Examples of questions (EALTA)
Teacher pre- and in-service education
• How relevant is the training to the assessment context of the trainees?
• What is the balance between theory and practice in the training?
• How far do the assessment procedures used to evaluate the trainees follow the principles they have been taught?
Classroom testing and assessment
• How does the assessment purpose relate to the curriculum?
• How well is the curriculum covered?
• What account is taken of student views on the assessment procedures?
• What kind of feedback do students get?
• What use is made of the results?
National or institutional testing units or centres
• Are there test specifications?
• What training do test developers and item writers have?
• Are the tests piloted?
• Are validation studies conducted?
• What evidence is there of the quality of the process followed to link tests and examinations to the Common European Framework?
And, finally:
“It’s nice to do nice exercises, that is the thing
which teachers should remember.
Even tests can be nice.”
European 15-year-old student
THANK YOU FOR YOUR ATTENTION!
Selected references
Assessment Reform Group: http://www.nuffieldfoundation.org/assessment-reform-group
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Erickson, G. & Gustafsson, J.-E. (2005). Some European Students’ and Teachers’ Views on Language Testing and Assessment: A report on a questionnaire survey. http://www.ealta.eu.org/resources.htm
Erickson, G. & Åberg-Bengtsson, L. (2012). A collaborative approach to national test development. In D. Tsagari & I. Csépes (Eds.), Collaboration in Language Testing and Assessment (pp. 93–108). Frankfurt am Main: Peter Lang.
Gardner, J. (Ed.) (2012). Assessment and Learning. London: Sage.
Little, D. (2009). The European Language Portfolio: where pedagogy and assessment meet. https://www.coe.int/en/web/portfolio/elp-related-publications
Messick, S. A. (1996). Validity and washback in language testing. Language Testing, 13(3), 241–256.
National Assessment of Foreign Languages in Sweden: https://nafs.gu.se/english/information/nafs_eng
Takala, S., Erickson, G., Figueras, N. & Gustafsson, J.-E. (2016). Future Prospects and Challenges in Language Assessments. In Tsagari, D. & Banerjee, J. (Eds.), Contemporary Second Language Assessment (pp. 299–315). London: Bloomsbury Academic.