Forum Criteriorum Helsinki, 30 Sept. 2015

Gudrun Erickson University of Gothenburg, Sweden gudrun.erickson@ped.gu.se

www.gu.se

National Assessment in Sweden

— A collaborative (ad)venture

Gudrun Erickson University of Gothenburg

Dept. of Education and Special Education

FORUM CRITERIORUM

Helsinki, 30 Sept. 2015

Outline

• Background

• Collaboration

• Discussions and Developments: Aims – Construct(s) – Methods – Agency – Uses & Consequences

• Ongoing activities

• Prospects and Challenges

Background – system level

• Mid-1990s: Shift from a highly centralized to a highly decentralized system; however, national curricula, incl. subject syllabuses, were kept – as were grades from school year 8;

• Shift from a norm referenced (NR) to a “goal and competence related” (CR) grading system – from group level to individual level;

• New standards: Content standards introducing varying degrees of novelty depending on subject; Performance standards/grading criteria, a new phenomenon;

• As before, teachers are responsible for awarding grades; four-point scale; initially, no standards for the highest grade level (left for teachers to decide);

• Grades used, to a very large extent, for admission to higher education; hence distinctly high-stakes, and requiring assessment literacy.

Background – national assessment

• Long tradition of assessment at the national level; high degree of acceptance;

• Number of subjects has varied; a core of Swedish, Mathematics and English;

• From the early 1980s, different universities commissioned by the National Agency for Education to be responsible for test development, incl. research (aspects of quality as well as legitimacy); today, research grants to be applied for from other sources;

• No exams; national test results to be combined with teachers’ continuous observations (no weighting);

• New role for national tests after the shift from norm-referencing: primarily concerning interpretation and use for individual students, but also regarding format in relation to standards (“does ‘reasoning’ require constructed response…?”);

• Teachers mark their own students’ tests, and award final grades; co-rating strongly recommended but not mandatory.

Collaboration

• “To work with another person or group in order to achieve or do something”

• “To give help to an enemy who has invaded your country during a war…”

+ + + + + RATIONALE

Validity, focusing on USE – not on materials or procedures per se

To optimize quality by bringing in as many stakeholders as possible

+ + + + + Partners

National and international authorities, institutions and individuals; Researchers within different fields; teachers; teacher educators; students…

+ + + + + Reasons to include students:

Ethical/Democratic reasons – Empowerment – Pedagogical reasons – Impact – Obvious reasons

Self-assessment

Peer-assessment

Teacher assessment

External assessment

*****

What is assessed?

Not the student but his/her knowledge, competence and development of competence!

The issue of actors and agency

Discussions and Developments

• Initially, great caution not to interfere at the local level; schools and teachers given very much responsibility;

• Fear of too much advice and support, i.e. that national tests would be perceived and treated as exams;

• However, a growing number of observations and studies indicating considerable differences in handling national tests and in awarding grades; issues of fairness and equity raised; external criticism (OECD);

• Fear of national tests possibly providing too little advice and support, i.e. national tests not important enough;

• What is “lagom” advice and support…? (lagom = not too much, not too little)

• Aims, scope and impact/washback of national tests in focus.

The Question of Aim(s)

• For a number of years, the aims of the Swedish national tests were to
  - enhance educational achievement
  - concretize standards
  - clarify goals and indicate strengths and weaknesses in individual learner profiles
  - enhance equity in assessment and grading
  - provide evidence for local and national analyses of educational achievement

• Since 2008, ‘only’ the last two – with the first two as ‘possible effects’

• On-going discussions about separating the two remaining aims: national tests – national evaluations (all students – sample based)

• How many aims can be catered for by the same test?

Current Situation

• Extensive assessment system: formative and diagnostic materials offered to schools; summative national tests mandatory (years 3, 6, 9 + courses in upper sec. school);

• Considerable changes following the introduction in 2011 of new curricula and standards, and a new, six-point grading scale (A-F);

• As before, content standards for subjects more or less similar/different to previous descriptions; performance standards general/generic across levels and subjects; “value expressions” attached to levels;

• Requirement regarding final/course grades that all aspects of the performance standards for a certain grade level must be met for a student to obtain the grade; hence, uneven profiles not accepted; a non-compensatory system;

• Considerable increase of national tests – younger students, more subjects;

• Tests much appreciated, but increase of complaints lately, in particular about workload, but also, e.g., concerning the compensatory issue;

• National tests re-rated by the Schools Inspectorate; problems discovered regarding inter-rater consistency + teachers’ ratings of their own students’ tests; results heavily – and promptly – publicized; methodological concerns have been raised.
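
The non-compensatory requirement can be illustrated with a minimal sketch. It assumes, as a simplification, that each aspect of a student's performance has been judged at one level on the A-F scale; the function names, the aspect names and the example profile are hypothetical, and the "compensatory" variant is included only as a contrast, not as a rule used in Sweden.

```python
# Grade levels from lowest to highest on the six-point scale (A-F).
GRADES = ["F", "E", "D", "C", "B", "A"]
RANK = {g: i for i, g in enumerate(GRADES)}


def non_compensatory(aspect_levels):
    """Highest grade met on every aspect: the weakest aspect decides."""
    return min(aspect_levels, key=RANK.__getitem__)


def compensatory(aspect_levels):
    """Contrast case (assumption): strong aspects offset weak ones.

    Uses the mean rank of the aspects, rounded down.
    """
    mean_rank = sum(RANK[g] for g in aspect_levels) // len(aspect_levels)
    return GRADES[mean_rank]


# Hypothetical uneven profile in English: strong reception, weaker writing.
profile = {"listening": "A", "reading": "B", "speaking": "B", "writing": "E"}
levels = list(profile.values())

print(non_compensatory(levels))  # E
print(compensatory(levels))      # C
```

With an uneven profile the two rules diverge by two grade levels, which is one way to see why the requirement generates complaints.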

Observations and Issues

• Uneven profiles within subjects; for example listening comprehension vs. writing in English; cf. discussions of dimensionality and compensation;

• Varying results within subjects between years – due to which aspects of the construct are in focus, format effects, weak specifications, weak piloting…?

• Consistent differences in results between subjects; for example considerably lower results in mathematics than in Swedish and English – due to construct, context, curricula, teaching/teachers/teaching materials, tests…?

• Consistent differences between subjects regarding the relationship between national test results (aggregated test grade) and final grades; – problematic – why/why not?; possible measures to adjust? relation to teachers’ attitudes to/comments about the tests…?

• What is a difficult task…? The importance/weight of construct analyses; empirical data; students’ perceptions…?

Aggregated test scores, year 9 – spring 2014 (%)

                     F    E    D    C    B    A
English              3   11   17   30   22   18
Swedish              4   20   25   28   18    6
Mathematics         13   36   19   17    9    7
Biology              8   27   27   19   13    6
Physics             12   32   25   16   10    5
Chemistry           13   32   23   14   12    6
Geography            7   21   22   29   13    8
History             13   19   27   19   13   11
Religious studies    7   30   17   26   12    9
Social studies       6   29   25   24   11    6
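
The differences between subjects become easier to compare when the distributions are summarized. The sketch below copies the percentages from the figures above and computes, per subject, the share of students at a given grade or higher; the helper name `share_at_or_above` is hypothetical. Because the published percentages are rounded, rows do not always sum to exactly 100.

```python
# Percentages per grade, ordered F, E, D, C, B, A (year 9, spring 2014).
GRADES = ["F", "E", "D", "C", "B", "A"]
RESULTS = {
    "English": [3, 11, 17, 30, 22, 18],
    "Swedish": [4, 20, 25, 28, 18, 6],
    "Mathematics": [13, 36, 19, 17, 9, 7],
    "Biology": [8, 27, 27, 19, 13, 6],
    "Physics": [12, 32, 25, 16, 10, 5],
    "Chemistry": [13, 32, 23, 14, 12, 6],
    "Geography": [7, 21, 22, 29, 13, 8],
    "History": [13, 19, 27, 19, 13, 11],
    "Religious studies": [7, 30, 17, 26, 12, 9],
    "Social studies": [6, 29, 25, 24, 11, 6],
}


def share_at_or_above(subject, grade):
    """Sum of the percentages for `grade` and all higher grades."""
    start = GRADES.index(grade)
    return sum(RESULTS[subject][start:])


for subject, row in RESULTS.items():
    print(f"{subject:18s} F: {row[0]:2d} %   C or higher: {share_at_or_above(subject, 'C'):2d} %")
```

For the three core subjects this gives 70 % at C or higher in English, 52 % in Swedish and 33 % in Mathematics.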

Relation between aggregated test grade (ATG) and final subject grade (FSG), year 9; spring 2014 (n≈90 000 En, Sw, Ma; NSc 1/3 sample; SocSc 1/4 sample)

                    FSG<ATG   FSG=ATG   FSG>ATG
English               18 %      74 %       9 %
Swedish               11 %      66 %      23 %
Mathematics            2 %      67 %      31 %

Biology                7 %      63 %      30 %
Physics                4 %      58 %      38 %
Chemistry              5 %      61 %      34 %

Geography              7 %      69 %      24 %
History                8 %      60 %      32 %
Religious studies      5 %      68 %      27 %
Social studies         4 %      64 %      32 %
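
How such a breakdown can be derived from student-level data is sketched below. This is an illustration under stated assumptions, not the procedure used for the official statistics: the function name `relation_shares` and the ten example pairs are hypothetical, and each student is assumed to have exactly one aggregated test grade (ATG) and one final subject grade (FSG).

```python
from collections import Counter

# Rank of each grade letter, lowest (F) to highest (A).
RANK = {g: i for i, g in enumerate("FEDCBA")}


def relation_shares(pairs):
    """Percentage of students whose FSG is below, equal to, or above their ATG.

    pairs: non-empty iterable of (atg, fsg) grade letters, one pair per student.
    """
    counts = Counter()
    for atg, fsg in pairs:
        diff = RANK[fsg] - RANK[atg]
        key = "FSG<ATG" if diff < 0 else "FSG=ATG" if diff == 0 else "FSG>ATG"
        counts[key] += 1
    total = sum(counts.values())
    return {k: round(100 * counts[k] / total)
            for k in ("FSG<ATG", "FSG=ATG", "FSG>ATG")}


# Hypothetical data for ten students (not taken from the Swedish statistics).
pairs = [("E", "E"), ("F", "E"), ("C", "C"), ("D", "C"), ("B", "B"),
         ("A", "A"), ("C", "D"), ("E", "E"), ("D", "D"), ("F", "E")]

print(relation_shares(pairs))
# {'FSG<ATG': 10, 'FSG=ATG': 60, 'FSG>ATG': 30}
```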

Relation between aggregated test grade/final course grade – Upper Secondary level – spring 2014 (n = > 20 000)

                    FSG<ATG   FSG=ATG   FSG>ATG
English 5             13 %      73 %      14 %
Swedish 1             13 %      65 %      22 %
Mathematics 1a         2 %      68 %      31 %
Mathematics 1b         2 %      74 %      24 %

English 6             12 %      71 %      17 %
Swedish 3             10 %      52 %      38 %
Mathematics 2b         1 %      59 %      40 %

ATG / FSG for Swedish, year 9; 1998-2014

ATG / FSG for Mathematics, year 9; 1998-2014

ATG / FSG for English, year 9; 1998-2014

Test development – a collaborative process

Example languages: www.nafs.gu.se/english/information/

• Analyses of relevant literature and research

• Development of test specifications – internal and external

• Continuous work in broad groups of experts: task development

• Small-scale piloting > Adjustments: an iterative process

• Large-scale pre-testing in randomly selected groups of students from the whole population of future test takers (n ≈ 400); Anchor items used; Questionnaires to all students and teachers

• Analyses of results and of students’ and teachers’ perceptions and suggestions (qualitative and quantitative methods)

• Compilation of tests (reference groups); Extensive guidelines

• Standard setting and benchmarking in broad groups

• Analyses / Research / Reporting (publicly available)

The typical test of EFL: Speaking (interaction + production); Reception (listening + reading); Writing

Test Development – the basis for Standard Setting

• Similarities and Differences between subject test development groups/universities – various reasons

SIMILARITIES

• Adherence to subject syllabuses

• Collaboration with teachers, students, researchers…

• Piloting…

DIFFERENCES

• Specifications – type(s); level of detail…

• Use of (and feelings about…) different formats – e.g., proportion of selected and constructed response items/tasks; degree of performance assessment tasks

• Design and volumes of piloting/pre-testing

• Linking procedures / use of anchor items

• Analyses of data

• Standard setting procedures…

Standard Setting – based on test development

• Similarities and Differences between subject test development groups/universities – various reasons

SIMILARITIES

• Judgemental approach

• Teachers active – central – in the process

• Angoff-related procedures

• Minimum standards…

DIFFERENCES

• Number of participants; proportion of different participants

• Rater training procedures

• Use of empirical data; if so, when, how, what data?

• Role of panelists’ suggestions

• Analyses of panelists’ suggestions

• Decision-making: when, where, by whom?

EXAMPLE: English: Writing

• Topics piloted and tested in random groups (c. 400 students/task)

• Topic(s) chosen based on analyses of results and reactions

• Internal process, incl. studies of intra- and inter-rater consistency, precedes selection of c. 50 texts sent to c. 12 external raters (gradual change of raters; always experienced raters + new raters)

• The external raters independently rate and comment on the texts

• Results collected, analysed and presented during a joint meeting

• Texts discussed and selected; comments produced, based on performance standards and analytical factors

• Texts and comments included in teacher guidelines
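
The slides do not say which statistics are used in the studies of rater consistency. As one common option, the sketch below computes exact agreement and unweighted Cohen's kappa for two raters who have graded the same texts; the choice of kappa, the function names and the eight example ratings are assumptions made for illustration only.

```python
from collections import Counter


def exact_agreement(r1, r2):
    """Proportion of texts given the same grade by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement equals 1, e.g. when
    both raters give every text the same single grade.
    """
    n = len(r1)
    p_observed = exact_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal grade frequencies.
    p_expected = sum(c1[g] * c2[g] for g in set(r1) | set(r2)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical grades awarded to the same eight texts by two raters.
rater_1 = ["E", "C", "C", "A", "F", "D", "B", "C"]
rater_2 = ["E", "C", "D", "A", "E", "D", "B", "B"]

print(exact_agreement(rater_1, rater_2))          # 0.625
print(round(cohens_kappa(rater_1, rater_2), 3))   # 0.556
```

Unweighted kappa treats all disagreements alike; since grades are ordered, a weighted variant that penalizes large discrepancies more than adjacent ones may be a better fit in practice.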

EXAMPLE: English: Reception

• c. 15 participants (mostly experienced teachers)

• Combination of Angoff and Bookmark related procedures

• Participants ‘take the test’ and suggest cut-off scores based (1) on the standards; (2) on their experience; after this, empirical data are introduced and discussed; recommendation given

• Final decision in a small group, combining all data (pre-testing results and reactions; anchor items; cut-off suggestions and comments from the group; estimates regarding ‘high-stakes effects’, etc.)

• The National Agency for Education (NAE) often participates and has the final say
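
The slides describe the procedure only as "Angoff and Bookmark related", so the sketch below shows the textbook Angoff calculation rather than the exact method used for the Swedish tests. It assumes each panelist estimates, for every one-point item, the probability that a student who only just meets the standard answers correctly; the function name and the ratings are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Textbook Angoff cut score.

    ratings[p][i] is panelist p's estimated probability that a borderline
    student answers item i correctly. Each panelist's estimates are summed
    into an expected score, and the cut score is the mean across panelists.
    """
    panelist_totals = [sum(item_probs) for item_probs in ratings]
    return mean(panelist_totals)


# Hypothetical estimates from three panelists for a five-item test.
ratings = [
    [0.9, 0.7, 0.5, 0.4, 0.2],
    [0.8, 0.6, 0.6, 0.3, 0.3],
    [0.9, 0.8, 0.4, 0.5, 0.2],
]

cut = angoff_cut_score(ratings)
print(f"Suggested cut-off: {cut:.1f} of {len(ratings[0])} points")
```

In the process described above, a figure like this would only be one input: it is discussed alongside empirical data before a small group makes the final decision.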

The issue of standard setting

• No real standard setting tradition in Sweden – partly due to the long period of norm-referencing (30+ years), but also to the sovereign role of teachers in awarding grades; no exam tradition;

• Uncertainty about the role of test results for teachers’ grading may have added to the relatively low interest in standard setting issues (“What is ‘lagom’ advice and support…?”);

• Standard setting traditionally associated with measurement and psychometrics – areas not focused upon in Sweden (and surrounded by some suspicion…);

• Dropping school results – discovered through international surveys! – and the increasing differences between schools have brought about discussions of equity – leading to more attention being paid to standard setting;

• Several measures being taken to strengthen the system:

Impact and effects on / for

Students’ learning, view of themselves – and others

Students’ view of their own competence – and others’

Students’ educational choices and possibilities

The content and methods of teaching and education

Society at large…

USE – effects/consequences of tests

A Common Framework for National Tests

• Preparatory work during a number of years;

• Current assignment by the National Agency for Education to a small working group (to collaborate with reference groups)

Theoretically founded quality assurance

Theoretical + practical part

Main target groups: the NAE and the university institutions developing the tests (indirectly also other stakeholders)

• The Framework intended to form the basis for the different test development groups to develop their subject specific specifications; clarity and transparency emphasized;

• Planned reporting to the NAE in October 2015; however, postponed due to a politically initiated investigation of the national assessment system at large:

On-going investigation/analysis of the national tests for lower and upper secondary school

The investigator should

• analyse the aims, function and extent/scope of the system at large,

• propose a system for continuous national evaluation for trend measurement over time,

• propose how the rating of student responses and performances should be designed to ensure equal procedures,

• draw up a proposal aimed at increasing the proportion of external marking of national tests in a cost-effective way,

• analyse the possibilities for digitalization of national tests and suggest how, to what extent, and at what pace this can happen…

To be reported by 31 March 2016

Prospects and Challenges

• Building on, maintaining and further developing the positive attitudes to, and (intended/assumed) washback of, the national assessment system;

• Increasing stability, thereby contributing to validity and reliability – “fairness and equity”;

• Enabling systematic studies of development over time;

• Further developing and elaborating methods of collaboration with broad groups of stakeholders, and between policy, practice and research.