Inter-rater reliability in the KPG exams

The Writing Production and Mediation Module

Inter-rater reliability in KPG

AIM:

To check the effectiveness of the instruments employed throughout the rating process

• Rating Grid – Assessment Criteria

• Training Material & Training Seminars

• On-the-spot consultancy to raters

Script Raters Profile

• Experienced teachers

• Underwent initial training in rating KPG scripts

• Undergo specialized training for every test administration

Script rater training

• Specialized training on rating scripts based on expectations for every activity
  - Analysis of expected output
  - Presentation of rated scripts
  - Actual rating of selected samples

• Rating scripts under supervision

The rating procedure

• Each script is rated by two script raters randomly selected from a pool of trained raters

• Second ratings are independent of the first (no identifying information, no marks or symbols)

• Constant monitoring/consultancy during the process
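The random double-rating step described above can be sketched as follows. This is only an illustration: the function name `assign_raters`, the script IDs and the rater IDs are hypothetical, and the slides do not say how the random selection is actually carried out.

```python
import random

def assign_raters(script_ids, rater_pool, seed=None):
    """Assign two distinct, randomly chosen raters to every script.

    Hypothetical helper: the pairing mechanism used in practice is not
    described in the source.
    """
    rng = random.Random(seed)
    assignments = {}
    for script_id in script_ids:
        # sample() draws without replacement, so the two raters differ
        first, second = rng.sample(rater_pool, 2)
        assignments[script_id] = (first, second)
    return assignments

# Made-up identifiers, for illustration only
assignments = assign_raters(["S001", "S002", "S003"],
                            ["R1", "R2", "R3", "R4"], seed=1)
for script_id, (first, second) in assignments.items():
    print(script_id, first, second)
```

Drawing both raters in a single `sample()` call guarantees that no script is rated twice by the same person.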

METHODOLOGY OF STUDY

Computing Inter-rater reliability

Sampling

• Random sample of at least 40% of the total number of scripts

• Periods: May 2005 to November 2007

• Levels: B1, B2 & C1
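A random sample of at least 40% of the scripts could be drawn along the following lines. This is a minimal sketch: the script IDs are invented, `draw_sample` is a hypothetical helper, and the slides do not say whether sampling was done per level or per administration.

```python
import math
import random

def draw_sample(script_ids, fraction=0.40, seed=None):
    """Draw a simple random sample covering at least `fraction` of the scripts."""
    rng = random.Random(seed)
    # Round up so the sample never falls below the requested share
    n = math.ceil(fraction * len(script_ids))
    return rng.sample(script_ids, n)

# Made-up population of 1000 script IDs
scripts = [f"S{i:04d}" for i in range(1, 1001)]
sample = draw_sample(scripts, seed=2005)
print(len(sample), "of", len(scripts), "scripts sampled")
```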

Intraclass Correlation Coefficient

• ICC vs. Pearson's r: The ICC is an improvement over Pearson's r as it takes into account the differences between the raters' marks, along with the correlation between raters.

• ICC in SPSS: Average measure reliability analysis for one-way random effects
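To make the contrast with Pearson's r concrete, here is a minimal sketch of the one-way random effects, average-measures ICC, computed as (MSB - MSW) / MSB from the between-script and within-script mean squares. The marks are invented for illustration, and the function names are assumptions; this is not the SPSS routine itself.

```python
from statistics import mean

def icc_oneway_average(ratings):
    """One-way random effects, average-measures ICC.

    ratings: one row per script, each row holding the k marks it received.
    """
    n = len(ratings)          # number of scripts
    k = len(ratings[0])       # number of ratings per script
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Invented marks: rater B is always exactly 4 marks more lenient than rater A
rater_a = [8, 10, 12, 14, 16]
rater_b = [12, 14, 16, 18, 20]

r = pearson_r(rater_a, rater_b)
icc = icc_oneway_average([list(pair) for pair in zip(rater_a, rater_b)])
print(f"Pearson r = {r:.2f}, ICC = {icc:.2f}")  # Pearson r = 1.00, ICC = 0.60
```

The two raters rank the scripts identically, so Pearson's r is perfect, but the constant 4-mark gap counts as disagreement in the ICC.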

Interpretation of ICC

Fleiss (1981):

• r < 0.40: poor agreement

• 0.40 ≤ r ≤ 0.75: good agreement

• r > 0.75: excellent agreement

Landis & Koch (1977):

• r < 0.00: poor agreement

• 0.00 ≤ r ≤ 0.20: slight

• 0.21 ≤ r ≤ 0.40: fair

• 0.41 ≤ r ≤ 0.60: moderate

• 0.61 ≤ r ≤ 0.80: substantial

• 0.81 ≤ r ≤ 1.00: almost perfect
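The two interpretation scales can be applied mechanically, as in the sketch below. The function names are hypothetical. Because the Landis & Koch bands are stated to two decimals, values falling between bands (for example 0.205) are assigned here to the higher band, which is an assumption.

```python
def fleiss_label(icc):
    """Label an ICC value using the Fleiss (1981) bands quoted above."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "good"
    return "excellent"

def landis_koch_label(icc):
    """Label an ICC value using the Landis & Koch (1977) bands quoted above."""
    if icc < 0.00:
        return "poor"
    # Upper bounds of each band; gaps between bands go to the higher band
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if icc <= upper:
            return label
    return "almost perfect"

# Three of the coefficients reported in the findings
for value in (0.52, 0.74, 0.88):
    print(value, fleiss_label(value), landis_koch_label(value))
```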

KPG module 2

• Free writing production

• Mediation

Findings

Free writing production: ICC per test administration

Level   May 2005   Nov 2005   May 2006   Nov 2006   May 2007   Nov 2007
B2      0.74       0.70       0.76       0.68       0.76       0.72
C1      0.57       0.56       0.63       0.52       0.59       0.66

B1 - FREE WRITING PRODUCTION (two values only): 0.76, 0.73

Findings

Mediation: ICC per test administration

Level   May 2005   Nov 2005   May 2006   Nov 2006   May 2007   Nov 2007
B2      0.77       0.75       0.74       0.72       0.80       0.69
C1      0.62       0.60       0.68       0.53       0.69       0.71

B1 - MEDIATION (two values only): 0.83, 0.88

Findings

Totals

Conclusion

• Correlations are high – Positive impact of instruments

• Trendlines slope upwards – Rating experience and training are directly related to rater agreement indices
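A trendline of the kind referred to above can be fitted by ordinary least squares. The sketch below uses the C1 free writing production coefficients from the findings and assumes the six administrations are equally spaced (coded 0 to 5). How the trendlines in the original charts were fitted is not stated, so this is only one plausible reading.

```python
def trend_slope(values):
    """Least-squares slope of values against equally spaced time points 0..n-1."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# C1 free writing production, May 2005 to November 2007
c1_free_writing = [0.57, 0.56, 0.63, 0.52, 0.59, 0.66]
slope = trend_slope(c1_free_writing)
print(f"Change in ICC per administration: {slope:+.4f}")
```

A positive slope means the coefficient rises on average from one administration to the next. With only six points per series, such a slope is a rough indication rather than a tested trend.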

Further research

• Task Analysis to investigate correlation between item difficulty and ICC

• In progress: Detailed task analysis project carried out by linguists and psychologists

AIM: To determine the variables affecting the difficulty of a task
