Inter-rater reliability in the KPG exams
The Writing Production and Mediation Module
Inter-rater reliability in KPG
AIM:
To check the effectiveness of the instruments employed throughout the rating process
• Rating Grid – Assessment Criteria
• Training Material & Training Seminars
• On-the-spot consultancy to raters
Script Raters Profile
• Experienced teachers
• Underwent initial training in rating KPG scripts
• Undergo specialized training for every test administration
Script rater training
• Specialized training on rating scripts based on expectations for every activity: analysis of expected output, presentation of rated scripts, actual rating of selected samples
• Rating scripts under supervision
The rating procedure
• Each script is rated by two script raters randomly selected from a pool of trained raters
• Second ratings are independent of the first (no identifying information, no marks or symbols)
• Constant monitoring/consultancy during the process
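The random pairing of raters described above can be sketched as follows. This is only an illustration under assumptions: the function name, the rater labels and the seed are hypothetical, and the slides do not say how the random selection is actually carried out.

```python
import random

def assign_raters(script_ids, rater_pool, seed=None):
    """Assign two distinct, randomly chosen raters to every script.

    Hypothetical helper: the real KPG allocation procedure is not
    described in the slides beyond "two raters randomly selected".
    """
    rng = random.Random(seed)
    # rng.sample draws without replacement, so the two raters always differ.
    return {sid: tuple(rng.sample(rater_pool, 2)) for sid in script_ids}

# Illustrative labels only.
pool = ["rater_A", "rater_B", "rater_C", "rater_D", "rater_E"]
assignments = assign_raters(["script_1", "script_2", "script_3"], pool, seed=7)
```

Because the second rater is drawn from the same pool without any link to the first rating, the two marks for a script can be treated as independent measurements.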
METHODOLOGY OF STUDY
Computing Inter-rater reliability
Sampling
• Random sample of at least 40% of the total number of scripts
• Periods: May 2005 to November 2007
• Levels: B1, B2 & C1
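A random sample covering at least 40% of the scripts could be drawn along these lines. The function name and the script identifiers are assumptions made for illustration; the slides do not state which tool was used for sampling.

```python
import math
import random

def sample_scripts(script_ids, fraction=0.40, seed=None):
    """Draw a simple random sample of at least `fraction` of the scripts."""
    ids = list(script_ids)
    rng = random.Random(seed)
    # Rounding up guarantees the sample never falls below the target share.
    n = math.ceil(fraction * len(ids))
    return rng.sample(ids, n)

# Hypothetical script numbers for one test administration.
sampled = sample_scripts(range(1, 8), seed=1)
```

Rounding up rather than down is a design choice that keeps the "at least 40%" condition true for any number of scripts.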
Intraclass Correlation Coefficient
• ICC vs. Pearson's r
The ICC is an improvement over Pearson's r as it takes into account the differences in ratings, along with the correlation between raters.
• ICC in SPSS
Average measure reliability analysis for one-way random effects
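The difference between the two coefficients can be illustrated with a small sketch. It computes the one-way random effects ICC from the standard ANOVA mean squares (single and average measures) next to Pearson's r. The function names and the marks are invented for illustration, and this is not the SPSS procedure itself, only the textbook formulas it is based on. The sketch assumes every script has the same number of ratings and that the scripts do not all share the same mean mark.

```python
from statistics import mean

def icc_one_way(ratings):
    """One-way random effects ICC for n scripts with k ratings each.

    Returns (single measure, average measure).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-script and within-script mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Invented marks: the second rater is always 2 points more lenient.
first = [10, 12, 14, 16, 18]
second = [12, 14, 16, 18, 20]
icc_single, icc_average = icc_one_way(list(zip(first, second)))
r = pearson_r(first, second)
```

With these invented marks Pearson's r is 1 because the two raters rank the scripts identically, while the ICC stays below 1 because the constant 2-point gap counts as disagreement.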
Interpretation of ICC
• r < 0.40 poor agreement
• 0.40 ≤ r ≤ 0.75 good agreement
• r > 0.75 excellent agreement
(Fleiss, 1981)
• r < 0.00 poor agreement
• 0.00 ≤ r ≤ 0.20 slight
• 0.21 ≤ r ≤ 0.40 fair
• 0.41 ≤ r ≤ 0.60 moderate
• 0.61 ≤ r ≤ 0.80 substantial
• 0.81 ≤ r ≤ 1.00 almost perfect
(Landis & Koch, 1977)
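The two interpretation scales above can be applied mechanically, as in this sketch. The function names are hypothetical. One assumption is made: values that fall in the small gaps between the Landis & Koch bands as listed (for example 0.205) are assigned to the higher band.

```python
def fleiss_label(r):
    """Label a coefficient using the Fleiss (1981) bands listed above."""
    if r < 0.40:
        return "poor"
    if r <= 0.75:
        return "good"
    return "excellent"

def landis_koch_label(r):
    """Label a coefficient using the Landis & Koch (1977) bands listed above."""
    if r < 0.00:
        return "poor"
    # Upper bounds of each band; gaps such as 0.20-0.21 go to the next band.
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if r <= upper:
            return label
    return "almost perfect"

# Example: label a few coefficients on both scales.
labels = {r: (fleiss_label(r), landis_koch_label(r)) for r in (0.52, 0.74, 0.88)}
```

The same coefficient can receive different labels on the two scales, so it helps to state which scale is being used when reporting agreement.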
KPG module 2
• Free writing production
• Mediation
Findings
                               MAY 2005   NOVEMBER 2005   MAY 2006   NOVEMBER 2006   MAY 2007   NOVEMBER 2007
B2 - FREE WRITING PRODUCTION     0.74         0.70          0.76         0.68          0.76         0.72
C1 - FREE WRITING PRODUCTION     0.57         0.56          0.63         0.52          0.59         0.66

B1 - FREE WRITING PRODUCTION: 0.76, 0.73
Findings
                  MAY 2005   NOVEMBER 2005   MAY 2006   NOVEMBER 2006   MAY 2007   NOVEMBER 2007
B2 - MEDIATION      0.77         0.75          0.74         0.72          0.80         0.69
C1 - MEDIATION      0.62         0.60          0.68         0.53          0.69         0.71

B1 - MEDIATION: 0.83, 0.88
Totals
Conclusion
• Correlations are high – Positive impact of instruments
• Trendlines are sloping upwards – Experience in rating and training are directly related to rater agreement indices
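One way to check the direction of a trendline is to fit a least-squares line through the coefficients of consecutive administrations and look at the sign of its slope. The sketch below does this for the C1 free writing production values reported in the findings. The function name is hypothetical, and treating the six administrations as equally spaced points (0 to 5) is an assumption.

```python
def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

# C1 free writing production, May 2005 to November 2007.
c1_free_writing = [0.57, 0.56, 0.63, 0.52, 0.59, 0.66]
slope = trend_slope(c1_free_writing)
is_upward = slope > 0
```

The same function can be applied to any other series in the findings to check the direction of its trendline.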
Further research
• Task Analysis to investigate correlation between item difficulty and ICC
• In process: Detailed task analysis project carried out by linguists and psychologists
AIM:
To determine the variables affecting the difficulty of a task