Effects of Rater Training on the Assessment of L2 English Oral Proficiency
Peer reviewed, Journal article
Published version
Permanent link
https://hdl.handle.net/11250/2727310
Publication date
2020
Collections
- Department of Teacher Education (Institutt for lærerutdanning)
- Publications from CRIStin - NTNU
Original version
Nordic Journal of Modern Language Methodology. 2020, 8 (1), 3-29. https://doi.org/10.46364/njmlm.v8i1.605
Abstract
The main objective of this study was to examine whether a Rater Identity Development (RID) program would increase interrater reliability and improve the calibration of scores against benchmarks in the assessment of English as a second/foreign language (L2) oral proficiency. Eleven primary school teachers participated as raters. A pretest–intervention (RID)–posttest design was employed, and the data included 220 assessments of student performances. Two types of rater-reliability analyses were conducted. The first was a set of intraclass correlation coefficient (ICC) estimates based on a two-way random-effects model, used to indicate the extent to which raters were consistent in their rankings. The second was a many-facet Rasch measurement (MFRM) analysis, carried out in FACETS®, used to explore systematic differences in rater severity/leniency. Results showed improved consistency, presumably as a result of the training; at the same time, the differences in rater severity became greater. The results suggest that future rater training may draw on central components of RID, such as core concepts in language assessment, individual feedback, and social moderation work.
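A two-way ICC can report high consistency while raters still differ in severity. The sketch below shows why, under stated assumptions. It computes two single-measure two-way ICCs from a complete ratings matrix with rows as performances and columns as raters:

- a consistency form, often labelled ICC(C,1);
- an absolute-agreement form, often labelled ICC(A,1).

It uses the standard two-way ANOVA mean squares. The function name `icc_two_way` and the example scores are hypothetical. The abstract does not state which ICC form (single or average measures, consistency or absolute agreement) or which software the study used. This is therefore an illustration, not the authors' procedure.

```python
from statistics import fmean


def icc_two_way(ratings):
    """Hypothetical helper: return (ICC(C,1), ICC(A,1)) for a complete n x k matrix.

    Rows are rated performances, columns are raters. Uses the two-way ANOVA
    decomposition: total SS = rows SS + raters SS + residual SS.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]                       # per performance
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]  # per rater

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # rater severity differences
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Consistency: ignores constant offsets between raters (ranking agreement).
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    # Absolute agreement: also penalizes differences in rater means (severity).
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


if __name__ == "__main__":
    # Hypothetical scores: rater B ranks identically but is one point more lenient.
    scores = [[1, 2], [2, 3], [3, 4], [4, 5]]
    c, a = icc_two_way(scores)
    print(round(c, 3), round(a, 3))
```

With these made-up scores the script prints `1.0 0.769`. The raters rank the performances identically, so the consistency ICC is perfect. The one-point severity gap lowers only the absolute-agreement ICC. This is one reason a consistency index can improve while severity differences persist or grow. In the study itself, those severity differences were examined with the MFRM analysis rather than with an ICC.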