Effects of Rater Training on the Assessment of L2 English Oral Proficiency
Peer reviewed, Journal article
Published version
Date
2020
Collections
- Institutt for lærerutdanning [3865]
- Publikasjoner fra CRIStin - NTNU [39183]
Original version
Nordic Journal of Modern Language Methodology. 2020, 8 (1), 3-29. https://doi.org/10.46364/njmlm.v8i1.605
Abstract
The main objective of this study was to examine whether a Rater Identity Development (RID) program would increase interrater reliability and improve calibration of scores against benchmarks in the assessment of second/foreign language English oral proficiency. Eleven primary school teachers participated as raters. A pretest–intervention/RID–posttest design was employed, and the data comprised 220 assessments of student performances. Two types of rater-reliability analyses were conducted: first, intraclass correlation coefficient estimates from a two-way random-effects model, to indicate the extent to which raters were consistent in their rankings, and second, a many-facet Rasch measurement analysis, extended through FACETS®, to explore systematic differences in rater severity/leniency. Results showed improved consistency, presumably as a result of training; at the same time, the differences in severity became greater. The results suggest that future rater training may draw on central components of RID, such as core concepts in language assessment, individual feedback, and social moderation work.
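As an illustration of the first analysis, the sketch below computes an intraclass correlation coefficient from a two-way random-effects ANOVA decomposition. The abstract does not say which ICC form the study reported, so the choice here (single rater, absolute agreement, often written ICC(2,1)) is an assumption, and the function name `icc2_1` and the score matrices are hypothetical rather than taken from the study.

```python
def icc2_1(scores):
    """Two-way random-effects ICC, absolute agreement, single rater.

    scores: list of rows, one row per rated performance and one column
    per rater (assumed layout; every rater scores every performance).
    """
    n = len(scores)        # number of rated performances
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for performances (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: two raters who rank four performances identically,
# but the second rater is always two points more lenient.
shifted = [[1, 3], [2, 4], [3, 5], [4, 6]]
print(round(icc2_1(shifted), 3))
```

In this made-up example the raters agree perfectly on the ranking, yet the coefficient stays well below 1 because the absolute-agreement form penalises the constant difference in severity. This is one way to see how consistency and severity can move independently, as the reported results suggest.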