The purpose of the study was to investigate the extent to which raters from diverse backgrounds exhibited different levels of rating ability while scoring speaking performances. The study also aimed to examine how raters with different backgrounds could develop their rating ability over time. For this purpose, raters' background characteristics were first explored with regard to (1) experience in rating L2 speaking assessments, (2) TESOL experience, (3) rater training accompanied by rating experience, and (4) relevant coursework completed. Raters were then classified into novice, developing, and expert groups in order to examine the extent to which the three rater groups exhibited different scoring behaviors in each of the three rating sessions, which were separated by a one-month interval. Each rater group's changes in rating patterns were also investigated across the rating sessions.
In each of the three rating sessions, the three groups of raters scored a set of pre-recorded speaking responses to five semi-direct placement speaking tasks with an analytic scoring rubric. The raters also recorded how they arrived at certain scoring decisions while rating examinee responses on the first two tasks. Before each rating session the raters were trained, and before the second and third rating sessions they were provided with individual feedback on their previous rating performance.
The three groups of raters' analytic ratings were statistically analyzed in the first phase of the study, focusing on severity, internal consistency, and interaction effects. Statistically, the novice and developing rater groups did not show distinctive rating patterns, especially in regard to interaction effects, while the expert raters displayed the highest rating ability across the three rating sessions. However, in the second phase of the study, in which the raters' verbal reports were qualitatively analyzed focusing on their use of the given scoring criteria, the three groups of raters displayed different rating patterns and developmental paths across the three rating sessions. The findings from this study suggest that the different weaknesses that the three rater groups exhibited need to be addressed through individual or group rater training to help raters improve their rating ability, and ultimately to minimize rater effects.
School: Teachers College, Columbia University
School Location: United States -- New York
Source: DAI-A 72/05, Dissertation Abstracts International
Subjects: Educational tests & measurements, English as a Second Language
Keywords: Expert raters, Novice raters, Rater development, Rating ability, Second language, Speaking assessment