Using constructed-response items in a large-scale assessment raises costly practical issues as well as psychometric issues. To address these problems, research in other countries has explored the possibility of incorporating automated scoring systems for constructed-response items. This study therefore aims to validate the development of an automated scoring system in Korea and to explore potential applications of automated scoring. To do so, it investigates the extent to which the ratings produced by the automated scoring system agree with those produced by human raters.
Correlation analyses reveal a relatively high correspondence between the ratings of human raters and those of the automated scoring system. For some items, however, the correlations between the automated scoring system and human raters were low, which suggests that the causes of these discrepancies need to be examined in order to further develop the automated scoring system. The results also showed that the effect of rater on scoring consistency was negligible and that the measures produced by human raters and by the automated scoring system were similar. These findings imply that an automated scoring system could plausibly be used for reliable assessment of constructed-response items in a large-scale assessment as well as for immediate feedback on students' responses.
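As a rough illustration of the kind of human-versus-automated agreement analysis described above, the Python sketch below computes a Pearson correlation and, for reference, a quadratic weighted kappa for a single item. The scores, score scale, and variable names are hypothetical assumptions for illustration, not data or methods from the study.

    # Minimal sketch of a human-vs-automated agreement check for one item.
    # The scores below are hypothetical; a real analysis would use the actual
    # item-level ratings from human raters and the automated scoring system.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical scores on a 0-3 scale for 12 student responses.
    human_scores = np.array([3, 2, 2, 1, 0, 3, 2, 1, 1, 2, 3, 0])
    auto_scores  = np.array([3, 2, 1, 1, 0, 3, 2, 2, 1, 2, 3, 1])

    # Pearson correlation: linear association between the two sets of ratings.
    r, p_value = pearsonr(human_scores, auto_scores)

    # Quadratic weighted kappa: chance-corrected agreement that penalizes
    # larger score discrepancies more heavily (common in automated scoring work).
    qwk = cohen_kappa_score(human_scores, auto_scores, weights="quadratic")

    print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")
    print(f"Quadratic weighted kappa = {qwk:.3f}")

Low correlations or low kappa values for particular items would flag them for the kind of follow-up examination of human-machine discrepancies mentioned above.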