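Abstract

This study compared the reliability of holistic scoring and analytic scoring in performance assessments of middle school students' science inquiry activities and, for analytic scoring, examined how fine-grained the rating scale must be to secure reliability. Two raters scored activity sheets that middle school students had completed for four science inquiry tasks, using holistic scoring, analytic scoring, and, within analytic scoring, rating scales of 2, 3, and 4~7 levels. Holistic scoring showed high internal consistency across tasks, whereas analytic scoring showed high inter-rater reliability. The 3-level rating scale yielded internal consistency across activities and inter-rater reliability similar to those of the 4~7-level scale, while the distribution of students by ability estimate, the item difficulties, and the item characteristic curves indicated that the 3-level scale was appropriate. These results suggest that choosing holistic scoring for performance assessment of science inquiry activities can raise internal consistency across tasks, but that measures such as consultation among raters are needed to raise inter-rater agreement, which is lower than under analytic scoring. They also suggest that, when analytic scoring is chosen, a 3-level rating scale is sufficient to secure reliability.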

This study compared the reliability of holistic and analytic scoring methods in performance assessments of middle school students' science investigation activities. For analytic scoring, the reliabilities of 2-, 3-, and 4~7-level rubric ratings were compared to determine how many rating levels are needed. Two trained raters scored 60 students' activity sheets for four science investigation tasks using both scoring methods and all three rubric rating scales. Internal consistency reliabilities of holistic scoring were higher than those of analytic scoring, whereas inter-rater reliabilities of analytic scoring were higher than those of holistic scoring. The 3-level rubric ratings showed internal consistency and inter-rater reliabilities similar to those of the 4~7-level ratings. In addition, the distribution of students across ability estimates, the item difficulties, and the item characteristic curves indicated that the 3-level rubric ratings were appropriate. These results suggest that holistic scoring can be adopted to increase internal consistency across tasks, provided that its lower inter-rater reliability is improved through measures such as rater conferences. When analytic scoring is adopted, a 3-level rubric rating appears sufficient to secure good reliability.
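
The abstract does not state which coefficients were used for the two kinds of reliability. As an illustration only, the sketch below assumes Cronbach's alpha for internal consistency across the four tasks and a Pearson correlation between the two raters' total scores for inter-rater reliability. The function names and the small score tables are hypothetical and are not data from the study.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per student, one column per task.
    """
    k = len(scores[0])  # number of tasks (items)
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical 3-level rubric scores (0-2) for five students on four tasks.
rater_a = [[2, 2, 1, 2], [1, 1, 1, 0], [0, 1, 0, 0], [2, 1, 2, 2], [1, 0, 1, 1]]
rater_b = [[2, 1, 1, 2], [1, 1, 0, 0], [0, 0, 0, 1], [2, 2, 2, 2], [1, 1, 1, 1]]

# Internal consistency across tasks, computed separately for each rater.
print("alpha (rater A):", round(cronbach_alpha(rater_a), 3))
print("alpha (rater B):", round(cronbach_alpha(rater_b), 3))

# Inter-rater reliability on each student's total score.
totals_a = [sum(row) for row in rater_a]
totals_b = [sum(row) for row in rater_b]
print("inter-rater r:", round(pearson_r(totals_a, totals_b), 3))
```

Under these assumptions, comparing scoring methods or rubric levels amounts to recomputing both coefficients on the scores produced by each method. Other coefficients, such as Cohen's kappa or generalizability coefficients, would be equally plausible choices.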