




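Oral examinations have been used in personnel selection for many years, and research reports over the years have largely supported the feasibility and usefulness of “structured oral examinations.” When structured oral examinations are put into practice, most of the factors that may compromise scoring fairness can be eliminated or mitigated through standard operating procedures; the one exception is the “rater effect,” which is hard to eliminate or control through such procedures. This article therefore proposes scoring with a “many-facet model” (MFM), so that the facets underlying the oral examination committee’s scoring data can be estimated objectively: examinee ability, oral question difficulty, and rater severity/leniency. According to the literature, applying the MFM to data affected by rater effects not only yields more precise estimates of examinee ability but also raises the reliability of the overall analysis, making oral examination scoring more impartial, fair, and accurate. So that the MFM can be applied to future oral examination data, the article also recommends changing how the existing score sheet is marked: from interval-scale scoring with continuous data (e.g., assigning a score within the 50-59 range) to ordinal-scale scoring with discrete data (e.g., rating performance only as one of four grades: excellent, good, average, or poor), in order to reduce the rater effect’s interference with scoring fairness.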


A Study on the Influence of Rater Effect on the Fairness of Oral Examination Scoring

Min-Ning Yu


Oral examinations have been used in personnel selection for many years, and research reports over the years have supported the feasibility and effectiveness of adopting “structured oral examinations.” When “structured oral examination” measures are implemented, most of the many factors that may interfere with the fairness of oral examination scoring can be eliminated or mitigated through standard operating procedures. Only one of them, the “rater effect,” is difficult to eliminate or control through such procedures. This article therefore proposes applying “many-facet model” (MFM) scoring to objectively estimate the facets underlying the oral examination committee’s scoring data, including examinee ability, oral exam question difficulty, and rater severity/leniency. According to the relevant literature, applying the MFM to data that involve rater effects can not only improve the precision of examinee ability estimates but also improve the reliability of the overall data analysis, making oral examination scoring more impartial, fair, and accurate. To ensure that the MFM can be applied to the analysis of subsequent oral examination score data, this article also suggests changing the scoring method of the existing oral examination score sheet from an interval scale with continuous data (e.g., assigning a score within the range of 50-59 points) to an ordinal scale with discrete data (e.g., rating performance only as one of four grades: excellent, good, average, or poor), in order to reduce the interference of the “rater effect” with the fairness of oral examination scoring.

Keywords: rater effect, structured oral examinations, many-facet model, grading fairness, grading severity/leniency parameter
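
For readers unfamiliar with the model named in the abstract, the three facets it lists (examinee ability, question difficulty, rater severity/leniency) can be related to an ordinal grade as in the sketch below. It assumes that the MFM intended here is the rating-scale form of the many-facet Rasch model; the symbols are illustrative and are not taken from the article.

```latex
% Assumed formulation: many-facet Rasch model, rating-scale form.
% All symbols are illustrative; none come from the article itself.
\[
  \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \lambda_j - \tau_k
\]
% P_{nijk}  : probability that rater j gives examinee n grade k on question i
% \theta_n  : ability of examinee n
% \delta_i  : difficulty of oral question i
% \lambda_j : severity of rater j (a negative value indicates leniency)
% \tau_k    : threshold between adjacent grades k-1 and k
```

Under this assumed form, a four-grade scale (poor, average, good, excellent) would be coded k = 0, 1, 2, 3 with three thresholds. Because rater severity enters as its own term, two examinees of equal ability who face raters of different severity would receive the same ability estimate, which is the sense in which the abstract describes the analysis as separating the rater effect from examinee ability.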