Evaluating test content using cluster analysis and multidimensional scaling
Developers of educational and psychological tests must demonstrate that their tests adequately measure the content domain they purport to measure. Though this fundamental requirement is crucial to the validity of educational and psychological measures, few procedures are available for evaluating content representation. This dissertation presented and evaluated a new procedure designed to assess the content representation of a test. The new procedure used multidimensional scaling (MDS) and cluster analysis to analyze similarity ratings of test items obtained from subject matter experts (SMEs). The procedure was evaluated with respect to two tests: the Auditing Test from the Uniform CPA Examination (AICPA, 1990) and a social studies test from the CTBS/4 series (CTB, 1989). Previous methods used to evaluate content representation were applied to these same tests, and the relative advantages and limitations of all procedures were investigated. The results indicated that the content structure of these tests, as perceived by the SMEs, could be discovered through MDS analysis of their item similarity ratings. A six-dimensional INDSCAL solution for the auditing data provided six dimensions that were relevant to the content domain tested; three of the four content areas specified in the test blueprint were represented in this solution. A six-dimensional INDSCAL solution for the social studies data provided five content-relevant dimensions; however, only three of the seven content areas delineated in the test blueprint were represented. Cluster analyses of the MDS item coordinates and correlational analyses of the coordinates and the item relevance ratings helped clarify the MDS interpretations. Traditional analyses of the item relevance data identified several items that were not congruent with their blueprint specifications, as well as several content areas that were not supported by the relevance ratings.
It was determined that the MDS analyses of item similarity ratings were essential for providing an independent assessment of the content structure of the tests, but that to fully evaluate the content representation of a test, both item similarity and item relevance data should be gathered. The limitations of the procedure and suggestions for future research are discussed.
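The core steps of the procedure can be sketched in Python with NumPy: pooling SME similarity ratings, scaling them, clustering the item coordinates, and correlating the dimensions with relevance ratings. This is a minimal illustration under stated assumptions, not the dissertation's implementation. It substitutes classical (Torgerson) MDS on the SME-averaged matrix for INDSCAL, which would instead fit a separate weight for each SME on each dimension. It uses average-linkage agglomerative clustering as one possible clustering method. The 1 to 8 rating scale, the helper function names, and the six-item, two-area example data are all hypothetical.

```python
import numpy as np


def ratings_to_dissimilarity(ratings, scale_max):
    """Average SME similarity ratings (n_smes x n_items x n_items) and convert
    them to dissimilarities on the same (hypothetical) rating scale."""
    mean_sim = np.asarray(ratings, dtype=float).mean(axis=0)
    dissim = scale_max - mean_sim          # higher similarity -> smaller distance
    np.fill_diagonal(dissim, 0.0)          # an item is identical to itself
    return (dissim + dissim.T) / 2.0       # enforce symmetry


def classical_mds(dissim, n_dims):
    """Classical (Torgerson) MDS, used here only as a stand-in for INDSCAL."""
    d2 = np.asarray(dissim, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering  # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:n_dims]  # largest eigenvalues first
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # ignore negative eigenvalues
    return vecs[:, top] * scale


def average_linkage(coords, n_clusters):
    """Agglomerative clustering of item coordinates with average linkage."""
    x = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def dimension_relevance_correlations(coords, relevance):
    """Pearson r between each MDS dimension (columns of coords) and each
    content area's mean relevance ratings (columns of relevance)."""
    k = coords.shape[1]
    full = np.corrcoef(coords.T, relevance.T)
    return full[:k, k:]                    # dimensions x content areas


# Hypothetical data: 6 items, 2 blueprint areas, 2 SMEs, 1-8 similarity scale.
areas = np.array([0, 0, 0, 1, 1, 1])
same_area = areas[:, None] == areas[None, :]
one_sme = np.where(same_area, 7.0, 2.0)    # within-area pairs rated more similar
ratings = np.stack([one_sme, one_sme])
relevance = np.where(areas[:, None] == np.array([0, 1]), 8.0, 1.0)

dissim = ratings_to_dissimilarity(ratings, scale_max=8)
coords = classical_mds(dissim, n_dims=2)
labels = average_linkage(coords, n_clusters=2)
r = dimension_relevance_correlations(coords, relevance)
print(labels)
print(np.round(r, 3))
```

On this synthetic data, the first dimension separates the two hypothetical content areas, the clustering recovers them, and that dimension correlates almost perfectly (with opposite signs) with the two areas' relevance ratings. With real SME ratings, naming each dimension would still require inspecting which items fall at its extremes.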
Psychological tests; Educational evaluation
Sireci, Stephen Gerard, "Evaluating test content using cluster analysis and multidimensional scaling" (1993). ETD Collection for Fordham University. AAI9324629.