Samejima items in multiple-choice tests: Identification and implications

Nazia Rahman, Fordham University


Samejima hypothesized that item response functions (IRFs) that are not monotonically increasing in ability might occur for multiple-choice items (referred to here as Samejima items) if low-ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low-ability test takers simply guess at the answer. In such a case, very low-ability test takers may have a higher probability of a correct response than slightly higher-ability test takers. The first main goal of this study was to explore concrete mathematical measures, such as Mokken's scalability coefficient and isotonic regression, and to develop them further for the purpose of identifying non-monotonically increasing IRFs. The second main goal was to explore the scoring paradox that results from the presence of a large proportion of Samejima items, which could mask their true non-monotonic nature and lead to monotone items being erroneously labeled as Samejima items. The implications of the presence of a large number of Samejima items were also investigated. For the first part, item responses were generated for N = 5,000 test takers on 50 monotone items and 16 Samejima items, forming a base dataset. Various combinations (46 conditions) of the difficulty, discrimination, and dip of the Samejima items were studied to understand the performance of Mokken's scalability coefficient and isotonic regression and their sensitivity to these characteristics. For the second part, real data from a large-scale, high-stakes assessment (N = 6,000) were used to uncover the paradox. Because both Mokken's scalability coefficient and isotonic regression were based on the total score, neither was found to be effective in identifying Samejima items: the values of these measures did not differ between monotonically increasing items and Samejima items.
In the real data, the paradox was observed, and a few Samejima items were identified in the test with this methodology. Gradually increasing the proportion of Samejima items adversely affected ability estimation by reordering the latent trait at the lower end of the distribution. For a fixed number of Samejima items, the longer the test, the smaller the impact.
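
As an illustration only, the shape of a Samejima item and the idea behind an isotonic-regression check can be sketched as follows. The sketch is not the dissertation's model or procedure. It assumes a hypothetical IRF built from a three-parameter logistic curve minus a Gaussian-shaped dip, and it evaluates the curve at known ability values rather than at the total score that the study's measures were based on. All function names and parameter values are invented for the example.

```python
import math

def monotone_irf(theta, a=1.0, b=0.0, c=0.2):
    """Three-parameter logistic IRF: increasing in theta, lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def dipped_irf(theta, a=1.0, b=0.0, c=0.2, dip=0.15, center=-1.5, width=0.5):
    """Hypothetical non-monotone IRF (an assumption for illustration):
    a 3PL curve minus a Gaussian-shaped dip located at low ability."""
    p = monotone_irf(theta, a, b, c) - dip * math.exp(-((theta - center) / width) ** 2)
    return min(max(p, 0.0), 1.0)

def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit, equal weights."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

def monotonicity_violation(values):
    """Sum of squared differences between a curve and its isotonic fit.
    Zero when the curve is already non-decreasing."""
    return sum((v - f) ** 2 for v, f in zip(values, pava(values)))

# Evaluate both curves on an ability grid from -4 to 4.
grid = [-4.0 + 0.25 * k for k in range(33)]
monotone_curve = [monotone_irf(t) for t in grid]
dipped_curve = [dipped_irf(t) for t in grid]

print("monotone item violation:", monotonicity_violation(monotone_curve))
print("dipped item violation:  ", monotonicity_violation(dipped_curve))
```

With known abilities the dipped curve yields a positive violation and the monotone curve yields none. With real data ability is not observed, and the abstract reports that when the measures were based on the total score they did not separate the two kinds of items.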

Subject Area

Educational tests & measurements|Psychology|Quantitative psychology

Recommended Citation

Rahman, Nazia, "Samejima items in multiple-choice tests: Identification and implications" (2013). ETD Collection for Fordham University. AAI3588224.