Fitting Propensities of Item Response Theory Models
Fitting propensity (FP) is defined as a model's average ability to fit diverse data patterns, all else being equal (Preacher, 2006). Preacher (2006) and Bonifay and Cai (2017) studied the FP of structural equation modeling (SEM) and item response theory (IRT) models by fitting candidate models to a large number of random datasets drawn uniformly from the complete data space. They found that some models have undesirably high FP owing to the inherent flexibility of their functional forms.

In this dissertation, a series of Monte Carlo studies was conducted to examine the FP of several popular dichotomous and polytomous IRT models when FP was evaluated by (a) global fit alone (termed global fitting propensity; GFP) and (b) parameter estimate quality, item fit, and global fit together (termed holistic fitting propensity; HFP). A further goal was to examine the relative contributions of dimensionality and particular types of IRT parameters to FP. Results showed that IRT models exhibit varying levels of FP as a function of their measurement models and dimensional structures. Specifically, freely estimated discrimination parameters greatly increased model flexibility, whereas pseudo-guessing and inattention parameters added little additional flexibility. Compared with unidimensional IRT models, the exploratory item factor analysis and bifactor models had substantially higher GFP, while correlated-factors models showed no increase in GFP. A major finding was that models with high GFP did not necessarily produce easily interpretable parameter estimates or acceptable item fit when fit to random datasets, implying that sensible parameter estimates and item fit should be an integral part of IRT model evaluation. Results also showed that iteratively relaxing model constraints can yield a well-fitting model even for a random dataset; researchers are therefore cautioned against post-hoc adjustments made in the hope of finding a good-fitting model.
Finally, it was found that multiple IRT models tended to produce acceptable item and global fit to the same datasets generated from a known IRT model; in particular, structurally complex models tended to fit datasets generated from simpler models well. Researchers are therefore advised to select IRT models on theoretical grounds rather than on fit alone.
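The random-data sampling scheme described above, drawing datasets uniformly from the complete data space of dichotomous items, can be sketched as follows. This is a minimal illustration, not the dissertation's actual code: for n items, the complete data space is the simplex over all 2^n response patterns, so a flat Dirichlet draw gives uniform pattern probabilities, and a multinomial draw turns them into an observed dataset. The function name and argument names are hypothetical.

```python
import numpy as np

def random_data_from_pattern_simplex(n_items, n_respondents, seed=None):
    """Draw one dataset uniformly from the complete data space of
    n_items dichotomous items (hypothetical helper, in the spirit of
    the Bonifay & Cai sampling scheme):
      1. sample pattern probabilities from a flat Dirichlet over all
         2**n_items response patterns (uniform on the simplex),
      2. draw respondents multinomially from those probabilities,
      3. expand pattern counts into a respondents x items 0/1 matrix.
    A fitted IRT model's FP is then assessed by how well it fits many
    such random datasets."""
    rng = np.random.default_rng(seed)
    n_patterns = 2 ** n_items
    probs = rng.dirichlet(np.ones(n_patterns))       # uniform over the simplex
    counts = rng.multinomial(n_respondents, probs)   # pattern frequencies
    # bit-unpack each pattern index into an item-response vector
    patterns = (np.arange(n_patterns)[:, None] >> np.arange(n_items)) & 1
    return np.repeat(patterns, counts, axis=0)

data = random_data_from_pattern_simplex(n_items=5, n_respondents=1000, seed=1)
```

Each row of `data` is one respondent's 0/1 response vector; fitting candidate IRT models to many such datasets and recording their fit gives the Monte Carlo estimate of FP.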
Aytürk Ergin, Ezgi, "Fitting Propensities of Item Response Theory Models" (2020). ETD Collection for Fordham University. AAI27957745.