Solved – IQ adaptive test items in 1PL, 2PL or 3PL IRT model

Some adaptive test systems (e.g. school assessment tools) use the 1PL IRT model, while others use the 2PL or the 3PL. When developing an adaptive IQ test, is there a rule of thumb about which model to choose for calibrating item difficulty and test takers' ability?

I can't find any research that offers insight into the fit between IQ test items and the different kinds of IRT models.

Many thanks in advance!

I think the choice of a Rasch/1PL model is primarily a philosophical one (that literature places a slightly different emphasis on what measurement means, and hence researchers try their best to obtain items that fit the model), whereas the choice between the 2PL and 3PL models is an empirical/design one.

Since the slopes are all equal in 1PL models, determining a person's location amounts to finding the point where the respondent has a P = 0.5 chance of answering correctly, which can be done simply by choosing the items whose intercepts best match the current estimate of $\theta$. In 2PL and 3PL models it's slightly more complicated because of the unequal slopes and (in the 3PL) the lower-bound parameters for guessing. As a consequence, 2PL and 3PL models often require more advanced adaptive item selection criteria, such as Kullback–Leibler or Fisher information, to select the next best item for homing in on $\theta$.
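To illustrate the point about item selection, here is a minimal sketch of maximum-Fisher-information selection. It assumes the difficulty parameterization $a(\theta - b)$ rather than a slope-intercept form, and the function names and item parameters are made up for the example, not taken from any particular package.

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL probability of a correct response.

    c = 0 gives the 2PL; c = 0 with the same a for every item gives the 1PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a=1.0, b=0.0, c=0.0):
    """Item information under the 3PL; reduces to a^2 * p * (1 - p) when c = 0."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

def next_item(theta_hat, items, administered=()):
    """Index of the not-yet-administered item with maximum information at theta_hat.

    items is a list of (a, b, c) tuples (hypothetical calibrated parameters).
    """
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta_hat, *items[i]))

# Hypothetical item banks, for illustration only.
one_pl_bank = [(1.0, -1.0, 0.0), (1.0, 0.2, 0.0), (1.0, 1.5, 0.0)]
two_pl_bank = [(0.5, 0.0, 0.0), (2.0, 0.5, 0.0)]

# 1PL: the most informative item is the one whose difficulty is closest to theta_hat.
pick_1pl = next_item(0.0, one_pl_bank)

# 2PL: a steeper item can win even if its difficulty is further from theta_hat.
pick_2pl = next_item(0.0, two_pl_bank)
```

In the 1PL bank the selected item is simply the one with difficulty nearest the current estimate, whereas in the 2PL bank the high-slope item is chosen even though the other item's difficulty matches the estimate exactly. That is why an information criterion is needed once slopes differ.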

Speaking purely from a design perspective, if the adaptive testing items offer a finite set of response options (e.g., multiple choice), so that guessing can succeed, then the 3PL seems like the better option; but if they are more of a fill-in-the-blank style (e.g., 2 + 3 = __), then the 1PL and 2PL models would, at least theoretically, be more reasonable.
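To make the guessing argument concrete, here is the 3PL response function in one common notation (the symbols $a_i$, $b_i$ and $c_i$ for slope, difficulty and lower asymptote are my own choice of notation):

$$
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
$$

Setting $c_i = 0$ gives the 2PL, and additionally constraining all $a_i$ to be equal gives the 1PL. For a multiple-choice item with $k$ options, $1/k$ is only a rough starting guess for $c_i$ (estimated values can differ), while for an open-response item $c_i \approx 0$ is plausible, which is what makes the simpler models reasonable there.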
