Brian D. Bontempo Ph.D - MMlog

Brian D. Bontempo, Ph.D

A few weeks ago, I had the privilege of making the annual trek to the AERA/NCME annual meeting. Typically, I’m quite cynical about this event, but this year I was pleasantly surprised. The program was well done, the presentations were pretty darn good, and the people I chatted with stimulated me. All in all, I came back pretty inspired. I’d like to give thanks to Kathy Gialluca and Daniel Bolt, the NCME program chairs and to Barbara Dodd and Chad Buckendahl who served as moderators and discussants in some of my sessions.

Today’s topic is an advanced one and a follow-up to the NCME session I discussed on Item Exposure Control Mechanisms in CAT (Computerized Adaptive Testing). This is a topic that is a real yawner for me because it’s typically business and operational motivations, rather than academic or psychometric rigor, that dictate the choice of item exposure control mechanism. Before diving in, here’s a very brief history of these mechanisms.

In the old days (circa 1980s), the CAT community referred to this as Item Selection, not Item Exposure Control. At the time, the algorithms (e.g., Kingsbury & Zara, 1989) all contained an element of randomization, and they all worked quite well. However, there were a number of testing programs using the Two-Parameter Logistic (2PL) or Three-Parameter Logistic (3PL) models (for more information on Item Response Theory – IRT, see future posts) that had ‘less than stellar’ item pools. For these programs, a handful of items were always selected because they were stellar performers, while others were never selected because they were low quality. Somehow, these programs got the idea that they could improve their exams by doing some complex modeling that forced the neglected items into circulation. Thus was born the Sympson-Hetter item selection algorithm (Sympson & Hetter, 1985). This method succeeds in establishing a maximum exposure rate above which an item cannot be selected again, but it is extremely cumbersome to implement and sucks much of the efficiency out of a CAT. About a decade later, some Spanish researchers proposed the Progressive Method (Revuelta & Ponsoda, 1998). This method essentially selects items at random at the beginning of the test and, slowly but surely, turns into a maximum-information CAT by the end of the test.
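To make the contrast concrete, here is a minimal Python sketch of the Progressive Method’s core idea: each candidate item is scored on a weighted blend of a random component and its Fisher information, with the information weight growing as the test progresses. The function names and the dictionary-based item pool are my own illustration, not code from any of the papers, and I assume a 2PL information function for simplicity (real implementations typically also rescale the two components onto a common range).

```python
import math
import random

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def progressive_select(theta, pool, administered, test_length, rng=random):
    """Pick the next item from `pool` (item_id -> (a, b)).

    The selection criterion blends a random draw with item information:
    at the start of the test (progress s = 0) it is purely random, and by
    the end (s = 1) it is pure maximum-information selection.
    """
    s = len(administered) / test_length  # progress: 0 at start, 1 at end
    best_item, best_value = None, float("-inf")
    for item_id, (a, b) in pool.items():
        if item_id in administered:
            continue  # never re-administer an item
        value = (1 - s) * rng.random() + s * item_information(theta, a, b)
        if value > best_value:
            best_item, best_value = item_id, value
    return best_item
```

Early in the test the random term dominates, spreading exposure across the pool; late in the test the information term dominates, recovering the efficiency of a standard maximum-information CAT.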

The item exposure session at NCME was an academic comparison of these mechanisms using simulated data for small item pools. I had a good time reading the papers. I like being a discussant: it forces you to read papers you wouldn’t otherwise read, and it forces you to say something constructive about them (hah!). No offense to the authors. I have three points to make.

Point #1…During my time as a discussant, I asked the NCME crowd which item exposure control mechanism they were using, and I discovered that Sympson-Hetter is slightly more popular than the randomization techniques and only a few folks are using the Progressive Method. I was thoroughly shocked by this. I expected most folks to be using randomization techniques. This tells me one thing…mathematically inclined psychometricians have a habit of wasting their clients’ time and dollars because they enjoy mathematically modeling things that today’s computers can collect empirically just as quickly.

Point #2…Item exposure control is only necessary for programs that can’t build a good item pool. In the four papers I discussed, the authors had REALLY BAD item pools. In one, only 51 items from a 176-item pool were ever selected for a 40-item test. Hmmmm…why build a CAT in this situation? Two fixed forms would perform better than a CAT.

The moral of this point is that ALL testing professionals need to be concerned with building better item pools. A better pool fixes item exposure issues more than any control mechanism available.

Point #3…One-Parameter Logistic (1PL, or Rasch) model CATs yield flatter item exposure distributions. That’s one more reason for testing programs moving to CAT to implement the Rasch model.