Today, I had the distinct privilege of attending and presenting at the first-ever meeting of the International Association for Computerized Adaptive Testing.  Wow!  The diversity of theoreticians, practitioners, and supporters of CAT around the globe is fantastic.  We had folks from 30 different countries in attendance.  This is the logical extension of the CAT conferences put together by GMAC and David Weiss, which have occurred sporadically since the 1970s.  It’s great to see a formal entity in existence to promote CAT and to facilitate the sharing of knowledge that is taking place today.

I would like to take this opportunity to comment on a few of the great presentations made today.

1.) Mark Reckase gave one of the keynote addresses.  As always, Mark gave a fantastic speech. One assertion he made was that the paradigm of testing has not yet fully shifted from True Score Theory (aka Classical Test Theory) to Item Response Theory.  He argued that IRT will not achieve hegemony until we develop universal scales for commonly used constructs like mathematical literacy.  In essence, measurement is not yet as real as meteorology, because the weather nuts have universal metrics for temperature, humidity, etc… and psychometrics does not. I don’t know if I agree with his statement, but I do agree that there should be one and only one measurement scale for each of our basic K12 constructs.  I don’t care how you cut it, Algebra is Algebra.  There’s no difference between Algebra knowledge in Alabama and Algebra knowledge in Maine.  Sure, Alabama and Maine may teach totally different topics from totally different perspectives, but when it comes right down to it, a student’s achievement level and academic growth in Algebra should be measured universally.  In essence, an inch is an inch.  Don’t let any politician, teacher, qualitative researcher, or anti-testing hippy tell you something different.  The real issue will be finding a way to create these scales as open-source scales that can be used by all testing and education professionals, as opposed to being owned by a single testing company.

2.) Wim van der Linden gave the other keynote address, and like Mark’s, it was enjoyable. My main takeaway from Wim’s talk was the call for even more efficiency in testing than we have currently.  The first CATs cut the number of items needed by 50%; this next wave of CATs may make it possible for us to assess constructs accurately with 50% fewer questions than we use today. This is a great call to action and opens the door for many possibilities in formative assessment.

3.) Barth Riley gave a great paper presentation entitled “CAT item selection and person fit: Predictive efficiency and detection of atypical symptom profiles.” I really enjoyed this session because it exposed me to Linacre’s Bayesian Maximum Falsification (BMF) approach to item selection in CAT. He used this approach as a way to increase the validity of common person fit statistics.  Barth evaluated BMF across the whole spectrum of person fit stats including

  • infit (Wright and Stone, 1979)
  • outfit (Wright and Stone, 1979)
  • Likelihood Z (Drasgow, Levine, & Williams, 1985)
  • Modified Caution Index (Harnisch & Linn, 1981)
  • H

Some of these were new to me, so I appreciated the introduction. In a maximum information CAT (a traditional CAT), the probability of a correct response to each administered item is typically around 50%, which greatly limits the utility of person fit statistics. By implementing the BMF approach, those probabilities change, thereby increasing the viability of the fit statistics.  He got good results.
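For anyone who, like me, hadn’t worked with some of these statistics, here’s a minimal sketch of the two Rasch-based ones, infit and outfit, computed from squared standardized residuals. The ability estimate, item difficulties, and response pattern below are made up purely for illustration; Barth’s actual analysis used more than this.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_fit(theta, difficulties, responses):
    """Return (infit, outfit) mean-square statistics for one examinee.

    responses are scored 0/1; difficulties are Rasch item parameters.
    """
    p = [rasch_p(theta, b) for b in difficulties]
    var = [pi * (1.0 - pi) for pi in p]               # binomial variance per item
    sq_resid = [(x - pi) ** 2 for x, pi in zip(responses, p)]
    # Outfit: unweighted mean of squared standardized residuals
    outfit = sum(r / v for r, v in zip(sq_resid, var)) / len(responses)
    # Infit: information-weighted mean square (weights = item variances)
    infit = sum(sq_resid) / sum(var)
    return infit, outfit

# Hypothetical examinee at theta = 0 answering three items
infit, outfit = person_fit(0.0, [-1.0, 0.0, 1.0], [1, 1, 0])
```

Values near 1.0 indicate a response pattern consistent with the model; large values flag misfit.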
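To make the 50% point concrete, here’s a small Python sketch under the Rasch model, with a hypothetical item bank and ability estimate. Fisher information for a Rasch item is P(1 − P), which peaks when P = 0.5, so maximum-information selection keeps picking items the examinee has roughly even odds of answering correctly:

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: I(theta) = P * (1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

theta = 0.5                          # current ability estimate (hypothetical)
item_bank = [-2.0, -0.5, 0.4, 1.5]   # hypothetical item difficulties

# Maximum-information selection: pick the most informative remaining item
best = max(item_bank, key=lambda b: item_information(theta, b))
print(best, round(rasch_p(theta, best), 3))  # prints: 0.4 0.525
```

The selected item is the one whose difficulty is closest to the ability estimate, which is exactly why the response probability hovers near 0.5 throughout a traditional CAT.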

4.) I’ll be posting the slides from our presentation, “The theoretical issues that are important to operational adaptive testing,” on the MM website in the near future.  Enjoy.

I’m off to dinner.  Maybe I’ll take some photos and upload them here.