Brian D. Bontempo, PhD.

Introduction: Recently, we ran a reliability analysis for a client, and the results are worth sharing. The certification exam had about 400 items, and its reliability, as measured by Cronbach’s Alpha (Cronbach, 1951), came in under .80. That is an undesirable, if not unacceptable, reliability for any test, let alone one containing 400 items. (NOTE: Mountain Measurement didn’t design or build this exam; we simply analyzed the data as a first step in helping the client.)
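For readers who want to reproduce the statistic, Cronbach’s Alpha can be computed directly from a matrix of scored responses. The sketch below assumes dichotomous (0/1) item scores with one row per examinee; the function name `cronbach_alpha` and the toy data are illustrative, not the client’s data or our production code.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach's alpha for a matrix of item scores (rows = examinees,
    columns = items):
        alpha = k / (k - 1) * (1 - sum of item variances / total-score variance)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees (population form; the ratio
    # below is the same whether population or sample variances are used).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total (raw) scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when every examinee has the same total")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy 0/1 data: four examinees, three items (illustrative only).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(toy))  # → 0.75
```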

Background: Reliability is a psychometric indicator of test quality; it reflects how consistently a test measures the examinees. All else being equal, reliability increases with test length (number of items). Many 50- to 60-item tests achieve a reliability above .80. Given that this exam contained hundreds of items, we set out to determine the cause of its low reliability.
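The link between length and reliability is usually quantified with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor n using items comparable to those already on it: ρ' = nρ / (1 + (n - 1)ρ). A minimal sketch follows; the 50-item, .80 starting point is a hypothetical example chosen to match the paragraph above, not a figure from this exam.

```python
def spearman_brown(rho: float, length_factor: float) -> float:
    """Spearman-Brown prophecy formula: predicted reliability after changing
    test length by `length_factor`, assuming the added items are comparable
    (parallel) to the existing ones."""
    return length_factor * rho / (1 + (length_factor - 1) * rho)


# Hypothetical example: a 50-item test with reliability .80 lengthened to
# 400 comparable items, i.e. made 8 times as long.
print(round(spearman_brown(0.80, 400 / 50), 2))  # → 0.97
```

Under that idealized assumption, a test already at .80 with 50 items would be predicted to reach about .97 at 400 items, which is why a 400-item exam below .80 signals a problem.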

Possible Cause #1 - Low Item Quality: We calculated the item-total score correlation for each item and found that the vast majority of items had positive point-biserial correlation coefficients, indicating that the items were generally good. Poor item quality did not appear to be the cause.
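A minimal sketch of the item-total (point-biserial) calculation, assuming 0/1 item scores in an examinees-by-items list. The helper names are hypothetical. The `corrected` option, which removes the item from the total before correlating, is a common variant included here as an assumption; the paragraph above does not say which form was used.

```python
from math import isnan, nan, sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; with a 0/1 item as x this is the point-biserial."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan  # undefined: the item (or the total) has no variance
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores: list[list[int]], corrected: bool = False) -> list[float]:
    """Point-biserial correlation of each item with the total score.
    With corrected=True the item is removed from the total first."""
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        criterion = [t - x for t, x in zip(totals, item)] if corrected else totals
        results.append(pearson(item, criterion))
    return results


toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print([round(r, 3) for r in item_total_correlations(toy)])  # → [0.775, 0.894, 0.775]

# An item every examinee answers correctly has no variance, so its
# correlation is undefined (NaN).
print(isnan(item_total_correlations([[1, 1], [1, 0], [1, 1]])[0]))  # → True
```

Items that everyone answers correctly produce no usable correlation at all, so a large number of them is a symptom of the targeting problem discussed under Possible Cause #2 rather than of poor item writing.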

Possible Cause #2 - Misalignment Between the Difficulty of the Test Items and the Candidates Taking the Exam: We calculated the distribution of test scores, and here we found problems. The mean raw score was quite high (about 325), and the standard deviation was quite small (around 15). Scores were so high that no one scored below 200.
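Score-distribution checks like these are straightforward to script. The sketch below summarizes raw total scores and also counts items with very high p-values (proportion correct), since items nearly everyone answers correctly cannot distinguish among examinees. The .95 cutoff, the function name `score_summary`, and the toy data are assumptions for illustration.

```python
from statistics import mean, pstdev


def score_summary(scores: list[list[int]], easy_cutoff: float = 0.95) -> dict:
    """Summarize raw total scores and count items nearly everyone answers correctly."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    # Item p-value: proportion of examinees answering the item correctly.
    p_values = [mean(row[j] for row in scores) for j in range(n_items)]
    return {
        "mean": mean(totals),
        "sd": pstdev(totals),  # population SD of raw scores
        "min": min(totals),
        "max": max(totals),
        "n_items": n_items,
        "n_easy": sum(p >= easy_cutoff for p in p_values),
    }


# Toy data: five examinees, four items (illustrative only).
toy = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]]
s = score_summary(toy)
print(s["mean"], s["min"], s["n_easy"])  # → 2.8 2 1
```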

When designing a test, the items should be targeted to the examinees; this maximizes the efficiency of the test. For our client, because no one scored below 200, the 200 easiest items on the exam were adding absolutely nothing to the assessment; they were simply wasting the examinees’ time. Moreover, with a standard deviation of about 15, most examinees’ scores fell within a band only about 60 raw-score points wide, so only about 60 items were providing meaningful measurement information. The rest were too easy or too hard to provide much information at all. In essence, this was a 60-item test, which explains why the reliability was around .80.
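One way to arrive at a figure of about 60 items is to take the band within two standard deviations of the mean, which covers roughly 95% of examinees if scores are approximately normal. The sketch below assumes that reading and treats each raw-score point in the band as roughly one item’s worth of information, a simplification; the function name `targeted_band` is hypothetical.

```python
from statistics import NormalDist


def targeted_band(mean_score: float, sd: float, n_sd: float = 2.0) -> tuple[float, float, float]:
    """Raw-score band within n_sd standard deviations of the mean, and its width."""
    low = mean_score - n_sd * sd
    high = mean_score + n_sd * sd
    return low, high, high - low


low, high, width = targeted_band(325, 15)  # approximate figures from the text
print(low, high, width)                    # → 295.0 355.0 60.0

# Share of examinees inside that band if scores were roughly normal.
share = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)
print(round(share, 2))                     # → 0.95
```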

Recommendation: To raise the reliability of this exam, we advised the client to write more items similar in difficulty to the 60 items that, based on their empirical performance, were best targeted to the examinees.
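The Spearman-Brown prophecy formula can be inverted to estimate how much targeted material would be needed. The sketch below treats the roughly 60 well-targeted items as the exam’s effective length and assumes new items behave like them; the .90 goal and the function name `length_factor_needed` are hypothetical, not targets or tools the client used.

```python
def length_factor_needed(rho_current: float, rho_target: float) -> float:
    """Inverse of the Spearman-Brown prophecy formula: how many times longer a
    test must be, using comparable items, to move from rho_current to rho_target."""
    return (rho_target * (1 - rho_current)) / (rho_current * (1 - rho_target))


effective_items = 60  # estimate of well-targeted items from the text
target = 0.90         # hypothetical reliability goal
factor = length_factor_needed(0.80, target)
print(round(factor, 2), round(effective_items * factor))  # → 2.25 135
```

Under those assumptions, reaching .90 would take about 135 well-targeted items, roughly 75 more than the exam effectively had. Adding more easy items would not help, because they do not behave like the targeted ones.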