Brian D. Bontempo, PhD.

This week’s entry is inspired by a figure in a widely publicized book by Sharon A. Shrock and William Coscarelli called Criterion Referenced Test Development: Technical and Legal Guidelines for Corporate Training. I have not finished reading this book, but I find the authors’ perspective to be an interesting departure from the traditional testing paradigms (which equals 20 cents for those of you keeping score). I would like to thank them for stretching my mind and inspiring me to think differently about testing in corporate settings.

I would like to take this opportunity to continue the stretching by expanding on a topic they discuss in chapter two, which defines both criterion-referenced testing (CRT) and norm-referenced testing (NRT). A norm-referenced test (NRT) is one in which test outcomes (e.g., grades or pass/fail) are determined by each examinee’s score relative to the scores of the other examinees. Although this practice is uncommon (and arguably unethical), it is occasionally still used today. For example, some of the state bar exams are norm-referenced. Typically, the top X percent of examinees are awarded a passing mark, regardless of how competent or incompetent that group of test takers was. In other words, if a prospective lawyer were to take an exam along with the most competent group of graduates, then (s)he would have less chance of achieving a passing mark than if (s)he took the exam alongside a group of bottom feeders. Does it surprise you that the legal profession would endorse something outdated, scientifically unsupported, and arguably unethical?

On the other hand, a criterion-referenced test (CRT) is a test composed of specific objectives, or competency statements. This type of test is common in licensure and certification. The passing rates for CRTs vary with each test cycle since examinees are evaluated based on their competency relative to a criterion-referenced passing standard (aka cutscore). There are many other attributes of these two types of tests beyond their scoring methodology, and I’ll leave it up to future posts to expand upon these.
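To make the difference in scoring concrete, here is a minimal Python sketch of the two pass/fail rules described above. The function names, the two cohorts, the 40 percent pass quota, and the cut score of 70 are all invented for illustration; they are not taken from Shrock and Coscarelli or from any real exam.

```python
import math


def norm_referenced_pass(scores, top_fraction):
    """Pass the top `top_fraction` of the cohort, whatever their scores are.

    Ties at the boundary are broken by position in the list, which is an
    arbitrary simplification for this sketch.
    """
    n_pass = math.floor(len(scores) * top_fraction)
    # Rank examinee indices from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    passed = set(ranked[:n_pass])
    return [i in passed for i in range(len(scores))]


def criterion_referenced_pass(scores, cut_score):
    """Pass every examinee who meets or exceeds a fixed cut score."""
    return [s >= cut_score for s in scores]


# Hypothetical cohorts: the first examinee scores 75 in both.
strong_cohort = [75, 92, 95, 97, 99]
weak_cohort = [75, 40, 45, 50, 55]

for name, cohort in [("strong", strong_cohort), ("weak", weak_cohort)]:
    nrt = norm_referenced_pass(cohort, top_fraction=0.4)[0]
    crt = criterion_referenced_pass(cohort, cut_score=70)[0]
    print(f"{name} cohort: NRT pass={nrt}, CRT pass={crt}")
```

Under the norm-referenced rule, the same score of 75 fails in the strong cohort and passes in the weak one. Under the criterion-referenced rule it passes in both, and the passing rate is free to differ from one cohort to the next.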

One other type of test that Shrock and Coscarelli refer to is a mastery test, a test where most examinees answer the vast majority of the content correctly. K-12 classroom tests are commonly designed this way. The distribution of scores for a mastery test is bunched near the top of the score scale. I think that it is important to point out that mastery tests are a form of criterion-referenced test. In other words, mastery tests are a subset of criterion-referenced tests (Criterion-Referenced Test ⊃ Mastery Test). See below for a visual representation of this.

Types of Tests

So, what do we call a non-mastery CRT? To be honest, I don’t know. I have heard people refer to them as non-mastery tests or non-mastery, criterion-referenced tests.

Mastery tests are useful in the corporate training world, where the content domains are small (typically measured in class hours) and the shelf-life of the training programs and tests is generally short (measured in months or years). However, they are NOT optimal for certification (corporate or non-corporate).

Mastery Test Curve

Why should a corporation build a non-mastery, criterion-referenced test? There are two primary reasons.

  1. If constructed properly, non-mastery, criterion-referenced tests provide more information than a simple pass/fail result. Non-mastery, criterion-referenced tests are competency measurement instruments. Just as a ruler measures the length of an object, a non-mastery, criterion-referenced test can measure the competency of an individual. This ruler can be used to measure the competency of individuals or the difficulty of the test questions, both of which can provide valuable feedback to the training program or corporation.
  2. When the required level of mastery changes, it is much easier to change the level of competency required to achieve mastery than it is to write new content or a whole new exam.
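
Both reasons can be sketched in a few lines of Python. The scores and the two cut scores below are made up for illustration, and `pass_rate` and `margins` are hypothetical helper names; a real program would set its cut score through a formal standard-setting process rather than by picking a number.

```python
def pass_rate(scores, cut_score):
    """Fraction of examinees at or above the cut score."""
    passed = sum(1 for s in scores if s >= cut_score)
    return passed / len(scores)


def margins(scores, cut_score):
    """Distance of each examinee from the cut score (negative means below)."""
    return [s - cut_score for s in scores]


# Hypothetical scores from one administration of a non-mastery CRT.
scores = [48, 55, 61, 67, 72, 78, 83, 90]

old_cut, new_cut = 60, 75  # assumed passing standards, before and after

# Reason 2: raising the standard only means moving the cut score;
# the questions and the recorded scores stay exactly the same.
print(pass_rate(scores, old_cut))
print(pass_rate(scores, new_cut))

# Reason 1: the spread of scores says how far each person is from the
# standard, which a bare pass/fail result would hide.
print(margins(scores, new_cut))
```

Because the scores are spread along the scale instead of being bunched at the top, the same data can be re-scored against a new standard and still separate examinees on either side of it.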