Brian D. Bontempo, PhD.

If an examinee answers a single test item (aka question) about a specific topic correctly, what can we infer about the examinee's knowledge of that topic? Even with the highest-quality performance assessment items, the answer is: very little. That's because several factors interact when a test item is presented to an examinee, such as the readability of the item, examinee motivation, and prior knowledge of the item, whether gained by cheating (see Caveon for more information about cheaters) or by exposure to a very small pool of performance assessment items.

Therefore, it is useful to think about knowledge (or competency, skill, or ability) using the framework of probability. In the absence of any other information, we can infer that the probability of an examinee possessing knowledge of a specific topic is high if the examinee answers a test item about it correctly. The corollary in corporate training would be that we can infer the probability of an examinee possessing the skills to perform a specific task is high if the examinee performs the task correctly in an assessment situation.
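
To make this concrete, here is a minimal sketch of the idea in Python, using Bayes' rule. The 50/50 prior, the 90% answer rate for a knowledgeable examinee, and the 25% guessing rate are all illustrative assumptions, not values from any real test.

```python
# A minimal sketch of the probabilistic framing, using Bayes' rule.
# All numbers here are illustrative assumptions.

def posterior_knows(prior_knows, p_correct_if_knows, p_correct_if_guessing):
    """P(examinee knows the topic | answered one item correctly)."""
    p_correct = (prior_knows * p_correct_if_knows
                 + (1 - prior_knows) * p_correct_if_guessing)
    return prior_knows * p_correct_if_knows / p_correct

# Assume a 50/50 prior, a 90% chance that a knowledgeable examinee answers
# correctly, and a 25% guessing rate (a four-option multiple-choice item).
print(posterior_knows(0.5, 0.90, 0.25))  # ~0.78 -- high, but far from certain
```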

Now, there are many things that can affect this probability. The fidelity of the item is certainly important. Multiple-choice questions are often bashed by test haters because their fidelity is low, making it quite challenging to infer that an examinee knows something about a topic from a single response. Performance assessment items are often praised because they have a much higher level of fidelity, thereby increasing that probability.
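
To see fidelity's effect in the same sketch, vary the guessing rate. The rates below are assumed stand-ins for a four-option multiple-choice item, a short-answer item, and a performance task.

```python
# Same Bayes calculation as above, sweeping the guessing rate: lower
# guessing rates (higher-fidelity items) make a single correct response
# far more informative. All numbers are illustrative assumptions.
prior, p_if_knows = 0.5, 0.90
for guess_rate in (0.25, 0.10, 0.01):  # e.g. MCQ, short answer, performance task
    p_correct = prior * p_if_knows + (1 - prior) * guess_rate
    print(guess_rate, round(prior * p_if_knows / p_correct, 2))
# 0.25 -> 0.78, 0.10 -> 0.90, 0.01 -> 0.99
```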

No matter what type of item is used, the probability will never reach 100%. It is this fact that makes it fundamentally necessary to approach testing from a probabilistic perspective, as Item Response Theory (IRT) and Rasch Measurement Theory do. It also makes it imperative to collect more information, that is, to ask more test questions.

Sometimes this means that test questions begin to look quite similar. I remember my 3rd grade math book, where the exercises required the same computations over and over again using different numbers. Sometimes, I answered 90% of the items about a topic correctly, having missed a few due to careless errors. In that situation, my teacher was able to infer that I understood the topic. Other times, I did quite poorly, answering very few items correctly, and the teacher easily inferred my lack of knowledge about the topic. On some occasions, I answered a good number correctly but still missed some, landing around 75% correct. Sometimes, my teacher could inspect my responses and identify a trend, such as a specific area where I lacked the knowledge. Other times, she could not. In those situations, I typically possessed the knowledge but applied it inconsistently due to tiredness, boredom, or carelessness.
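
For reference, here is a minimal sketch of the Rasch model's response function; the ability and difficulty values are illustrative. Notice that even a strong examinee has a real chance of missing any given item, which is exactly the careless-error scenario above.

```python
import math

def rasch_p_correct(ability, difficulty):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Even an examinee two logits above the item's difficulty answers
# incorrectly about 12% of the time -- the probability never reaches 1.0.
print(rasch_p_correct(2.0, 0.0))  # ~0.88
```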

In all three of these situations, it should be clear that the information gleaned from one test item is not enough to infer mastery of the topic. This leads us to the next logical question: how many questions should a test maker ask about a specific topic? It doesn't really matter! What does matter is that you ask enough questions about the entire content domain to make valid inferences about the examinee's knowledge of the entire subject rather than a specific topic. If you're really interested in an examinee's topic-level knowledge, then you'll need to build a test for every single topic, which might look a lot like the 3rd grade math exercises discussed earlier, where the test items were very similar. Although this type of test might seem reiteratively, recursively, repetitively redundant (uh huh!), it will not only achieve better measurement, it will also help examinees develop procedural memory. After all, practice makes perfect.
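
One way to see why more items help: under the simplifying assumption that each item is an independent observation of the same mastery level, the standard error of the observed proportion-correct score shrinks with the square root of test length. A sketch:

```python
import math

def standard_error(p, n_items):
    """Standard error of an observed proportion-correct score."""
    return math.sqrt(p * (1 - p) / n_items)

# With a true proportion-correct of 0.75 (an assumed value), the
# uncertainty around the observed score shrinks as items are added.
for n in (1, 5, 20, 80):
    print(n, round(standard_error(0.75, n), 3))
# 1 -> 0.433, 5 -> 0.194, 20 -> 0.097, 80 -> 0.048
```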
