Daniel John Wilson

Hi, I’m Daniel, and I am a data junkie.

For the last twelve years of my professional, academic, and even personal life, I have been playing with data for the purpose of analyzing it. Data, as a term, is often used ambiguously to refer to many types of information.

This is the first in a short series of posts in which I will discuss some of the issues around using data, discussing data, and engineering data solutions, especially as they pertain to the assessment universe. Data can mean many things to many people. To cut through this confusion, it helps me to root myself in the most basic meaning of the word: data comes from the Latin for “things given.”

This definition of data corresponds to the systems model commonly referred to as DIKW (Data-Information-Knowledge-Wisdom), where data represents raw input without any relationships or organization.

Data-Information-Knowledge-Wisdom Model

A data table is a set of data points and their relationships with respect to the primary key that identifies each row, or tuple, of data. In the DIKW model, however, a data table like this represents information rather than data, because each data point is now connected to other data points: the table supplies a value for every attribute associated with each tuple.

Ideally, a data table should contain:

1. A primary key field that uniquely identifies each tuple in the table.
2. A set of attributes and values associated with the individual tuple.
3. A set of keys from related data tables that allow for easily joining the information across tables.

Thus, a database is a collection of these kinds of data tables and their relationships.
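The three ingredients listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (item, item_response) and every column name are hypothetical choices of mine, not a prescribed schema.

```python
import sqlite3

# All table and column names here are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("""
    CREATE TABLE item (
        item_id    INTEGER PRIMARY KEY,
        answer_key TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE item_response (
        item_response_id INTEGER PRIMARY KEY,   -- 1. primary key for each tuple
        response         TEXT,                  -- 2. attributes of the tuple
        item_score       INTEGER,
        item_id          INTEGER NOT NULL
                         REFERENCES item(item_id)  -- 3. key from a related table
    )
""")

conn.execute("INSERT INTO item VALUES (1, 'A')")
conn.execute("INSERT INTO item_response VALUES (10, 'A', 1, 1)")


def insert_is_rejected(sql):
    """Return True if the database refuses the insert on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Reusing primary key 10 violates uniqueness.
duplicate_key_rejected = insert_is_rejected(
    "INSERT INTO item_response VALUES (10, 'B', 0, 1)"
)
# Pointing at item 999, which does not exist, violates the relationship.
orphan_key_rejected = insert_is_rejected(
    "INSERT INTO item_response VALUES (11, 'B', 0, 999)"
)
response_count = conn.execute("SELECT COUNT(*) FROM item_response").fetchone()[0]
print(duplicate_key_rejected, orphan_key_rejected, response_count)
```

Declaring the keys in the table definition lets the database itself reject rows that break uniqueness or point at a nonexistent related record, rather than leaving that check to whoever loads the data.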

Each data table in an assessment database will likely represent a different level of a hierarchy. In the assessment-verse, the lowest level of data tables in a database is the item response level. In this data table, each row represents a response to a single item within a single test session. Optimally, only information specifically describing a single item response (e.g., the response to the item, the item score (correct/incorrect), the item response time) would be contained in this data table. Information that describes a higher-level object, such as the item itself or the test session as a whole, should instead form the basis of additional tables.

Each item response record should be linked to a data table that stores item specification information, where each row represents an item on an exam. Information specific to the item on the exam (e.g., answer key, mappings to a content area) would then be stored here and not in each item response record. Likewise, each item response record should be linked to a data table that stores test result information, where each row represents the result of a test session. This data table would hold information specific to an individual test result (e.g., final score, pass/fail, total administration time).
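As a rough sketch of the hierarchy just described, here are the three tables and one join across them, again using Python's sqlite3 module. The table names, column names, and sample rows are all invented for illustration; a real assessment database would carry many more attributes.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        item_id      INTEGER PRIMARY KEY,
        answer_key   TEXT NOT NULL,
        content_area TEXT NOT NULL
    );
    CREATE TABLE test_result (
        test_result_id     INTEGER PRIMARY KEY,
        final_score        INTEGER,
        passed             INTEGER,
        total_time_seconds INTEGER
    );
    CREATE TABLE item_response (
        item_response_id      INTEGER PRIMARY KEY,
        test_result_id        INTEGER NOT NULL REFERENCES test_result(test_result_id),
        item_id               INTEGER NOT NULL REFERENCES item(item_id),
        response              TEXT,
        item_score            INTEGER,
        response_time_seconds INTEGER
    );
""")

# The answer key and content area are stored once per item, not once per response.
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "A", "algebra"), (2, "C", "algebra"), (3, "B", "geometry")],
)
# One row per test session.
conn.execute("INSERT INTO test_result VALUES (100, 2, 1, 95)")
# One row per response to a single item within a single test session.
conn.executemany(
    "INSERT INTO item_response VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 1, "A", 1, 30),
        (2, 100, 2, "B", 0, 40),
        (3, 100, 3, "B", 1, 25),
    ],
)

# Join responses to item specifications to count correct answers per content area.
rows = conn.execute("""
    SELECT r.test_result_id, i.content_area, SUM(r.item_score) AS n_correct
    FROM item_response AS r
    JOIN item AS i ON i.item_id = r.item_id
    GROUP BY r.test_result_id, i.content_area
    ORDER BY r.test_result_id, i.content_area
""").fetchall()
print(rows)
```

Because the content area lives only in the item table, correcting an item's mapping means changing a single row, and every response to that item picks up the change through the join.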

In our next installment of On the Nature of Data, we will dive deeper into the DIKW model and explore how data normalization is vital to the health of an assessment database. DATA FATA SECUTUS!

-dj