Anatomy of a Test Item
Three days with the Test Item Writer Workgroup in Fairbanks taught me something about test questions. We all know that asking questions and making tests are part of what a teacher does. But how many teachers have specific training in asking questions? I never had any. Even if classroom teachers don’t want to think too hard about preparing students for NCLB benchmark tests, they might want to know a little more about the questions they put on their own classroom tests.
Trivial point: Test questions aren’t called questions by the people who specialize in writing them. They’re called ‘items.’ I got used to hearing them called items in the workgroup after a while (as in, “That’s a good item.” Or, “This item needs a little more work.”).
Test items have to pass through several layers of review before they are ever presented to a student. We’ll look at the structure of a test item, and I’ll point out some features that students and teachers can be on the lookout for.
- Depth of Knowledge:
Test makers have embraced Webb’s model for depth of knowledge. According to this model, there are four levels of knowledge: recall of facts or procedures; application of skills or concepts, in which information is used to make a decision as part of a multi-step procedure; strategic thinking, which requires reasoning and planning and may have more than one possible answer; and extended thinking, which requires an investigation and is very hard to assess with a paper and pencil test. Extended thinking questions are not asked on typical large-scale tests.
- Bias:
This is a very tough criterion for a fair test. The point of removing bias from tests is that different individuals with the same ability can be expected to perform similarly. Just about any controversial topic can be considered a risk for bias. Topics that might be off-limits include war, abortion, alcohol, diseases, abuse, and sorcery. Bias-prone topics might also include floods, fires, and other natural disasters. In Alaska, which covers an immense and culturally diverse landscape, there seems to be no escaping biased question content.
- Alignment of Question and Purpose:
The questions we ask in school have specific purposes, and some questions work better than others to get a job done. The purpose a question serves determines how it’s constructed. When a question is meant to prompt a student to identify or recognize something, it begins with the word “What.” When a question is meant to prompt a student to explain something, it begins with the word “How.” Questions that begin with “Which” are good for comparing or classifying things. Questions that ask “Why” are difficult to answer. “Why” questions stifle inquiry because the correct responses are narrowed to a limited and poorly defined body of information.
- Question structure:
There are primarily two kinds of questions: multiple choice and constructed response. A multiple choice question comes with what is referred to as a stem or prompt (the question part) and the options (the choices), which include the distractors (incorrect responses) and the key (the correct response). Each question also has a rationale that explains why each choice is correct or incorrect. The stem should be clear and concise, and should measure only one grade level expectation at a time. The options should all be plausible, and only ONE of them should be correct.
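For readers who think in code, the parts of a multiple choice item can be pictured as a small Python sketch. This is only an illustration: the class name, the field names, the checks, and the sample item about planets are all made up here and are not anything the workgroup uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultipleChoiceItem:
    """Hypothetical model of the parts of a multiple choice item."""
    stem: str                 # the question part, also called the prompt
    options: List[str]        # every choice shown to the student
    key: str                  # the one correct response
    # Maps each option to a note on why it is correct or incorrect.
    rationale: Dict[str, str] = field(default_factory=dict)

    @property
    def distractors(self) -> List[str]:
        # The distractors are simply the options that are not the key.
        return [option for option in self.options if option != self.key]

    def problems(self) -> List[str]:
        """Return a list of structural problems; empty means none found."""
        found = []
        if not self.stem.strip():
            found.append("stem is empty")
        if len(set(self.options)) != len(self.options):
            found.append("options are not all distinct")
        if self.options.count(self.key) != 1:
            found.append("key must appear exactly once among the options")
        if any(option not in self.rationale for option in self.options):
            found.append("some options have no rationale")
        return found


# An invented sample item, used only to exercise the sketch.
item = MultipleChoiceItem(
    stem="Which planet is closest to the Sun?",
    options=["Mercury", "Venus", "Earth", "Mars"],
    key="Mercury",
    rationale={
        "Mercury": "Correct: Mercury has the smallest orbit.",
        "Venus": "Incorrect: Venus is the second planet.",
        "Earth": "Incorrect: Earth is the third planet.",
        "Mars": "Incorrect: Mars is the fourth planet.",
    },
)

print(item.distractors)
print(item.problems())
```

A check like this can only catch structural slips, such as a missing key or a choice with no rationale. Whether the distractors are plausible, or whether the stem measures just one grade level expectation, still takes a human reviewer.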
What I take from this information is that no matter how carefully a test is constructed, it’s fatally flawed from the beginning because of bias. No matter how carefully the test makers try to make their questions reflect content knowledge, they need to recognize that paper and pencil tests rely heavily on students’ reading ability. To presume that science tests are not testing reading ability is foolish.
