I’m done. The school year is over. Yesterday was the day to put grade reports and test results in envelopes, to update permanent records, to put away everything in the classroom and turn in keys. It was a full day. Sweet misery. I hate administrative chores, and I stall out following tangents that most people wouldn’t bother with.

My school bought the STAR reading and math computer-based testing programs a month or so ago, and we were supposed to give these tests to our students before the year ended. I did that, and I spent time yesterday learning how to generate and print reports. When I looked at the results, though, they didn’t necessarily match the state Benchmark (SBA) test results. In one case, a student who scored “below proficient” on the SBA scored at grade level on the STAR. You might think, well, the STAR must be an easier test. But the very next student I looked at scored “proficient” on the SBA and below grade level on the STAR.

What really bugs me about this is that if anyone asked me to predict how those two students would do, I would have said that neither one of them was likely to do very well. So I suppose it’s good news that they at least passed some test. But I know they both have trouble with math.

Standardized testing is sold on the belief that we can compare learning in diverse settings by using a single scientifically calibrated instrument. But if I have two sets of test results, and they’re all over the map, what do I do then? In this case, I’m not comparing learning in diverse settings with a single instrument; I’m comparing learning among different children in the same setting using multiple instruments. And the results seem to say as much about the instruments as they do about the kids. How do I know if I’m looking at a meaningful indicator?

It’s like when the time and temperature signs in front of banks or churches down the block from each other tell us different things. Or take the check engine light on my truck: how critical is it to follow up when it lights up? Medical screenings give false positives and false negatives, as well. How much faith do I put in any of these? The answer, I think, depends on why I need to know. And this is what I thought about while I was supposed to be focused on getting out of the building. It was an irritated meditation.

People say we should use these tests to tailor our instruction to kids’ needs. But if the tests don’t agree, which results should we use? Each test apparently measures something different, and each must favor some kids over others. But when test scores are reported to the public, the results are all aggregated, and inferences are made about the instructional program, teacher effectiveness, and so forth. The subtle meanings are ignored and lost in the politics of the story. Standardized tests are not very helpful for teachers or kids. They’re political tools.

It would be a great advantage to teachers if we had manageable-sized groups in big-enough rooms, and time to regularly interact with each student and with each other. We get plenty of good information from what we see and hear in the classroom if we’re paying attention to the kids. Using that information is fundamental, so setting up optimal conditions for interventions makes a lot of sense. Instead, we’re focused on diagnosis, data collection, and documentation. Tests are most helpful when we can go over the results with students, and give them a chance to tell us about their answers. Then we all learn something.