I’m done. The school year is over. Yesterday was the day to put grade reports and test results in envelopes, to update permanent records, to put away everything in the classroom and turn in keys. It was a full day. Sweet misery. I hate administrative chores, and I stall out following tangents that most people wouldn’t bother with.
My school bought the STAR reading and math computer-based testing programs a month or so ago and we were supposed to give these tests to our students before the year ended. I did that, and I spent time yesterday learning how to generate and print reports. When I looked at the results, though, they didn’t necessarily match the state Benchmark (SBA) test results. In one case, a student who scored “below proficient” on the SBA scored at grade level on the STAR. You might think, well, the STAR must be an easier test. But the very next student I looked at scored “proficient” on the SBA and below grade level on the STAR.
What really bugs me about this is that if anyone asked me to predict how those two students would do, I would have said that neither one of them was likely to do very well. So I suppose it’s good news that they at least passed some test. But I know they both have trouble with math.
Standardized testing is sold on the belief that we can compare learning in diverse settings by using a single scientifically calibrated instrument. But if I have two sets of test results, and they’re all over the map, what do I do then? In this case, I’m not comparing learning in diverse settings with a single instrument, I’m comparing learning among different children in the same setting using multiple instruments. And the results seem to say as much about the instruments as they do about the kids. How do I know if I’m looking at a meaningful indicator?
It’s like when the time and temperature signs in front of banks or churches down the block from each other tell us different things. Or the check engine light on my truck, how critical is it to follow up on that when it lights up? Medical screenings give false positives and false negatives, as well. How much faith do I put in any of these? The answer, I think, depends on why I need to know. And this is what I thought about while I was supposed to be focused on getting out of the building. It was an irritated meditation.
People say we should use these tests to tailor our instruction to kids’ needs. But if the tests don’t agree, which results should we use? Each test apparently measures different things, and each must favor some kids over others. But when test scores are reported to the public, the results are all aggregated, and inferences are made about the instructional program, teacher effectiveness, and so forth. The subtle meanings are ignored and lost in the politics of the story. Standardized tests are not very helpful for teachers or kids. They’re political tools.
It would be a great advantage to teachers if we had manageable-sized groups in big-enough rooms, and time to regularly interact with each student and with each other. We get plenty of good information from what we see and hear in the classroom if we’re paying attention to the kids. Using that information is fundamental, so setting up optimal conditions for interventions makes a lot of sense. Instead, we’re focused on diagnosis, data collection, and documentation. Tests are most helpful when we can go over the results with students, and give them a chance to tell us about their answers. Then we all learn something.


8 Comments
We finish our state standardized tests this week (Virginia) and I can’t wait. We won’t see the results until late in the fall, so it is unclear how exactly we should use them for instruction. I calculated this year that 5% of our school days were spent taking standardized tests. That is just for the fifth graders. And it doesn’t include any time spent preparing for tests or the exhaustion that typically follows.
Thank you for such a thoughtful post.
Doug, man does this hit a nerve with me. The apples and oranges state of assessments (in the US), driven by NCLB, is ludicrous. I wish folks would stop calling these things standardized tests. They are not standardized, not nationally normed, not reliable, and in many cases not valid measures. And yet each state tries to “one up” their neighbor state with a tougher, nastier measure.
I’m going to sound really old fashioned and dated here, but give me the nationally normed and standardized tests from our childhoods (they’re still around, but currently shunned) like the ITBS or the CAT. I’m a classroom teacher who used to give those – and move on. Now I watch in horror while kids fall apart taking poorly designed high stakes tests, while we teachers fritter away hour after hour preparing our kids for them, many using the fear of failure as a primary motivator – aaargh.
Better stop now, gotta get ready for tomorrow (and four more weeks!) in the classroom. Enjoy your summer, Doug.
The normative assessments you mentioned were used here, too, for years, and they did help us track student progress. I will say, though, that many primary-level students didn’t understand what was going on. I remember a few second graders cried because I wouldn’t help them with some confusing questions. It’s hard to imagine how you could get a true measure of what kids can do from a test when they aren’t old enough to even know what a test is. I’m not opposed to all testing. But I’d like the data to be useful for something besides scoring political points.
Doug – beyond the ITBS and CRT’s and DRA’s and OSI’s and math and reading formative assessments (and more) that we give at my school, we do the Scholastic Reading Inventory (SRI) 4 times a year. What is interesting about the SRI is that it is easy to administer – it is server based so students take it in the computer lab in 15 to 30 minutes as a whole class – fairly painless. The results are what is interesting. When results are printed out you can choose to see the results from every time that student has taken the SRI, even in past years. So it is always interesting that 2 to 5 or so students will score significantly lower than usual, sometimes 2 or 3 grade levels below their last several scores … um probably this student didn’t lose reading ability … so we share that information with the student – they sometimes have a confused look and other times they share that they were distracted, tired, not feeling great, mad about something that happened at home or recess or ….
We have them retake the test (they don’t read the same passages and aren’t asked the same questions) and guess what???? SURPRISE!!!! 90% of the time their score not only goes up … it is the highest score they ever had. You think it has something to do with focus and being motivated to do their best? What does that have to say about test scores when they only get one chance no matter what has been going on in their lives?
Just a thought.
We have 2 weeks left … but when you are already back … we won’t be : )
Brian
We had AR at our high school, so we bought STAR reading to complement it. It worked best with the SPED kids. As a librarian, I worked really hard with one kid on comprehension–we read together each day during my lunch half hour, and each day we went over what had just happened and what happened in the chapters before. It was a relief for me to find out that he read at a third grade level, because it made sense why he couldn’t comprehend a fourth grade book. I tested myself and my kids (then in 5th and 6th grades) using STAR reading, just for fun. We all tested at the highest level possible. I tried to use those scores (along with their other test scores) to convince their teachers that perhaps these two particular kids didn’t really need to fill out a stupid reading log (the bane of a mother’s existence), but of course, that didn’t work–all children must fill out meaningless reading logs because it shows us, the teachers, that children can fill out reading logs and forge their parents’ signatures. Anyway, use the tests if they help, and ignore them if they don’t–oh, wait, with NCLB we can’t exactly do that, can we?
I have been teaching long enough that I was on a committee when we had nationally normed tests. We changed to benchmark tests. I have always used my own test because I didn’t find the benchmark results to be accurate with what I saw.
I just heard that we might add a nationally normed test to the array of tests we are doing. Ugh!
For whom do you collect data? If it’s, as you say, to enable you to work directly with your students on what they need to work on and it becomes a collaborative and constructive exercise, then there’s some purpose to it. But there seems to be a tendency to collect data for education bureaucrats who first use it to justify their positions, then feed it to the politicians, who then engage across jurisdictions in a glorified pissing contest. And sometimes, of course, it’s used to denigrate the public education system.
In Australia, for instance, every primary school kid sits a series of national ‘benchmarking’ tests in numeracy, reading and writing at years 3, 5 and 7 (corresponding to ages 8, 10 and 12) – the Multilevel Assessment Program (MAP) tests. To no-one’s surprise, national results show the Northern Territory has a lower percentage of kids reaching or exceeding the benchmark. The data shows that kids in remote schools, in other words Aboriginal kids, are getting the lowest scores of all – say less than 40 per cent achieving the benchmark – and that’s dragging the overall percentage down.
Now it seems to me the first question we should be asking – given that teaching in remote schools is dealing with education across cultures – is whether the way they are doing that is appropriate. I’m not talking about ‘learning styles’, which is usually a euphemism used by people who parade cultural differences as the reason/excuse for Aboriginal kids not doing so well. What I’m talking about is schools being properly resourced, pedagogies being appropriate for the context, teachers being properly trained for the task – not just a few Professional Development days here and there – and living and working in conditions that give them some reason for staying, learning and growing, as well as properly educating the kids.
Instead the bureaucrats feed the line that the data also shows poor attendance in these schools (again, we should be asking why). So the answer to the lower percentages is reduced ad absurdum to attendance. If we fix attendance, then we fix the problem.
Then you get a rave about how many dollars are going into remote education, which must be proof of the government’s bona fides. Not much about what they’re actually doing with the dollars. Or why. I think we shouldn’t be emphasising the quantities, unless we can talk qualitatively about the what, how and why. And then do something.
Michael, what you have to say about Australia’s Northern Territory sounds very similar to issues faced by Alaska’s rural schools. I’ve never taught in the bush, and most of what I know about working there comes from stories told by teachers I’ve met in university professional development courses, or who’ve relocated to the “urban” area I work in. Turnover in the bush schools is high. A large percentage are staffed by people recruited from outside the state. Packaged curriculum solutions are increasingly relied on to raise standardized test scores. The best bet, I think, is to encourage local involvement and to recruit as many teachers from rural communities as we can. Outside solutions don’t seem to work very well. As you say, they’re mostly political and don’t address the real problems.