Wednesday, March 10, 2010

Rethinking Reliability

In our present-day overdependence on all things testing, teachers, parents and policy makers have developed many misconceptions about what test scores actually tell us. There are far too many to name here all at once; however, if you go through my blog's archives, you might find a good chunk of them unearthed there.

Today I would like you to consider the following anecdote:

Marvin had studied for his grade 8 math test, which was on probability, for over a week. He had put a significant amount of time and effort into studying the required textbook readings and had completed every single practice question that the teacher assigned. Marvin felt quite good about himself. Come exam day, he bombed it. It was a train wreck. He was devastated.

Marvin's teacher was shocked. She couldn't believe that one of her star students had bombed her test. So, she assumed he simply didn't prepare properly for her exam, and asked him to rewrite the test the next day. 

Marvin accepted, but this time he took a different approach to the exam. He decided to relax and simply go into the test with a calm yet confident attitude. To Marvin's surprise, his teacher had given him the exact same exam. This time he aced it.

Now honestly, who or what do you hold more responsible for the difference between Marvin's two scores? Teachers and parents have come to place the responsibility for these kinds of outcomes on the students, and rarely, if ever, do we take a step back and re-evaluate our assessments.

In his book Measuring Up: What Educational Testing Really Tells Us, Daniel Koretz explains the importance of understanding reliability:

Another source of inconsistency is the fluctuations over time that would occur even if the items were the same. Students have good and bad days. For example, a student might sleep well before one test date but be too anxious to sleep well another time. Or the examination room may be overheated one time but not the next. Yet another source of measurement error is inconsistencies in the scoring of students' responses.

This is what is meant by reliability. Reliable scores show little inconsistency from one measurement to the next - that is, they contain relatively little measurement error. Reliability is often incorrectly used to mean 'accurate' or 'valid', but it properly refers only to the consistency of measurement. A measure can be reliable but inaccurate - such as a scale that consistently reads too high. We are accustomed to highly reliable measurements in many aspects of our lives: for example when we measure body temperature or the length of a table we are considering buying. Unfortunately, scores on educational tests tend to be much less reliable than these measurements.

If we had a glass full of water, measured its temperature five times in a row and got five different readings, who or what would you hold responsible for the variation? In this case, it is unlikely that many people would blame the water. I mean, the water didn't do anything different. It had a temperature that the thermometer simply lacked the ability to gauge consistently.
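
For anyone who likes to see numbers, here is a rough Python sketch of the difference between an unreliable instrument and one that is reliable but inaccurate. Everything in it is made up for illustration: the "true" temperature, both sets of readings, and the helper name `summarize` are my own assumptions, not data from Koretz's book or from any real thermometer.

```python
from statistics import mean, pstdev

def summarize(readings, true_value):
    """Return (bias, spread) for repeated measurements of one unchanging quantity.

    bias   = how far the average reading sits from the true value (accuracy)
    spread = how much the readings disagree with each other (reliability)
    """
    bias = mean(readings) - true_value
    spread = pstdev(readings)
    return bias, spread

# Hypothetical true temperature of the glass of water, in degrees Celsius.
TRUE_TEMP = 20.0

# Hypothetical unreliable thermometer: right on average, but all over the place.
erratic = [17.5, 22.0, 19.0, 23.5, 18.0]

# Hypothetical reliable-but-inaccurate thermometer: always about 2 degrees too high.
consistent_high = [22.0, 22.1, 21.9, 22.0, 22.0]

erratic_bias, erratic_spread = summarize(erratic, TRUE_TEMP)
high_bias, high_spread = summarize(consistent_high, TRUE_TEMP)

print(f"erratic:         bias={erratic_bias:.2f}, spread={erratic_spread:.2f}")
print(f"consistent_high: bias={high_bias:.2f}, spread={high_spread:.2f}")
```

The water is the same in both cases. The only thing that differs is the instrument: the first one disagrees with itself from reading to reading, while the second agrees with itself but is consistently off.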

So why don't we consider Marvin to be like the glass of water?

Did Marvin's mathematical skills and knowledge change that much from one day to the next? Isn't it more plausible that the test simply lacked the ability to gauge his understanding of math consistently?

Blaming Marvin for the difference between his test scores would be no less idiotic than blaming the glass of water for not being measured properly.

In the end, we should not label Marvin a bad math student; rather, we need to identify the test as bad. Or, to be more accurate, we need to label the test as unreliable.
