Showing posts with label daniel koretz.

Tuesday, October 16, 2012

Standardized test scores are like a broken clock

Have you ever heard someone use standardized test scores to judge schools?

The Alberta Government recently released an information bulletin that boasted Alberta student performance results continue to rise:
The overall percentage of students who attained a standard of excellence on Grade 3, 6, and 9 provincial achievement tests (PATs) increased to 20.2 per cent from 19.5 per cent in the previous year. The percentage of students who met the acceptable standard also rose slightly to 75.5 per cent from 75.2 per cent. One of the highlights of the results is the percentage of students who achieved the standard of excellence in Science 6 and Science 9.
Many Albertans might take these standardized test score results as prima facie evidence that all is well. Many Albertans may be satisfied with this information and confidently move on with their regularly scheduled day, thinking that Alberta schools are not only doing well but also improving.

What if we are wrong? What if these scores are giving us false confidence? What if standardized test scores aren't telling us what we think they are telling us?

When some Albertans boasted about these results on Twitter, I responded with:
Assessing an education system via standardized test scores is like assessing a car by kicking the tires.
Some challenged me by asking:
Wouldn't the analogy be, "like assessing a car by comparing its gas mileage relative to motor size and tank capacity?"
My response: 

No. 

The assumption made by this analogy is that we think we know what standardized test scores tell us: we assume these scores are our window into the schools -- therefore we assume we can use these scores to judge the quality of teaching and learning that goes on in a school.

But what if these unquestioned assumptions about standardized testing are wrong?

Seth Godin writes:
The worst kind of clock... is a clock that's wrong. Randomly fast or slow. 
If we know exactly how much it's wrong, then it's not so bad. 
If there's no clock, we go seeking the right time. But a wrong clock? We're going to be tempted to accept what it tells us. 
What are you measuring? Keeping track of the wrong data, or reading it wrong is worse than not keeping track at all.
Standardized test scores are like a broken clock because we assume that these scores tell us what we need to know about our schools -- we assume that these scores reflect teaching and learning and therefore assume that if the numbers are rising that must be a good thing.

But what if this is misguided? What if our reliance on standardized tests to judge the quality of the teaching and learning in schools is like relying on a broken clock for time?

Consider this:
  • Standardized test scores are a remarkable way of assessing the socioeconomic status of students and their families. Study after study has shown that out-of-school factors account for an overwhelming proportion of the variance in scores. That means that standardized tests tend to tell us more about what kids bring to school than what they do at school. Here's a Canadian example and an American example.
  • There is research that suggests there is a statistical association between high scores on standardized tests and relatively shallow thinking.
  • Standardized tests tend to measure what is easily measurable, which turns out to be what matters the least. There is a big difference between measuring what we value and valuing what we measure. When we narrow what matters to what can be measured by a standardized test, we fall victim to the McNamara fallacy, which basically looks like this: (1) Measure whatever can be easily measured on a standardized test. (2) Disregard whatever can't be easily measured or given an arbitrary quantitative value. (3) Presume that what can't be measured easily isn't important. (4) Say that what can't be easily measured doesn't even exist.
  • There is research that suggests that when teachers are held accountable for their students' standardized test scores, they tend to become so controlling in their teaching style that the quality of students' performance actually declines.
To fully grasp why this is true, there's a lot to know about the arcane underpinnings of standardized tests; however, testing guru Daniel Koretz gives us a single principle that summarizes what we need to know:
Never treat a test score as a synonym for what children have learned or what teachers have taught.
Again, this too can be true for lots of reasons, but Alfie Kohn gives us a single principle that summarizes what we need to know:
A right answer on a test does not necessarily indicate understanding and a wrong answer does not necessarily indicate a lack of understanding.
I would imagine there are times when standardized test scores might reflect the teaching and learning that goes on in a school, but remember, even a broken clock is right twice a day.

Standardized tests look good from afar but are far from good at reflecting what matters most when it comes to teaching and learning. The closer you look at standardized tests, the more you realize that their utility and convenience comes at an alarming and unacceptable cost. Ask yourself if what we're learning from standardized tests is worth the price.

I would rather have no information - no data - nothing! - than the grossly misleading and misused data that is extracted from standardized testing. As long as the public is fed standardized test scores, we will be tempted to accept what they tell us -- but if the public had no information about their schools, they would be forced to seek it out, which might lead more people to actually set foot in their local schools.

Wednesday, September 5, 2012

Want high test scores? Buy an expensive house!


In his book Measuring Up: What Educational Testing Really Tells Us, Daniel Koretz writes:
Several years ago, I received a phone call from a total stranger who was about to move into my school district and wanted me to help her identify good schools. She assumed that because of what I do for a living, I ought to know this. I took her question more seriously than she wanted and told her briefly what I would look for, not only as an expert in testing and educational research but also as a parent of two children and a former elementary and middle-school teacher. As a first step, I suggested, she should gather as much descriptive information as she could readily obtain to get a notion of which schools she might want to consider. Test scores would be high on my list of descriptive information, but many other things might be important as well, depending on the child: the strength of the school's music or athletic programs, some special curricular emphasis, school size, social heterogeneity, and so on. Then, once she had narrowed down her list far enough (this was a very large district), I said she should visit a few schools that looked promising. A visit would allow her to get a glimpse of the characteristics of the schools, including those that might help account for their test scores. I explained some of the things that I had looked for when I had checked out schools and classrooms for my own children -- for example, a high level of student engagement, clear explanations from teachers before students undertook tasks, a level of enthusiastic activity when it was appropriate, and spirited discussion among the students. With both the observations and descriptive information in hand, she would be better able to identify schools that would be a good match for her children.
She was not pleased. She clearly wanted an answer that was uncomplicated and that would entail less work, or at least less ambiguity and complexity. A simple answer is reassuring, especially when both your children's education and a very large amount of money are at stake. (This was in Bethesda, Maryland, where housing prices were outrageously high.)
A few weeks later, I mentioned this conversation to a friend who at the time ran a large testing program. He replied that he received calls of that sort all the time and that few callers wanted his answers either. They wanted something simpler: the names of the schools with the highest test scores, which the callers considered enough to identify the best schools. He told me that in one conversation he had finally lost his patience when the caller resisted a more reasonable explanation and had told her, "If all you want is high average test scores, tell your realtor that you want to buy into the highest-income neighbourhood you can manage. That will buy the highest average test score you can afford."
The home buyer's phone call reflected two misunderstandings of achievement testing: that scores on a single test tell us all we need to know about student achievement, and that this information tells us all we need to know about school quality.

Thursday, August 23, 2012

What do standardized test scores tell us?

Today I am going to continue my critique of the Fraser Institute's Report Card on Alberta's High Schools for 2011.

Consider this chart that I created based on information from the Fraser Report:


Here are some interesting details:
  • Five of the top 20 schools report a special needs population of 0%, with the highest among them being 19%. Every single school in the bottom 20 reports a special needs population, with the lowest being 4.9% and the highest being 100%.
  • Twelve of the top 20 schools have an average parent income over $100,000, and six of them are over $200,000. In the bottom 20, not one school has an average parent income over $100,000, while half are below $60,000.
  • There are outliers. Bawlf is the only school in the top 20 with an average parent income below $50,000, and there are three schools in the bottom 20 that have an average parent income over $90,000; however, two of those three schools report that over 20% of their population is special needs.
Those in favor of ranking schools via their standardized test scores like to say that it provides parents with the information they need to choose a school for their children. At first glance this looks like it makes a lot of sense -- many people see standardized test scores as the public's window into the quality of our schools. But what if standardized test scores aren't telling us what we think they are telling us? What if standardized test scores tell us less about in-school factors and more about out-of-school factors? In fact, this is exactly the case. Socio-economic status is by far the strongest predictor of student performance on standardized tests.

In his book The Case Against Standardized Testing, Alfie Kohn explains what standardized testing really tells us:
The main thing they tell us is how big the students' houses are. Research has repeatedly found that the amount of poverty in the communities where schools are located, along with other variables having nothing to do with what happens in classrooms, accounts for the great majority of the difference in test scores from one area to the next. To that extent, tests are simply not a valid measure of school effectiveness. (Indeed, one educator suggested that we could save everyone a lot of time and money by eliminating standardized tests and just asking a single question: "How much money does your mom make? ... OK, you're on the bottom.") Only someone ignorant or dishonest would present a ranking of schools' test results as though it told us about the quality of teaching that went on in those schools when, in fact, it primarily tells us about socio-economic status and available resources. Of course, knowing what really determines the score makes it impossible to defend the practice of using them as the basis for high-stakes decisions.
When some hear the argument that poverty matters, they like to declare that poverty isn't destiny and that socio-economic status isn't everything. Some will say that within a given school, a group of students of the same status will have variations in the scores. To this Kohn replies:
Sure. And among people who smoke three packs of cigarettes a day, there are going to be variations in lung cancer rates. But that doesn't change the fact that smoking is the factor most powerfully associated with lung cancer.
In Edmonton, Todd Rogers from the University of Alberta conducted research on the variables that affect student performance on Alberta's Provincial Achievement Tests. Rogers found that "by far, the strongest predictor of student performance on achievement tests is socio-economic status (SES)."

In Calgary, Hugh Lytton and Michael Pyryt came to similar conclusions: "Social class factors explain about 45 per cent of the variation in achievement test results. The correlation between income level and achievement test scores is very strong."
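As a rough illustration of what "explains about 45 per cent of the variation" means statistically, here is a sketch using synthetic data. Every number below is invented for the illustration; the coefficients are simply chosen so that income accounts for roughly 45 per cent of the variance in scores, which corresponds to a correlation of about 0.67:

```python
import random
import statistics

# Hypothetical illustration: generate synthetic (income, score) pairs where
# socio-economic status drives roughly 45% of score variance. All numbers
# here are invented; nothing below comes from the Alberta studies themselves.
random.seed(1)

incomes = [random.gauss(80, 25) for _ in range(1000)]  # family income, $1000s
# Each score = a piece driven by income + a larger piece of everything else.
scores = [0.4 * inc + random.gauss(0, 11) for inc in incomes]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = correlation(incomes, scores)
print(f"correlation r = {r:.2f}, variance explained r^2 = {r * r:.2f}")
```

The point of the sketch is that a single out-of-school variable with a correlation of about 0.67 already accounts for nearly half of the spread in scores before anything that happens in a classroom is considered.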

Both studies were summarized by the Alberta Teachers' Association News in 1997.

In his book Measuring Up: What Educational Testing Really Tells Us, Daniel Koretz writes about a friend of his who ran a large testing program and often received calls from parents asking how they could use standardized test scores to select the best school for their children. Often these phone calls were disappointing for parents because they wanted a method that was simple and free from ambiguity and complexity. Koretz's friend shared an example of when a parent simply wanted a list of the schools with the highest test scores. After trying to explain that test scores shouldn't really be used that way, Koretz's friend lost his patience and told the parent, "If all you want is high average test scores, tell your realtor that you want to buy into the highest-income neighbourhood you can manage. That will buy you the highest average score you can afford."

Real accountability is about transparency, but there is nothing transparent about how standardized testing reduces learning to the convenience of a number or a rank. We are mistakenly led to believe that standardized test scores tell us about school quality when really they are an echo chamber for affluence and opportunity. Mark Twain may have summed this up nicely when he said:
It ain't what you don't know that gets you in trouble. It's what you know for sure that just ain't so.

Saturday, September 24, 2011

Test Score Ambiguity

The problem with using test scores to tell us something about a student, teacher, school or community is that there are far too many variables.

Here's a question from the book The Myths of Standardized Tests:

If 85 percent or more of the students in your child's classroom or school meet or exceed the proficiency standards, that means: 
a) your child has an exemplary teacher.
b) your school has an exemplary principal.
c) both a and b.
d) your school community is wealthier than average.
e) all, any combination, or none of the above.
While it may be true that inside the classroom the quality of the teacher has the greatest influence on student learning, the rest of the world outside of the classroom is much larger -- which is why testing experts like Harvard's Daniel Koretz warn:
A great many things other than the quality of schools influence educational achievement, and the impact of these noneducational factors can be huge... 
People routinely misinterpret differences in test scores, commonly attributing more to quality of education than they ought... 
Trends in scores over time, whether down or up, are often influenced by social factors and, in the case of seeming improvements, by inappropriate teaching to the test. Not all low scoring schools offer as weak an educational program as their scores might suggest. By the same token, if your neighborhood schools have high scores, that may mean less about the quality of their programs than you'd like. 
The point to be taken here is that when we ask tests to be a window into the quality and quantity of student learning AND an educator's teaching, we are asking test scores to do something they can never do.

Thursday, September 15, 2011

Daniel Koretz on Testing



Learning about what's under the hood of standardized tests will never be confused with something fun. And yet, this is precisely why we need people like Daniel Koretz who can make the arcane underpinnings of standardized testing more accessible to the masses.

If you want to broaden your current understanding of standardized tests, I suggest you get a copy of Koretz's book Measuring Up: What Educational Testing Really Tells Us.

Here is a post on Rethinking Reliability that highlights some of Koretz's work.


Wednesday, March 10, 2010

Rethinking Reliability

In our present-day over-dependence on all things testing, teachers, parents and policy makers have developed many misconceptions about what test scores actually tell us. There are far too many to name here all at once; however, if you go through my blog's archives, you will find a good chunk of them unearthed.

Today I would like you to consider the following anecdote:

Marvin had studied for his grade 8 math test, which was on probability, for over a week. He had put a significant amount of time and effort into studying the required textbook readings and completed every single practice question that the teacher assigned. Marvin felt quite good about himself. Come exam day, he bombed it. It was a train wreck. He was devastated. 

Marvin's teacher was shocked. She couldn't believe that one of her star students had bombed her test. So, she assumed he simply didn't prepare properly for her exam, and asked him to rewrite the test the next day. 

Marvin accepted but this time he took a different approach to the exam. He decided to relax and simply go into the test with a very calm, yet confident attitude. To Marvin's surprise, his teacher had given him the exact same exam. This time he aced it.
Now honestly, who or what do you place more responsibility on for the variance in Marvin's two scores? Teachers and parents have come to place the responsibility for outcomes like these on the students, and rarely, if ever, do we take a step back and re-evaluate our assessments.

In his book Measuring Up: What Educational Testing Really Tells Us, Daniel Koretz explains the importance of understanding reliability:

Another source of inconsistency is the fluctuations over time that would occur even if the items were the same. Students have good and bad days. For example, a student might sleep well before one test date but be too anxious to sleep well another time. Or the examination room may be overheated one time but not the next. Yet another source of measurement error is inconsistencies in the scoring of students' responses.

This is what is meant by reliability. Reliable scores show little inconsistency from one measurement to the next - that is, they contain relatively little measurement error. Reliability is often incorrectly used to mean 'accurate' or 'valid', but it properly refers only to the consistency of measurement. A measure can be reliable but inaccurate - such as a scale that consistently reads too high. We are accustomed to highly reliable measurements in many aspects of our lives: for example when we measure body temperature or the length of a table we are considering buying. Unfortunately, scores on educational tests tend to be much less reliable than these measurements.
If we had a glass full of water and we measured its temperature five times in a row and received five different temperature readings, who or what would you hold responsible for the variances in temperature? In this case, it is unlikely that many people would blame the water. I mean the water didn't do anything different. It had a temperature that the thermometer simply lacked the ability to gauge consistently.
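The water analogy can be sketched as a quick simulation. The true temperature and the error sizes below are invented for illustration; the only point is that the readings vary while the water does not:

```python
import random
import statistics

# Sketch of the water-glass analogy: the "true" temperature never changes,
# but an unreliable thermometer adds random measurement error to each reading.
random.seed(2)

TRUE_TEMP = 20.0  # the water's actual temperature, in degrees Celsius

def read(thermometer_sd):
    """One reading = the true value plus random measurement error."""
    return TRUE_TEMP + random.gauss(0, thermometer_sd)

reliable   = [read(0.1) for _ in range(5)]  # small error: readings agree
unreliable = [read(3.0) for _ in range(5)]  # large error: readings scatter

print("reliable  :", [f"{t:.1f}" for t in reliable])
print("unreliable:", [f"{t:.1f}" for t in unreliable])
print("spread (std dev):",
      f"{statistics.stdev(reliable):.2f} vs {statistics.stdev(unreliable):.2f}")
```

Both thermometers are pointed at the same unchanging glass of water; only the size of the measurement error differs, and that alone is what produces the scattered readings.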

So why don't we consider Marvin to be like the glass of water?

Did Marvin's mathematical skills and knowledge change that much from one day to the next? Isn't it more plausible that the test simply lacked the ability to gauge his understanding of math consistently?

Blaming Marvin for the variances in his test scores would be no less idiotic than blaming the glass of water for being measured improperly.

In the end we should not label Marvin a bad math student; rather we need to identify the test as bad. Or to be more accurate, we need to label the test as unreliable.