for the love of learning: The Folly of Multiple Choice

Tuesday, May 1, 2012

The Folly of Multiple Choice

"Anyone can confirm how little the grading that results from examinations corresponds to the final useful work of people in real life."

-Jean Piaget

It's final exam time at my school, and my teacher colleagues are collectively herding to the multiple-choice test scoring machine. For just under $800 CAD, our scoring machine can:

Scan up to 35 sheets per minute
Grade up to 100 questions per pass
Score exams with up to 200 questions
Connect to a PC for advanced data collection and analysis

The front of the instruction manual proudly reads “GRADING YOUR TESTS JUST GOT EASIER!” After watching scoring sheet after scoring sheet scurry through the scoring machine, I can personally attest to how easy this really is. It's no secret why multiple choice exams are so popular among teachers – their utility is second to none. But what are the cons of multiple choice tests? Here are a few items to think about before giving your next multiple choice test:

Ambiguity

Misinterpreting a question can result in an "incorrect" response, even if the response is valid. A free response test allows the test taker to make an argument for their viewpoint and potentially receive credit. Depending on the number of possible answers that are provided, a test taker could have a chance of completely guessing the correct answer. It is conceivable for a student to select the wrong answer for the right reasons or to select the right answer for the wrong reasons. The results of such a multiple-choice exam are surrounded with uncertainty and doubt.
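
To put a rough number on the guessing point, here is a minimal sketch. It assumes every question has the same number of options and that a blind guess is equally likely to land on any of them; the function names and the 100-question, 4-option exam are made up for illustration.

```python
from fractions import Fraction


def chance_of_blind_guess(num_options: int) -> Fraction:
    """Chance of picking the keyed answer by guessing uniformly at random."""
    return Fraction(1, num_options)


def expected_correct_by_guessing(num_questions: int, num_options: int) -> Fraction:
    """Expected number of 'correct' answers if every question is a blind guess."""
    return num_questions * chance_of_blind_guess(num_options)


# Hypothetical exam: 100 questions with 4 options each.
print(chance_of_blind_guess(4))              # 1/4
print(expected_correct_by_guessing(100, 4))  # 25
```

Under those assumptions, a test taker who understands nothing still collects about a quarter of the marks on average, and the score sheet cannot tell those marks apart from earned ones.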

No partial credit

Even if a student has some knowledge of a question, they receive no credit for knowing that information if they select the wrong answer. Free response questions may allow a test taker to demonstrate their understanding of the subject and receive partial credit.

Even carefully constructed exams that reflect very detailed curriculums can be used to improperly assess students. If an exam was created to carefully reflect a certain curriculum, you might see only one question that covers a specific outcome. What if a student did in fact understand that outcome but, for any number of reasons, got the question wrong? That means this test would report that the student understood nothing of that concept – which would most likely be wholly misleading and untrue. How often can a teacher honestly report that a student understands nothing?

Overemphasis on timeliness

A premium is placed on speed at the cost of creativity and thoroughness. This overemphasis on timeliness also contributes greatly to the ambiguity of the exam. Most test-takers are taught to madly fill in the remaining answers before having their exam taken away by the exam supervisors. There is no way to differentiate between these random guess responses and the responses that were carefully and thoughtfully selected. Recognizing guessing as a problem, some test creators enact a penalty such as deducting a mark for incorrect answers – the hope being that test takers will not guess and instead leave the question blank. This solution may stop the guessing, but it still does not address the ambiguity, as all those unanswered questions will simply show that the test taker got them all wrong – when in truth, the test taker may have had some level of understanding, but because they couldn't finish in time or were too scared to guess, they receive no credit.
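
The effect of a guessing penalty can be sketched with a little arithmetic. This is only an illustration under stated assumptions: one mark for a right answer, a deduction for a wrong one, nothing for a blank, and blind guesses spread evenly over the options. The function names and the 4-option question are hypothetical.

```python
from fractions import Fraction


def expected_value_of_guess(num_options: int, penalty) -> Fraction:
    """Expected marks from one blind guess: +1 if right, -penalty if wrong."""
    p_right = Fraction(1, num_options)
    p_wrong = 1 - p_right
    return p_right - p_wrong * Fraction(penalty)


def penalty_scored_total(num_right: int, num_wrong: int, penalty) -> Fraction:
    """Total under penalty scoring; blank answers add nothing either way."""
    return num_right - num_wrong * Fraction(penalty)


# Hypothetical 4-option question.
print(expected_value_of_guess(4, 0))               # 1/4  (no penalty: guessing pays)
print(expected_value_of_guess(4, Fraction(1, 3)))  # 0    (guessing breaks even)
print(expected_value_of_guess(4, 1))               # -1/2 (full-mark penalty: guessing costs)
```

With a full-mark deduction a blind guess loses half a mark on average, so leaving the bubble blank is the rational choice. The blank then scores zero, exactly as it would for a student who knew nothing at all.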

Subjectivity

How is the length of the exam decided? How many questions are necessary to show enough understanding? In the case of reading comprehension exams, how many reading selections will there be, and what is an appropriate length? How many answers will there be to select from? Which outcomes will be tested? Which will be excluded? Which will be more heavily weighted?

Depending on the date, the question "how many planets are there in our solar system?" has a different answer. What about all those students who were penalized for excluding Pluto as a planet before 2006?

The point here is not to try and figure out the answer to these questions; rather, there is no one answer for these questions. And yet, the choices made by the test maker can have an immeasurable effect on the test's results. One of my favorite quotes on the subjectivity of tests and grades comes from Paul Dressel, who said, "A mark or grade is an inadequate report of an inaccurate judgement by a biased and variable judge of the extent to which a student has attained an indefinite amount of material."

Behaviouristic in nature

These tests only care about whether the student got the right answer. They can't measure whether the student has a true understanding of the content. Even in a subject such as math that can be (mis)labelled as very black and white and right or wrong, it should very much matter how a student comes to answer the question 2+2=4. Did that student simply memorize his cue cards, or does he actually understand the addition process? A multiple choice test does not and cannot concern itself with understanding such valuable information.

Poor Testing can lead to Poor Teaching

Some teachers may use multiple choice exams voluntarily while others may find their use compulsory. Either way, teachers may feel pressure to achieve high scores on these tests, and that kind of pressure can lead to poor teaching, such as lecturing on the part of the teacher and memorization on the part of the student. Take math, for example. Many teachers may teach tricks or shortcuts such as: when dividing two fractions, simply flip the second fraction and multiply. A student could mindlessly comply and perform quite well by choosing the correct multiple choice answer. In cases like this, a poor assessment tool has led to a poor teaching technique (one that relies on mindless compliance and memorization rather than true understanding); however, if we use the test scores as an indicator of learning, that teacher and student appear successful. Inferences made from multiple choice tests can be undermined, leaving the successful and superficial students indistinguishable.

Interrater Reliability

Multiple choice exams are created with one right answer in mind for each question. This straightforward scoring system is used so that any two raters will always agree upon how well a student did. This need for agreement, known to statisticians as interrater reliability, is gained at an alarming price: authenticity is sacrificed for (perceived) reliability.

If we were compelled to identify who truly benefits from this kind of artificial measurement, I sincerely doubt anyone could honestly say that this is for the kids. Ultimately, this is an example of the needs of the system trumping the needs of the learner. Alfie Kohn puts it this way:

"You know it's a bad assessment if it's multiple choice. Multiple choice tests can be clever but they can't be authentic. You can't learn what kids know and what they can do with what they know, if they can't generate a response - or at least explain a response. Or as one expert in psychometrics told me many years ago, "Alfie don't you get it, multiple choice tests are designed so lots of students who understand the material will be tricked into picking the wrong response". That's why teachers would never dream of giving a multiple choice test of their own design because the same thing applies there."

Testing Test-taking skills

Multiple choice exams require a certain amount of test taking skills, and some students have better test taking skills than others. Many teachers will actually teach students strategies for writing multiple choice exams. For example, some test takers understand that an answer that has the words "always" or "never" is usually NOT the correct answer, because rarely is something ever "always" or "never". This is considered a fairly good strategy, and students who are aware of it may have a better chance of doing well.

However, there are some test takers who have come to believe in poor strategies. For example, some students believe the pattern of responses matters and so they say to themselves, “This can't be another "b" answer as we have just had three in a row.” Or they believe in myths such as “when in doubt, pick C”. Granted, we can all probably agree this is a silly strategy, but what if students actually use it? The format of the exam has skewed the measurement of that student's learning.

Averaging Averages

Traditional practice encourages test raters to not only mark each question right or wrong, but to also tally up the number of correct responses and compare that to the total number of questions – of course, we know this to be the average or mean. However, what does this number actually tell us?

Let's pretend there are three questions on the test for every outcome we taught. You could then look at the data and see how many of those three questions a specific student got right or wrong. Let's say for those three questions a student got 1 out of 3 correct, but for another three questions that tested a different outcome, the student got 2 out of 3 correct. Separately, he understood 33% of the first outcome and 67% of the second outcome. However, when you average these averages, he gets 3/6, which comes to a mark of 50%.
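
The arithmetic in that example can be checked with a short sketch. The first student's numbers come from the example above; the second student is invented purely to show how different profiles collapse into the same mark, and the function names are hypothetical.

```python
from fractions import Fraction


def outcome_scores(results):
    """Fraction of questions answered correctly for each outcome."""
    return {name: Fraction(right, asked) for name, (right, asked) in results.items()}


def overall_score(results):
    """Single mark: total correct answers over total questions."""
    total_right = sum(right for right, _ in results.values())
    total_asked = sum(asked for _, asked in results.values())
    return Fraction(total_right, total_asked)


# (correct, asked) per outcome. The first student is the one described above.
student = {"first outcome": (1, 3), "second outcome": (2, 3)}
# Invented second student: everything right on one outcome, nothing on the other.
other_student = {"first outcome": (3, 3), "second outcome": (0, 3)}

print(outcome_scores(student))       # first outcome 1/3, second outcome 2/3
print(overall_score(student))        # 1/2
print(overall_score(other_student))  # 1/2
```

Both students end up with 50%, even though one has a shaky grasp of both outcomes and the other has mastered one and missed the other entirely.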

What do these numbers mean anymore? Imagine how diluted the average has become when you have 50 to 100 questions that may be measuring the same number of different outcomes. And yet these grades' importance is elevated to grand heights. (Note that the problem of averaging averages is not exclusive to multiple choice exams)

Collaboration = Cheating

Ask any parent for a list of characteristics they wish their children to develop as they grow into adults and there is a very good chance that collaborative skills are somewhere on that list. When you think back to your schooling, how often were you permitted to collaborate with others during examination? If you did try to collaborate, we all know what that was called – cheating! And you got in trouble for it.

There may be some progressive classrooms out there, but unfortunately it would be a very safe bet that most classrooms still have students sitting and writing their exams in isolation. Regardless of your job or profession, how often are you told to figure something out in total and complete isolation – no books, no help, no talking? In the real world, there simply aren't that many times you are expected to solve a problem or perform a task in complete and total isolation – and even if you were, it would be awfully archaic to refuse you the opportunity to reach out for the help you needed to get the task done.

When we say to children, "I want to see what you can do, not what your neighbor can do", this turns out to be code for "I want to see what you can do when artificially deprived of the skills and help of the people and resources around you, rather than seeing how much more you can accomplish in a well-functioning team, which is more authentic and more like real life." (Again, note that the lack of collaboration during exams is not exclusive to multiple choice exams.)

Thinkingcuffs

The very nature of multiple choice tests slaps students with a pair of thinkingcuffs. Who does the majority of the thinking on a multiple choice exam? Who asks all the questions? Who proposes all the answers? Thinking is messy. Learning is messy, but multiple choice tests conveniently remove the mess. All students are required to do is circle or fill in a dot. If we were truly interested in assessing student learning, shouldn't we encourage the students to show us as much of their thinking as possible? Because no one can construct meaning in a preconceived bubble, reducing something as beautiful as learning to a bubble sheet is an exercise in needless oversimplification.

Differentiated Instruction and undifferentiated Assessment

Many teachers today would readily admit that all learners learn differently, and it is the teacher's responsibility to address these different learning styles with differentiated instruction; however, many teachers still use multiple choice tests in an attempt to measure their students' learning. There is a real disconnect between our understanding of differentiated instruction and our attempts to measure learning with our undifferentiated, standardized assessment tools.

While it is true that all children should have the opportunity to get an education, that does not mean that all children should get the same education. When it comes to instruction and assessment, we need to stop trying to meet the needs of all learners by pretending all learners have the same needs.

Value what we Measure or Measure what we Value

It is true that it makes good sense to occasionally stop and reflect upon how well we are learning – the rest of the time we should concern ourselves with actually learning whatever it is we have set out to learn.

A short anecdote may illustrate this point: A man was seen on his hands and knees searching underneath a street light. It was late at night and very dark. When a passerby inquired what the man was doing, the man said that he was looking for his lost keys. The passerby then noted that the man was fortunate that he had lost his keys under the street light. The man quickly replied that he actually lost his keys a distance to the north, but it was too dark over there, and so he wanted to search where it was easy to see.

There is a big difference between measuring what is simply easily measurable and measuring what we actually consider important. Multiple choice tests measure a very limited and narrow kind of learning. If a great amount of importance is placed on these kinds of tests, people will come to see these limited and narrow kinds of learning as most important – sacrificing their pursuit of other valuable kinds of learning that are rarely measured on multiple choice exams.

While a lot of people concern themselves with what will be on the test, I find myself thinking more about what can never be on these kinds of tests. Show me the multiple choice test that can assess things like sense of humor, morality, creativity, ingenuity, motivation, empathy.

***

Too many education systems have confused measurement with assessment and forgotten that the Latin root of assessment is assidere, which translates to "to sit beside". Assessment isn't a spreadsheet -- it's a conversation.

Multiple choice tests were originally tools used by teachers, but today teachers are tools used by multiple choice tests. This shouldn't come as any surprise, especially if you are familiar with the work of Marshall McLuhan, who once said, "We shape our tools and thereafter our tools shape us."

Despite all these reasons for abandoning the use of multiple choice tests, their utility seems to trump their consequences. What's even more discouraging is that many teachers still choose to use multiple choice exams despite having a plethora of more authentic assessment alternatives such as performance assessments, portfolios, written response and personal, two-way communication.

Teachers who continue to use multiple choice exams as their primary or default assessment tool are engaging in a kind of educational malpractice because they are reporting on their students' learning in a way that may range from being marginally inaccurate to wholly untruthful.

I asked Irmeli Halinen, head of curriculum in Finland, how often a teacher in Finland would use a multiple choice test as a way of assessing their students. Her answer said it all:

"Our teachers rarely if ever use multiple choice tests because they would rather have their students do something real."

18 comments:

Anonymous, May 1, 2012 at 8:29 AM
Great post, Joe! Your ideas are insightful, thoughtful, and well presented. You are right...it is definitely past time to re-evaluate our assessment practices. "That's the way we've always done it" cannot be the answer to, "Why do you use Multiple Choice tests?"
We have a Physics teacher at our school who has totally moved away from Multiple Choice assessments and he has found that his diploma results have dramatically improved. His comment..."If these kids can do well on my assessments, the multiple choice diploma will be a breeze..."
Thanks for the great piece!

Derek

rborrelli, May 1, 2012 at 6:35 PM
"Our teachers rarely if ever use multiple choice tests because they would rather have their students do something real."

Best line of the post.

Tweeting that.

Mr. Fitz, May 1, 2012 at 8:45 PM
Great, great post! This says so many things my wife and I (we're both teachers) have been saying for years. I'll be posting this and Tweeting it... And following your posts!

Jennifer Borgioli Binis, May 2, 2012 at 3:46 AM
Joe - Many of the points you raise speak to challenges with multiple choice assessment and highlight the need for a balanced assessment system. I think you've overlooked the thought that goes into designing quality multiple choice items and equated multiple choice items with tests with grades. However, a point that concerns me deeply is that you seem to suggest there is no such thing as a "right answer". Yes, learning and knowledge are beautiful and messy and big and nebulous. Yes, the majority of things we wrestle with as we establish our personal identities and figure out what it means to be a member of society are open-ended concepts. Essential Questions such as "What is beauty?", "Who counts as a 'real' Canadian/American?", "Is war inevitable?" - are examples of important questions to wrestle with and hopefully drive students' inquiry. Consider this: all teachers of Participation in Government in the state of NY are obliged by the standards document that informs their teaching to ensure their students know their rights as defined by the Bill of Rights. Students need to understand concepts such as freedom, rights, and privacy but they also need to understand which Amendment protects their right to privacy. If the moment comes when those rights need to be exercised, a citizen doesn't have the luxury of saying to an officer, "Excuse me for a moment while I Google my rights." The Standards state explicitly that a student must be able to identify three civic roles associated with being a citizen. You may disagree with standards but to be a teacher in NY (and it's similar across many states and provinces), means an obligation to ensure students' education is aligned to those standards. NYS says "know these 3 things." A well-crafted MC item can help a teacher ascertain if a student knows those 3 things. A well-crafted authentic assessment as a part of the same instructional unit can help students apply their understanding of civil obligation.

Assessment is a conversation - and like conversation, it is comprised of many facets. That is, a conversation is not just two people saying words. It's body language, reflection, active listening, and building on each other's ideas. I've used the analogy before, and I'll use it again here. We trust that our doctor's reading of our blood pressure is just one measure for capturing the messy, big, beautiful concept of health. To suggest that teachers can't handle framing the answers from a multiple choice question within a larger context suggests that we are not professionals. Like the practice of medicine, the practice of teaching is both an art and a science; it is a craft informed by the tools the professional chooses to use. To rely on only one tool to the exclusion of others is as unwise as rejecting a tool because it's being mis-used in some instances.

Finally, I cannot show you a multiple choice question that assesses "sense of humor, morality, creativity, ingenuity, motivation, empathy" but that's because those are concepts that cannot be assessed through closed-response questions. I also can't tell you how healthy your heart is by looking at your blood sugar level.

Unknown, May 2, 2012 at 12:54 PM
There isn't any information that a multiple choice test could give me about my students that I couldn't get from having students exhibit their understanding through projects that are in a context and for a purpose. And when given the choice between filling in a bubble or actually doing something, I will always choose doing something real.

Because my time, effort and resources as a teacher are strained more than ever, I don't have time for both so I opt out of multiple choice in favor of doing real projects.

Unknown, May 2, 2012 at 4:57 PM
If multiple choice tests are a tool that makes a teacher's life easy while teaching too many kids too many outcomes that are too specific, then I can agree to that. But this is hardly something to aspire to.

Multiple choice tests are never a great way to assess what children are capable of -- they're just convenient.

Unknown, May 2, 2012 at 5:50 PM
There might be nothing wrong with convenience as long as full disclosure is given so that "convenience" is not confused with "good for kids".

Jennifer Borgioli Binis, May 2, 2012 at 6:32 PM
Interesting.

It sounds like you're saying that my friend, by using Multiple Choice tests, is doing something that is bad for kids. Am I misinterpreting your comment?

Unknown, May 2, 2012 at 6:50 PM
I don't know your friend. If you want me to judge her teaching skills I'd have to observe her teach. There is no substitute. There is no multiple choice test.

Do you want to make this about your friend or multiple choice tests?

Jennifer Borgioli Binis, May 2, 2012 at 7:01 PM
I'm not asking you to judge her teaching skills. Rather, I presented the context in which she uses multiple choice tests. Rather than talking about their abstract use, I wanted to use a specific, authentic example. What she is doing is convenient. She is able to quickly ascertain which students can consistently and correctly connect key science concepts to their definitions and meanings.

How is what she is doing not "good for kids"?

Jon, May 3, 2012 at 12:39 AM
Joe, I stumbled on this by accident and became quite fascinated by how similar our views are on this. I'm not a teacher by profession, but I certainly appreciate all the detailed thoughts you gave on this. Personally I used to loathe multiple choice tests because it always seemed I had a more exact answer than the choices they gave in math. When it was asking questions on a passage I read for reading comprehension, I had issues finding answers that encompassed my understanding of it.

I think at the end of the day, my biggest issue is that forcing a child/student to adhere to a strict set of predetermined rules and regulations diminishes the child's/student's ability to think both logically and abstractly, damages their confidence in trying new things, and keeps them from learning how to effectively and efficiently defend their "status quo" and find the right answer to larger problems through intelligent discourse.

Anyone can say that 2+2=4 absolutely and indefinitely, but if abstractions are applied, 2+2 could equal a range of things. 2 ears + 2 eyes = part of 1 head. Personally I've found people that answer 2+2 with 22 to be pretty smart, lol.

Anyway, thanks for a fantastic blog post.

Thor May, November 6, 2012 at 7:37 PM
Multiple choice TESTING can easily be a fool's game. I have taught postgraduates in East Asia who have graduated without doing anything except such tests, and whatever they had learned was of course a fragmented mess. However I am puzzled as to why educators so rarely stand this whole deal on its head. In my experience multiple choice TEACHING/LEARNING (not testing) can be adapted in a Socratic way with great benefit to develop insightful thinking. This presupposes discussion of the choices made. Personally, I have also found m/c LEARNING to be very useful in laying down the first faint memory tracks when learning a foreign language: maybe because it forces a kind of focus and pause while, say, flashcards can become a numbing blur. - Thor May, Australia