Tuesday, May 1, 2012

The Folly of Multiple Choice

"Anyone can confirm how little the grading that results from examinations corresponds to the final useful work of people in real life."

-Jean Piaget


It's final exam time at my school, and my teacher colleagues are collectively herding to the multiple-choice test scoring machine. For just under $800 CAD, our scoring machine can:
  • Scan up to 35 sheets per minute 
  • Grade up to 100 questions per pass 
  • Score exams with up to 200 questions 
  • Connect to a PC for advanced data collection and analysis
The front of the instruction manual proudly reads “GRADING YOUR TESTS JUST GOT EASIER!” After watching scoring sheet after scoring sheet scurry through the scoring machine, I can personally attest to how easy this really is. It's no secret why multiple choice exams are so popular among teachers – their utility is second to none. But what are the cons of multiple choice tests? Here are a few items to think about before giving your next multiple choice test:


Ambiguity

Misinterpreting a question can result in an "incorrect" response, even if the response is valid. A free response test allows the test taker to make an argument for their viewpoint and potentially receive credit. Depending on the number of possible answers that are provided, a test taker could have a chance of completely guessing the correct answer. It is conceivable for a student to select the wrong answer for the right reasons or to select the right answer for the wrong reasons. The results of such a multiple-choice exam are surrounded with uncertainty and doubt.


No partial credit

Even if a student has some knowledge of a question, they receive no credit for knowing that information if they select the wrong answer. Free response questions may allow a test taker to demonstrate their understanding of the subject and receive partial credit.
Even carefully constructed exams that reflect very detailed curriculums can be used to improperly assess students. If an exam was created to carefully reflect a certain curriculum, you might see only one question that covers a specific outcome. What if a student did in fact understand that outcome but, for any number of reasons, got the question wrong? The test would then report that the student understood nothing of that concept – which most likely would be wholly misleading and untrue. How often can a teacher honestly report that a student understands nothing?

Overemphasis on timeliness

A premium is placed on speed at the cost of creativity and thoroughness. This overemphasis on timeliness also contributes greatly to the ambiguity of the exam. Most test-takers are taught to madly fill in the remaining answers before having their exam taken away by the exam supervisors. There is no way to differentiate between these random guess responses and the responses that were carefully and thoughtfully selected. Recognizing guessing as a problem, some test creators enact a penalty such as deducting a mark for incorrect answers – the hope being that test takers will not guess and instead leave the question blank. This solution may stop the guessing, but it still does not address the ambiguity, as all those unanswered questions will simply show that the test taker got them all wrong – when in truth, the test taker may have had some level of understanding, but because they couldn't get finished in time, or they were too scared to guess, they receive no credit.
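
The arithmetic behind such a penalty can be sketched roughly as follows. This is only an illustration: the four-option question, the one-mark reward, and the helper name `expected_guess_score` are assumptions made for the example, not details of any particular exam.

```python
from fractions import Fraction

def expected_guess_score(num_options, penalty):
    """Expected marks from one blind guess: +1 if right, -penalty if wrong."""
    p_correct = Fraction(1, num_options)
    return p_correct * 1 - (1 - p_correct) * Fraction(penalty)

# Assumed for illustration: four answer choices per question.
no_penalty = expected_guess_score(4, 0)    # guessing costs nothing
full_penalty = expected_guess_score(4, 1)  # one mark deducted per wrong answer
blank = Fraction(0)                        # a blank answer earns nothing

print("guess, no penalty:  ", no_penalty)
print("guess, with penalty:", full_penalty)
print("left blank:         ", blank)
```

Under these assumptions the penalty makes a blind guess a losing bet on average, so leaving the question blank looks safer. Either way, the score sheet records no credit and says nothing about what the student partly understood.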


Subjectivity

How is the length of the exam decided? How many questions are necessary to show enough understanding? In the case of reading comprehension exams, how many reading selections will there be, and what is an appropriate length? How many answers will there be to select from? Which outcomes will be tested? Which will be excluded? Which will be more heavily weighted?

Depending on the date, the question "how many planets are there in our solar system?" has a different answer. What about all those students who were penalized for excluding Pluto as a planet before 2006?

The point here is not to try and figure out the answer to these questions; rather, there is no one answer for these questions. And yet, the choices made by the test taker can have an immeasurable effect on the test's results. One of my favorite quotes on the subjectivity of tests and grades comes from Paul Dressel, who said, "A mark or grade is an inadequate report of an inaccurate judgement by a biased and variable judge of the extent to which a student has attained an indefinite amount of material."


Behaviouristic in nature

These tests only care about whether the student got the right answer. They can't measure whether the student has a true understanding of the content. Even in a subject such as math that can be (mis)labelled as very black and white and right or wrong, it should very much matter how a student comes to answer the question 2+2=4. Did that student simply memorize his cue cards, or does he actually understand the addition process? A multiple choice test does not and cannot concern itself with understanding such valuable information.




Poor Testing can lead to Poor Teaching

Some teachers may use multiple choice exams voluntarily while others may find their use compulsory. Either way, teachers may feel pressure to achieve high scores on these tests, and that kind of pressure can lead to poor teaching, such as lecturing on the part of the teacher and memorization on the part of the student. Take math, for example: many teachers may teach tricks or shortcuts such as "when dividing two fractions, simply flip the second fraction and multiply." A student could mindlessly comply and perform quite well by choosing the correct multiple choice answer. In cases like this, a poor assessment tool has led to a poor teaching technique (one that relies on mindless compliance and memorization rather than true understanding); however, if we use the test scores as an indicator for learning, that teacher and student appear successful. Inferences made from multiple choice tests can be undermined, leaving the successful and superficial students indistinguishable.


Interrater Reliability

Multiple choice exams are created with one right answer in mind for each question. This straightforward scoring system is used so that any two raters will always agree upon how well a student did. This need for agreement, known to statisticians as interrater reliability, is gained at an alarming price: authenticity is sacrificed for (perceived) reliability.

If we were compelled to identify who truly benefits from this kind of artificial measurement, I sincerely doubt anyone could honestly say that this is for the kids. Ultimately, this is an example of the needs of the system trumping the needs of the learner. Alfie Kohn puts it this way:
"You know it's a bad assessment if it's multiple choice. Multiple choice tests can be clever but they can't be authentic. You can't learn what kids know and what they can do with what they know, if they can't generate a response - or at least explain a response. Or as one expert in psychometrics told me many years ago, "Alfie don't you get it, multiple choice tests are designed so lots of students who understand the material will be tricked into picking the wrong response". That's why teachers would never dream of giving a multiple choice test of their own design because the same thing applies there."

Testing Test-taking skills

Multiple choice exams require a certain amount of test taking skills, and some students have better test taking skills than others. Many teachers will actually teach students strategies for writing multiple choice exams. For example, some test takers understand that an answer that has the words "always" or "never" is usually NOT the correct answer, because rarely is something ever "always" or "never". This is considered a fairly good strategy, and students who are aware of it may have a better chance of doing well.

However, there are some test takers who have come to believe in poor strategies. For example, some students believe the pattern of responses matters and so they say to themselves, “This can't be another "b" answer as we have just had three in a row.” Or they believe in myths such as “when in doubt, pick C”. Granted, we can all probably agree this is a silly strategy, but what if students actually use it? The format of the exam has skewed the measurement of that student's learning.


Averaging Averages

Traditional practice encourages test raters to not only mark each question right or wrong, but to also tally up the number of correct responses and compare that to the total number of questions – of course, we know this to be the average or mean. However, what does this number actually tell us?

Let's pretend there are three questions on the test for every outcome we taught. You could then look at the data and see how many of those three questions a specific student got right or wrong. Let's say for those three questions a student got 1 out of 3 correct, but for another three questions that tested a different outcome, the student got 2 out of 3 correct. Separately, he understood about 33% of the first outcome and about 67% of the second outcome. However, when you average these averages, he gets 3/6, which comes to a mark of 50%.
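
The same arithmetic can be written out as a small sketch. The outcome labels, the `summarize` helper, and the second student are hypothetical, added only to show how much detail the single combined mark hides.

```python
from fractions import Fraction

def summarize(results):
    """Return (per-outcome scores, overall score) for {outcome: (correct, asked)}."""
    per_outcome = {name: Fraction(c, n) for name, (c, n) in results.items()}
    overall = Fraction(sum(c for c, _ in results.values()),
                       sum(n for _, n in results.values()))
    return per_outcome, overall

# The student from the example: 1 of 3 on one outcome, 2 of 3 on another.
student = {"outcome A": (1, 3), "outcome B": (2, 3)}
# A hypothetical second student with a very different profile.
other = {"outcome A": (3, 3), "outcome B": (0, 3)}

for label, results in (("student", student), ("other", other)):
    per_outcome, overall = summarize(results)
    detail = ", ".join(f"{name}: {float(s):.0%}" for name, s in per_outcome.items())
    print(f"{label}: {detail} | overall: {float(overall):.0%}")
```

Both students end up with the same overall mark even though their per-outcome results look nothing alike, which is the information the final grade throws away.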

What do these numbers mean anymore? Imagine how diluted the average has become when you have 50 to 100 questions that may be measuring the same number of different outcomes. And yet these grades' importance is elevated to grand heights. (Note that the problem of averaging averages is not exclusive to multiple choice exams.)





Collaboration = Cheating

Ask any parent for a list of characteristics they wish their children to develop as they grow into adults and there is a very good chance that collaborative skills are somewhere on that list. When you think back to your schooling, how often were you permitted to collaborate with others during examination? If you did try to collaborate, we all know what that was called – cheating! And you got in trouble for it.

There may be some progressive classrooms out there, but unfortunately it would be a very safe bet that most classrooms still have students sitting and writing their exams in isolation. Regardless of your job or profession, how often are you told to figure something out in total and complete isolation – no books, no help, no talking? In the real world, there simply aren't that many times you are expected to solve a problem or perform a task in complete and total isolation – and even if you were, it would be awfully archaic to refuse you the opportunity to reach out for the help you needed to get the task done.

When we say to children, "I want to see what you can do, not what your neighbor can do", this turns out to be code for "I want to see what you can do when artificially deprived of the skills and help of the people and resources around you, rather than seeing how much more you can accomplish in a well-functioning team that's more authentic, like real life." (Again, note that the lack of collaboration during exams is not exclusive to multiple choice exams.)


Thinkingcuffs

The very nature of multiple choice tests slaps students with a pair of thinkingcuffs. Who does the majority of the thinking on a multiple choice exam? Who asks all the questions? Who proposes all the answers? Thinking is messy. Learning is messy, but multiple choice tests conveniently remove the mess. All students are required to do is circle or fill in a dot. If we were truly interested in assessing student learning, shouldn't we encourage the students to show us as much of their thinking as possible? Because no one can construct meaning in a preconceived bubble, reducing something as beautiful as learning to a bubble sheet is an exercise in needless oversimplification.



Differentiated Instruction and undifferentiated Assessment

Many teachers today would readily admit that all learners learn differently, and it is the teacher's responsibility to address these different learning styles with differentiated instruction; however, many teachers still use multiple choice tests in an attempt to measure their students' learning. There is a real disconnect between our understanding of differentiated instruction and our attempts to measure learning with our undifferentiated, standardized assessment tools.

While it is true that all children should have the opportunity to get an education, that does not mean that all children should get the same education. When it comes to instruction and assessment, we need to stop trying to meet the needs of all learners by pretending all learners have the same needs.


Value what we Measure or Measure what we Value

It is true that it makes good sense to occasionally stop and reflect upon how well we are learning – the rest of the time we should concern ourselves with actually learning whatever it is we have set out to learn.
A short anecdote may illuminate this point: A man was seen on his hands and knees searching underneath a street light. It was late at night and very dark. When a passerby inquired what the man was doing, the man said that he was looking for his lost keys. The passerby then noted that the man was fortunate that he had lost his keys under the street light. The man quickly replied that he had actually lost his keys a distance to the north, but it was too dark over there, and so he wanted to search where it was easy to see.
There is a big difference between measuring what is simply easily measurable and measuring what we actually consider important. Multiple choice tests measure a very limited and narrow kind of learning. If a great amount of importance is placed on these kinds of tests, people will come to see these limited and narrow kinds of learning as most important – sacrificing their pursuit of other valuable kinds of learning that are rarely measured on multiple choice exams.


While a lot of people concern themselves with what will be on the test, I find myself thinking more about what can never be on these kinds of tests. Show me the multiple choice test that can assess things like sense of humor, morality, creativity, ingenuity, motivation, empathy.




***


Too many education systems have confused measurement with assessment and forgotten that the Latin root for assessment is assidere, which translates into "to sit beside". Assessment isn't a spreadsheet -- it's a conversation.

Multiple choice tests were originally tools used by teachers, but today teachers are tools used by multiple choice tests. This shouldn't come as any surprise, especially if you are familiar with the work of Marshall McLuhan, who once said, "We shape our tools and thereafter our tools shape us."

Despite all these reasons for abandoning the use of multiple choice tests, their utility seems to trump their consequences. What's even more discouraging is that many teachers still choose to use multiple choice exams despite having a plethora of more authentic assessment alternatives such as performance assessments, portfolios, written response and personal, two-way communication.

Teachers who continue to use multiple choice exams as their primary or default assessment tool are engaging in a kind of educational malpractice because they are reporting on their students' learning in a way that may range from being marginally inaccurate to wholly untruthful.

I asked Irmeli Halinen, head of curriculum in Finland, how often a teacher in Finland would use a multiple choice test as a way of assessing their students. Her answer said it all:
"Our teachers rarely if ever use multiple choice tests because they would rather have their students do something real."



18 comments:

  1. Great post, Joe! Your ideas are insightful, thoughtful, and well presented. You are right...it is definitely past time to re-evaluate our assessment practices. "That's the way we've always done it" cannot be the answer to, "Why do you use Multiple Choice tests?"
    We have a Physics teacher at our school who has totally moved away from Multiple Choice assessments and he has found that his diploma results have dramatically improved. His comment..."If these kids can do well on my assessments, the multiple choice diploma will be a breeze..."
    Thanks for the great piece!

    Derek

  2. "Our teachers rarely if ever use multiple choice tests because they would rather have their students do something real."

    Best line of the post.

    Tweeting that.

  3. Great, great post! This says so many things my wife and I (we're both teachers) have been saying for years. I'll be posting this and Tweeting it... And following your posts!

  4. Joe - Many of the points you raise speak to challenges with multiple choice assessment and highlight the need for a balanced assessment system. I think you've overlooked the thought that goes into designing quality multiple choice items and equated multiple choice items with tests with grades. However, a point that concerns me deeply is that you seem to suggest there is no such thing as a "right answer". Yes, learning and knowledge are beautiful and messy and big and nebulous. Yes, the majority of things we wrestle with as we establish our personal identities and figure out what it means to be a member of society are open-ended concepts. Essential Questions such as "What is beauty?", "Who counts as a 'real' Canadian/American?", "Is war inevitable?" - are examples of important questions to wrestle with and hopefully drive students' inquiry. Consider this: all teachers of Participation in Government in the state of NY are obliged by the standards document that informs their teaching to ensure their students know their rights as defined by the Bill of Rights. Students need to understand concepts such as freedom, rights, and privacy but they also need to understand which Amendment protects their right to privacy. If the moment comes when those rights need to be exercised, a citizen doesn't have the luxury of saying to an officer, "Excuse me for a moment while I Google my rights." The Standards state explicitly that a student must be able to identify three civic roles associated with being a citizen. You may disagree with standards but to be a teacher in NY (and it's similar across many states and provinces) means an obligation to ensure students' education is aligned to those standards. NYS says "know these 3 things." A well-crafted MC item can help a teacher ascertain if a student knows those 3 things. A well-crafted authentic assessment as a part of the same instructional unit can help students apply their understanding of civil obligation.

    Assessment is a conversation - and like conversation, it is comprised of many facets. That is, a conversation is not just two people saying words. It's body language, reflection, active listening, and building on each other's ideas. I've used the analogy before, and I'll use it again here. We trust that our doctor's reading of our blood pressure is just one measure toward capturing the messy, big, beautiful concept of health. To suggest that teachers can't handle framing the answers from a multiple choice question within a larger context suggests that we are not professionals. Like the practice of medicine, the practice of teaching is both an art and a science; it is a craft informed by the tools the professional chooses to use. To rely on only one tool to the exclusion of others is as unwise as rejecting a tool because it's being misused in some instances.

    Finally, I cannot show you a multiple choice question that assesses "sense of humor, morality, creativity, ingenuity, motivation, empathy" but that's because those are concepts that cannot be assessed through closed-response questions. I also can't tell you how healthy your heart is by looking at your blood sugar level.

    Replies
    1. Jennifer,

      A thoughtful response to this post, and so perhaps a thought of my own regarding your comment:

      "NYS says "know these 3 things." A well-crafted MC item can help a teacher ascertain if a student knows those 3 things. "

      It would be good to point out that "knowing" exists on a continuum from shallow to deep. A well-crafted MC item assesses shallow forms of knowing and very little in regards to deep knowing. Rather than this being about "artistic or scientific" forms of assessment, I think it is about "shallow versus deep" forms of assessment.

      I agree that quantifiable data is very important in assessment, but MC items are not comprehensive in the data they provide. A well-crafted MC item shows that a student can memorize and regurgitate 3 civic roles, but reveals no other data in regards to the way the student has integrated knowledge into a larger world concept, or connected it to other learnt material etc.

      To me, requiring students to "Identify 3 civic roles" to ascertain if meaningful learning has occurred, is akin to asking a person eating in a restaurant to identify 3 pieces of food from their meal, as a means to ascertain if a meaningful dining experience occurred.

      It is a crude metaphor for sure, but I think it illustrates that there is not a causal relationship between the identification of 3 civic roles to deep knowing.

      Just some "food for thought" (pun intended). Your comment in the context of this post really got me thinking, and therefore much appreciated :)

    2. Brazen - You've raised a compelling point; to borrow your phrase, does learning only count if it's deep? That is, why is learning not worthy of being assessed and documented if it's shallow? One of the first things we teach children to memorize is their name. Knowing the particular combination of letters that counts as our name is neither deep nor meaningful. But it is significant. The point being that shallow knowledge has as much a place in our experiences as does deep.

      I don't think your dining analogy is crude; rather, I'd turn it a few degrees to the left. That is, I see it like asking a diner to identify three sauces they see on the menu. Deep? Nope. Meaningful? Nope. Knowledge? Yup. I think what's important to note is that I would never advocate for MC to be used to assess the meaningfulness of their meal. It's not a good fit between learning target and assessment task. However, if my learning target says "the diner will be able to identify 3 sauces on the menu", I'm not going to subject him or her to writing a 5-paragraph review. (And if the issue is with the learning target, that's a standards discussion, not assessment.) Not all learning is applicable, not all learning has an immediate purpose. Sometimes we learn things just to learn them and sometimes, MC is the best means to assess the mastery of that learning.

      Thanks as well for the food for thought and I look forward to the next course.

    3. niiiiice tweak on my metaphor :)

      I would bring in what Joe said below:

      "There isn't any information that a multiple choice test could give me about my students that I couldn't get from having students exhibit their understanding through projects that are in a context and for a purpose."

      To clarify my comment further: many forms of knowing (such as "shallow" and "deep"... rudimentary terms I invented for this discussion, and not academically derived in the least) can ALL be assessed using real activities as Joe mentions, whereas MC only assesses very surface levels of knowing. For this reason I personally would choose against using any MC. That being said... I am an art teacher, and no one gives a sh** how I grade anyway ;) (although I wish they would)

  5. There isn't any information that a multiple choice test could give me about my students that I couldn't get from having students exhibit their understanding through projects that are in a context and for a purpose. And when given the choice between filling in a bubble or actually doing something, I will always choose doing something real.

    Because my time, effort and resources as a teacher are strained more than ever, I don't have time for both so I opt out of multiple choice in favor of doing real projects.

    Replies
    1. Joe - I am thrilled for you. I am thrilled that you have arrived at a place where 100% of your learning targets can be assessed through authentic assessments. I really am glad for you and other educators who find themselves in such a position. Perhaps there's even a bit of envy. However, it seems unkind to announce that fellow professionals should abandon an established tool, seemingly based on the argument that you don't need it, so why should they?

      Please consider the following: A chemistry teacher (a well-respected, well-liked woman) I know sees every Junior, and Senior who failed the course the year before, for a Chemistry course. That's almost 90 students. According to her content standards, all of her students need to be able to identify the difference between "density" and "particle size". All 90 students take a chem lab. All 90 students engage in authentic assessment where they explore the essential question: "Are laws always considered truths?" All 90 of her students take a test in the midst of that unit with 15 multiple choice questions that require them to correctly identify the meaning or concept behind different, yet seemingly similar, chemistry terms and concepts. While she could have gone through 90 science labs to ensure all of her students used the terminology correctly, she elects to focus the time she spends reading labs on giving quality feedback on their processes. Her standards say her students need to do x; she assesses and documents their ability to do x.

      Perhaps she has too many students. Perhaps her standards are too specific. But, she has the students she has and her standards are what her standards are. Well-designed multiple choice questions can provide information about student learning. And while I recognize we are both entrenched in our views, a cursory view of the assessment texts I have within arm's reach (Nitko, Brookhart, Jaeger, VanBlerkom, Stiggins, etc.) shows that they all urge caution in using multiple choice and in recognizing when it is appropriate and inappropriate. I was unable to locate any research or literature against them - besides Mr. Kohn's warnings that you cited. Fairtest.org openly addresses what they can and cannot do: http://www.fairtest.org/facts/mctfcat.html

      Again, Joe - I'm thrilled for you. I am also thrilled for my friend the chemistry teacher. She has found a balance between authentic assessment, recall tasks, and scientific inquiry. When she asks me to give her feedback on her tests so they can be the best tests possible, I do so openly and without hesitation using the guidelines established by the 1999 Testing Standards.

      Finally, I'm intrigued by this concept of "real" - "real learning", "doing something real". What makes something "real"? Is it, as Brazen suggests, the difference between shallow and deep?

    2. Quick clarification - and by her standards say x, she asks them to x - I meant that in order to be scientifically literate, a learner must use relevant terms correctly. So, in truth, students would be asked to use and learn precise language, even if it wasn't in the standards.

  6. If multiple choice tests are a tool that makes a teacher's life easy while teaching too many kids too many outcomes that are too specific then I can agree to that. But this is hardly something to aspire to.

    Multiple choice tests are never a great way to assess what children are capable of -- they're just convenient.

    Replies
    1. Sorry, Joe, but I'm still struggling to see an argument for why MC items are not a good tool for assessing students' understanding of lower level Bloom's. (Full stop. Not abuse. Not test-taking strategies. Why is what my friend is doing such a terrible thing?)

      And why is convenience a bad thing?

  7. There might be nothing wrong with convenience as long as full disclosure is given so that "convenience" is not confused with "good for kids".

  8. Interesting.

    It sounds like you're saying that my friend, by using Multiple Choice tests, is doing something that is bad for kids. Am I misinterpreting your comment?

  9. I don't know your friend. If you want me to judge her teaching skills I'd have to observe her teach. There is no substitute. There is no multiple choice test.

    Do you want to make this about your friend or multiple choice tests?

  10. I'm not asking you to judge her teaching skills. Rather, I presented the context in which she uses multiple choice tests. Rather than talking about their abstract use, I wanted to use a specific, authentic example. What she is doing is convenient. She is able to quickly ascertain which students are consistently and correctly able to connect key science concepts to their definitions and meanings.

    How is what she is doing not "good for kids"?

  11. Joe, I stumbled on this by accident and became quite fascinated by how similar our views are on this. I'm not a teacher by profession, but I certainly appreciate all the detailed thoughts you gave on this. Personally I used to loathe multiple choice tests because it always seemed I had a more exact answer than the choices they gave in math. When it was asking questions on a passage I read for reading comprehension, I had issues finding answers that encompassed my understanding of it.

    I think at the end of the day, my biggest issue is that forcing a child/student to adhere to a strict set of predetermined rules and regulations diminishes the child's/student's ability to think both logically and abstractly, damages their confidence in trying new things, and keeps them from learning how to effectively and efficiently defend their "status quo" and find the right answer to larger problems through intelligent discourse.

    Anyone can say that 2+2=4 absolutely and indefinitely, but if abstractions are applied 2+2 could equal a range of things. 2 ears + 2 eyes = part of 1 head. Personally I've found people that answer 2+2 with 22 to be pretty smart, lol.

    Anyway, thanks for a fantastic blog post.

  12. Multiple choice TESTING can easily be a fool's game. I have taught postgraduates in East Asia who have graduated without doing anything except such tests, and whatever they had learned was of course a fragmented mess. However I am puzzled as to why educators so rarely stand this whole deal on its head. In my experience multiple choice TEACHING/LEARNING (not testing) can be adapted in a Socratic way with great benefit to develop insightful thinking. This presupposes discussion of the choices made. Personally, I have also found m/c LEARNING to be very useful in laying down the first faint memory tracks when learning a foreign language: maybe because it forces a kind of focus and pause while, say, flashcards can become a numbing blur. - Thor May, Australia
