"Anyone can confirm how little the grading that results from examinations corresponds to the final useful work of people in real life."
-Jean Piaget
It's final exam time at my school, and my teacher colleagues are collectively herding toward the multiple-choice test scoring machine. For just under $800 CAD, our scoring machine can:
- Scan up to 35 sheets per minute
- Grade up to 100 questions per pass
- Score exams with up to 200 questions
- Connect to a PC for advanced data collection and analysis
The front of the instruction manual proudly reads “GRADING YOUR TESTS JUST GOT EASIER!” After watching scoring sheet after scoring sheet scurry through the machine, I can personally attest to how easy it really is. It's no secret why multiple choice exams are so popular among teachers – their utility is second to none. But what are the cons? Here are a few items to think about before giving your next multiple choice test:
Ambiguity
Misinterpreting a question can result in an "incorrect" response, even if the response is valid. A free response test allows the test taker to make an argument for their viewpoint and potentially receive credit. Depending on the number of possible answers provided, a test taker has a real chance of guessing the correct answer outright. It is conceivable for a student to select the wrong answer for the right reasons, or the right answer for the wrong reasons. The results of such a multiple choice exam are shrouded in uncertainty and doubt.
No partial credit
Even if a student has some knowledge of a question, they receive no credit for that knowledge if they select the wrong answer. Free response questions allow a test taker to demonstrate partial understanding of the subject and receive partial credit.
Even carefully constructed exams that reflect very detailed curricula can improperly assess students. If an exam was created to carefully reflect a certain curriculum, you might see only one question that covers a specific outcome. What if a student did in fact understand that outcome but, for any number of reasons, got that one question wrong? The test would then report that the student understood nothing of that concept – which would most likely be wholly misleading and untrue. How often can a teacher honestly report that a student understands nothing?
Overemphasis on timeliness
A premium is placed on speed at the cost of creativity and thoroughness. This overemphasis on timeliness also adds to the ambiguity of the exam. Most test takers are taught to madly fill in the remaining answers before their exam is taken away by the supervisors, and there is no way to differentiate between these random guesses and the responses that were carefully and thoughtfully selected. Recognizing guessing as a problem, some test creators enact a penalty – deducting a mark for each incorrect answer – in the hope that test takers will leave questions blank rather than guess. This may stop the guessing, but it does not address the ambiguity: all those unanswered questions simply show that the test taker got them wrong, when in truth the test taker may have had some level of understanding. Because they couldn't finish in time, or were too scared to guess, they receive no credit.
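The arithmetic behind such penalties is simple expected-value reasoning, and it is worth seeing why the penalty is usually set the way it is. A minimal sketch (the function name and numbers here are illustrative, not drawn from any particular scoring manual):

```python
def expected_guess_score(options: int, penalty: float) -> float:
    """Expected marks per question for a purely random guess.

    options -- number of answer choices per question
    penalty -- marks deducted for each wrong answer
    """
    p_correct = 1 / options
    return p_correct * 1 - (1 - p_correct) * penalty

# With 4 options and no penalty, blind guessing still earns marks on average.
print(expected_guess_score(4, 0.0))   # 0.25 marks per question

# A penalty of 1/3 mark per wrong answer (i.e. 1/(options - 1)) makes the
# expected value of guessing roughly zero -- which is exactly the point.
print(expected_guess_score(4, 1 / 3))
```

Note that a zero expected value only discourages guessing on average; it does nothing to recover the partial understanding hidden behind a blank or a wrong bubble.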
Subjectivity
How is the length of the exam decided? How many questions are necessary to show enough understanding? In the case of reading comprehension exams, how many reading selections will there be, and what is an appropriate length? How many answers will there be to select from? Which outcomes will be tested? Which will be excluded? Which will be more heavily weighted?

Depending on the date, the question "how many planets are there in our solar system?" has a different answer. What about all those students who were penalized for excluding Pluto as a planet before 2006?
The point here is not to figure out the answers to these questions; rather, it is that there is no one answer to them. And yet the choices made by the test maker can have an immeasurable effect on the test's results. One of my favorite quotes on the subjectivity of tests and grades comes from Paul Dressel, who said, "A mark or grade is an inadequate report of an inaccurate judgement by a biased and variable judge of the extent to which a student has attained an indefinite amount of material."
Behaviouristic in nature
These tests care only about whether the student got the right answer. They cannot measure whether the student has a true understanding of the content. Even in a subject such as math, which can be (mis)labelled as black and white, right or wrong, it should very much matter how a student comes to answer 2 + 2 = 4. Did that student simply memorize his cue cards, or does he actually understand the addition process? A multiple choice test does not and cannot concern itself with such valuable information.
Poor Testing can lead to Poor Teaching
Some teachers use multiple choice exams voluntarily, while others find their use compulsory. Either way, teachers may feel pressure to achieve high scores on these tests, and that kind of pressure can lead to poor teaching: lecturing on the part of the teacher and memorization on the part of the student. Take math, for example. Many teachers teach tricks or shortcuts such as: when dividing two fractions, simply flip the second fraction and multiply. A student could mindlessly comply and perform quite well by choosing the correct multiple choice answer. In cases like this, a poor assessment tool has led to a poor teaching technique (one that relies on mindless compliance and memorization rather than true understanding); yet if we use the test scores as an indicator of learning, both teacher and student appear successful. The inferences made from multiple choice tests are undermined, leaving the successful and the superficial students indistinguishable.
Interrater Reliability
Multiple choice exams are created with one right answer in mind for each question. This straightforward scoring system is used so that any two raters will always agree on how well a student did. This need for agreement, known to statisticians as interrater reliability, is gained at an alarming price: authenticity is sacrificed for (perceived) reliability.
If we were compelled to identify who truly benefits from this kind of artificial measurement, I sincerely doubt anyone could honestly say that this is for the kids. Ultimately, this is an example of the needs of the system trumping the needs of the learner. Alfie Kohn puts it this way:
"You know it's a bad assessment if it's multiple choice. Multiple choice tests can be clever, but they can't be authentic. You can't learn what kids know, and what they can do with what they know, if they can't generate a response – or at least explain a response. Or as one expert in psychometrics told me many years ago, 'Alfie, don't you get it? Multiple choice tests are designed so lots of students who understand the material will be tricked into picking the wrong response.' That's why teachers would never dream of giving a multiple choice test of their own design, because the same thing applies there."
Testing Test-taking skills
Multiple choice exams require a certain amount of test-taking skill, and some students have better test-taking skills than others. Many teachers will actually teach students strategies for writing multiple choice exams. For example, some test takers understand that an answer containing the words "always" or "never" is usually NOT the correct answer, because rarely is something ever "always" or "never". This is considered a fairly good strategy, and students who are aware of it may have a better chance of doing well.

However, some test takers have come to believe in poor strategies. For example, some students believe the pattern of responses matters, and so they say to themselves, "This can't be another 'b' answer; we have just had three in a row." Or they believe in myths such as "when in doubt, pick C". Granted, we can all probably agree these are silly strategies, but what if students actually use them? The format of the exam has skewed the measurement of that student's learning.
Averaging Averages
Traditional practice encourages test raters not only to mark each question right or wrong, but also to tally up the number of correct responses and compare that to the total number of questions – of course, we know this as the average or mean. However, what does this number actually tell us?
Let's pretend there are three questions on the test for every outcome we taught. You could then look at the data and see how many of those three questions a specific student got right or wrong. Say a student got 1 out of 3 correct on the questions for one outcome, but 2 of 3 correct on the questions testing a different outcome. Separately, he demonstrated 33% of the first outcome and 67% of the second. However, when you average these averages, he gets 3/6, which comes to a mark of 50%.
What do these numbers mean anymore? Imagine how diluted the average has become when you have 50 to 100 questions that may be measuring the same number of different outcomes. And yet these grades' importance is elevated to grand heights. (Note that the problem of averaging averages is not exclusive to multiple choice exams)
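The dilution is easy to demonstrate. A short sketch, with hypothetical outcome names standing in for whatever the three-question clusters actually measured:

```python
# Per-outcome results: (questions correct, questions asked) -- hypothetical data.
results = {
    "fractions": (1, 3),   # 1 of 3 correct -> weak grasp of this outcome
    "decimals": (2, 3),    # 2 of 3 correct -> stronger grasp of this one
}

# Reported per outcome, the picture is informative.
for outcome, (correct, total) in results.items():
    print(f"{outcome}: {correct}/{total} = {correct / total:.0%}")

# Collapsed into a single overall mark, the detail disappears.
correct_sum = sum(c for c, _ in results.values())
total_sum = sum(t for _, t in results.values())
print(f"overall: {correct_sum}/{total_sum} = {correct_sum / total_sum:.0%}")
# overall: 3/6 = 50%
```

The single 50% says nothing about which outcome was weak and which was strong; with 50 to 100 questions spread across many outcomes, the overall mark becomes even less interpretable.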
Collaboration = Cheating
Ask any parent for a list of characteristics they wish their children to develop as they grow into adults and there is a very good chance that collaborative skills are somewhere on that list. When you think back to your schooling, how often were you permitted to collaborate with others during examination? If you did try to collaborate, we all know what that was called – cheating! And you got in trouble for it.
Granted, there may be some progressive classrooms out there, but it would be a very safe bet that most classrooms still have students sitting and writing their exams in isolation. Regardless of your job or profession, how often are you told to figure something out in total and complete isolation – no books, no help, no talking? In the real world, there simply aren't many times you are expected to solve a problem or perform a task in complete isolation – and even if you were, it would be awfully archaic to be refused the opportunity to reach out for the help you needed to get the task done.
When we say to children, "I want to see what you can do, not what your neighbor can do", this turns out to be code for "I want to see what you can do artificially deprived of the skills and help of the people and resources around you. Rather than seeing how much more you can accomplish in a well functioning team that's more authentic like real life." (Again note that the lack of collaboration during exams is not exclusive to multiple choice exams)
Thinkingcuffs
The very nature of multiple choice tests slaps students with a pair of thinkingcuffs. Who does the majority of the thinking on a multiple choice exam? Who asks all the questions? Who proposes all the answers? Thinking is messy. Learning is messy, but multiple choice tests conveniently remove the mess. All students are required to do is circle or fill in a dot. If we were truly interested in assessing student learning, shouldn't we encourage the students to show us as much of their thinking as possible? Because no one can construct meaning in a preconceived bubble, reducing something as beautiful as learning to a bubble sheet is an exercise in needless oversimplification.
Differentiated Instruction and Undifferentiated Assessment
Many teachers today would readily admit that all learners learn differently, and that it is the teacher's responsibility to address these different learning styles with differentiated instruction; yet many of those same teachers still use multiple choice tests in an attempt to measure their students' learning. There is a real disconnect between our understanding of differentiated instruction and our attempts to measure learning with undifferentiated, standardized assessment tools.
While it is true that all children should have the opportunity to get an education, that does not mean all children should get the same education. When it comes to instruction and assessment, we need to stop trying to meet the needs of all learners by pretending all learners have the same needs.
Value what we Measure or Measure what we Value
It makes good sense to occasionally stop and reflect upon how well we are learning – the rest of the time, we should concern ourselves with actually learning whatever it is we have set out to learn.
A short anecdote may enlighten this point: A man was seen on his hands and knees searching underneath a street light. It was late at night and very dark. When a passerby inquired what the man was doing, the man said that he was looking for his lost keys. The passerby then noted that the man was fortunate that he had lost his keys under the street light. The man quickly replied that he actually lost his keys a distance to the north, but it was too dark over there, and so he wanted to search where it was easy to see.
There is a big difference between measuring what is simply easy to measure and measuring what we actually consider important. Multiple choice tests measure a very limited and narrow kind of learning. If great importance is placed on these kinds of tests, people will come to see these limited and narrow kinds of learning as the most important – sacrificing their pursuit of other valuable kinds of learning that are rarely measured on multiple choice exams.
While a lot of people concern themselves with what will be on the test, I find myself thinking more about what can never be on these kinds of tests. Show me the multiple choice test that can assess things like sense of humor, morality, creativity, ingenuity, motivation, or empathy.
***
Too many education systems have confused measurement with assessment and forgotten that the Latin root of assessment is assidere, which translates to "to sit beside". Assessment isn't a spreadsheet – it's a conversation.
Multiple choice tests were originally tools used by teachers, but today teachers are tools used by multiple choice tests. This shouldn't come as any surprise to anyone familiar with the work of Marshall McLuhan, who once said, "We shape our tools and thereafter our tools shape us."
Despite all these reasons for abandoning multiple choice tests, their utility seems to trump their consequences. What's even more discouraging is that many teachers still choose to use multiple choice exams despite having a plethora of more authentic assessment alternatives, such as performance assessments, portfolios, written responses, and personal, two-way communication.
Teachers who continue to use multiple choice exams as their primary or default assessment tool are engaging in a kind of educational malpractice, because they are reporting on their students' learning in a way that may range from marginally inaccurate to wholly untruthful.
I asked Irmeli Halinen, head of curriculum in Finland, how often a teacher in Finland would use a multiple choice test as a way of assessing their students. Her answer said it all:
"Our teachers rarely if ever use multiple choice tests because they would rather have their students do something real."