Friday, November 30, 2012

Ravitch: Should Computers Grade Essays?

This was written by Diane Ravitch, an advocate for progressive education reform who works tirelessly to call out education deformers and the corporate-style reforms that undermine teachers and public education. Diane blogs here and tweets here. This post was originally posted here.

by Diane Ravitch

Todd Farley is the scourge of standardized testing. His book, “Making the Grades,” is a shocking exposé of the industry. Todd spent nearly 15 years scoring tests, and he knows the tricks of the trade.

In this article, he skewers the latest testing craze: machine-scoring of essays.

Having demonstrated the fallibility of humans who score essays, Farley is no more impressed by computer scoring. As he puts it:

“…the study’s major finding states only that “the results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items.” A paragraph on p. 21 reiterates the same thing: “By and large, the scoring engines did a good [job] of replicating the mean scores for all of the data sets.” In other words, all this hoopla about a study Tom Vander Ark calls “groundbreaking” is based on a final conclusion saying only that automated essay scoring engines are able to spew out a number that “by and large” might be “similar” to what a bored, over-worked, under-paid, possibly-underqualified, temporarily-employed human scorer skimming through an essay every two minutes might also spew out. I ask you, has there ever been a lower bar?”

Farley quotes the promoters of automated scoring, who say that the machines are faster, cheaper and more consistent than humans. Also, they make money.

He concludes: “Maybe a technology that purports to be able to assess a piece of writing without having so much as the teensiest inkling as to what has been said is good enough for your country, your city, your school, or your child. I’ll tell you what though: Ain’t good enough for mine.”

One of the responses to Farley’s post came from Tom Vander Ark, a tech entrepreneur and one of its targets.

Vander Ark wrote: “The purpose of the study was to demonstrate that online essay scoring was as accurate as expert human graders and that proved to be the case across a diverse set of performance tasks. The reason that was important is that without online scoring, states would rely solely on inexpensive multiple choice tests. It is silly to suggest that scoring engines need to ‘understand,’ they just need to score at least as well as a trained expert grader and our study did just that.”

A reader of this blog saw this exchange on Huffington Post and sent me this comment:

“Diane–we use an automated essay scorer at my school, and I have seen coherent, well-thought out writing receive scores below proficient, while incoherent, illogical writing (with more and longer words, and a few other tricks that automated scorers like) receive high scores. The students who suffer the most are the highest level students, the verbally gifted writers who write with the goal of actually being understood, “silly” as that may be.”

“In fact, all standardized testing penalizes the brightest students–those who think outside the box. Standardized testing is the box.”


  1. I think I prefer a human set of eyes on student writing assignments.

  3. If we see technology, in any form, as a panacea for all of our problems, the risk of using it mindlessly increases. Technology, from its Greek roots, suggests the artful and skillful use of a tool. The conversation should be about whether this is the right tool for this task, this context, and these people. This is like the 0 mark debate. It misses the point unless we have a different conversation focused on the question, "Are we educating children and adults for a world that has not even been invented or conceived of yet?" I doubt it.