Wednesday, May 28, 2014

Essay-marking software gives high marks for gibberish, U.S. expert warns Alberta

This was written by Andrea Sands, a journalist with the Edmonton Journal, where this post originally appeared.

by Andrea Sands

Alberta Education should not allow high-stakes Grade 12 diploma exams to be marked by computer because today’s essay-marking software is terribly flawed, says a retired Massachusetts Institute of Technology (MIT) professor and writing-assessment expert.

Les Perelman has been an outspoken critic of automated essay marking and worked with MIT students to invent computer software called BABEL (Basic Automatic B.S. Essay Language Generator), designed to trick today’s essay-scoring programs. BABEL generates gibberish essays which, nonetheless, get high marks from so-called “robo graders.”

“I realized very early on that machine grading of writing really is, essentially, impossible,” Perelman said from Massachusetts.

“I can guarantee you, smart 18-year-olds will try to trick it. If they find out it favours certain things like word length, obscure words and pretentious language, then that is exactly what is going to be probably taught in schools.”

Last fall, Alberta Education paid U.S. company LightSide $5,000 to do a feasibility assessment on whether computers could accurately mark Grade 12 diploma-exam essay questions. Every year, about 190,000 Alberta students write the exams, worth 50 per cent of a student’s final mark.

The LightSide report concluded its software could reliably grade student essays and, in the test analysis, did so more accurately than teachers.

Perelman said the report is “fudging” because of the way data was analyzed and because there’s not enough information and explanation in the report. “I’m a pro at this and there were tables I could not understand.”

Computers can’t analyze meaning, and essay-grading software usually inflates grades for essays that are longer and use uncommon words, such as “egregious” instead of “bad” or “plethora” instead of “many,” Perelman said.

“They’re counting word length, or they’re counting types of words, or they’re counting sentence length, or they’re counting connective words like ‘consequently,’ ‘moreover,’ things like that, that show cohesion. But all those things are very mechanical.”
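To make Perelman's point concrete, here is a minimal sketch of the kind of mechanical counting he describes. It is an illustration only, not the method used by LightSide or any other vendor: the function name `surface_features`, the connective list and the two sample sentences are all assumptions made for this example.

```python
import re

# Hypothetical connective list, for illustration only; the article does not
# say which words any real grading product looks for.
CONNECTIVES = {"consequently", "moreover", "however", "additionally", "furthermore"}


def surface_features(essay: str) -> dict:
    """Count mechanical features of the kind Perelman describes.

    Nothing here looks at meaning: only lengths and word lists.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Lowercased alphabetic tokens (straight apostrophes allowed).
    words = re.findall(r"[A-Za-z']+", essay.lower())
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "connective_count": sum(1 for w in words if w in CONNECTIVES),
    }


# Two made-up samples: a plain one and one padded with long words and connectives.
plain = "Pride is bad. It hurts people."
inflated = "Consequently, pridefulness is egregious. Moreover, it depreciates humankind."

if __name__ == "__main__":
    print(surface_features(plain))
    print(surface_features(inflated))
```

A scorer that rewards these numbers would rate the second sample higher than the first, although it says no more, which is the weakness Perelman is pointing to.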

About nine big companies sell essay-marking programs, including Pearson, ETS, CTB/McGraw-Hill, Measurement Inc. and LightSide, which is a fairly new company.

They are lining up to win “the big prize,” said Perelman — contracts to machine-score the increasing number of tests that will result from the new Common Core State Standards Initiative. Almost all U.S. states are now implementing new standards for kindergarten through Grade 12 in English language arts/literacy and mathematics that will require annual testing.

According to LightSide’s website, the Common Core “will require more writing in every classroom, meaning more time grading and even less time for teachers to work directly with students.”

Perelman does, however, credit LightSide as being one of the more reputable companies. Company founder Elijah Mayfield, who led the Alberta study, has responded to Perelman’s criticisms by trying to improve LightSide’s products and acknowledges the limitations of computerized grading, Perelman said.

“He’s an engineer. Most of the other companies are salesmen. I still think he’s wrong.”

Mayfield, who headed up the Alberta feasibility assessment, said a confidentiality agreement prohibits him from speaking to the Journal about the report.

The U.S. National Council of Teachers of English argues the Common Core initiative is pushing companies, testing agencies and education organizations to use automated essay grading because it’s cheaper than paying people to mark tests.

Alberta Teachers’ Association president Mark Ramsankar said it’s unfair to have students spend hours writing an exam that’s marked by a machine.

“How does a machine look at the symbolism contained in a piece of writing and interpret that as symbolism?” Ramsankar said. “I’d like to see how Shakespeare stacks up in a computer-generated mark.”

Both the ATA and the Canadian Teachers’ Federation oppose machine marking of essays, particularly on diploma examinations that are the culmination of a year’s worth of students’ learning.

“Again, we’re seeing practices that we’re looking at importing from the United States,” said federation president Dianne Woloschuk. “Their education system is in a crisis. Their students are not doing well.”

Alberta Education continues to evaluate whether machine-scoring of essays could be useful here. However, Premier Dave Hancock and Education Minister Jeff Johnson have said there are no plans to pursue it at this time.

‘Mankind will always conduct prejudice’

English 30-1 students in Alberta study the novel Pride and Prejudice, by Jane Austen, so retired MIT professor Les Perelman entered the key words “pride,” “prejudice” and “father” into his essay generator, BABEL. The program creates mechanically correct but “completely incoherent” essays that have fooled automated-marking programs.

Perelman’s sample was declared “off-topic,” but scored 88 per cent when Perelman ran it through the home-schooling version of IntelliMetric, under the software’s category “challenges of parenthood.”

IntelliMetric technology is used in the United States to score the Graduate Management Admission Test (GMAT), which graduate students take to get into management programs in business schools.

Here is the text from the BABEL-generated essay:


pride: [‘pridefulness’, u’pride’]

prejudice: [‘preconception’, ‘bias’, u’prejudice’]

father: [‘begetter’, u’father’, ‘male parent’]


Pridefulness with decency has not, and in all likelihood never will be malevolent, humane, and considerate. Mankind will always conduct prejudice; many for an advance but a few on pulchritude. a quantity of pride lies in the study of reality as well as the area of semantics. Why is pride so efficacious to depreciation? The reply to this query is that male parent is rivetingly and gregariously febrile.

Rationale, usually by appreciation, might feign prejudice. If nearly all of the appendages adjure an explanation of the erratically or idolatrously pagan disparagement, the haphazard preconception can be more falteringly sublimated. Additionally, an orbital is not the only thing simulation reacts; it also spins at male parent. Our personal altruist on the exposure we arrange can surprisingly be an interloper. Be that as it may, knowing that executioner can be the assassination, most of the comments to my accusation civilize irrelevant scrutinizations. In my philosophy class, all of the advancements by our personal allusion of the demarcation we decry accede amplifications which deliberate with analyses but masticate veracity that should inconsistently be a contradiction and occlude circumspections for expositions. Begetter which is mimicking in how much we cavort ousts whiner of our personal inquiry to the apprentice we propagate as well. an accession will enthusiastically be a concurrence on the insinuation, not an intercession. In my experience, none of the reprimands by our personal advocate at the sophist we substantiate contemplate postulation that blusters but append. a abundance of father changes culmination for bias.

As I have learned in my literature class, humanity will always depreciate father. Even though the brain counteracts a gamma ray to contentment, the same pendulum may catalyze two different neutrinoes with the remarkably accumulated culpability. Although the same neuron may receive two different brains, radiation processes orbitals of disenfranchisements on a taunt. The plasma is not the only thing a gamma ray oscillates; it also transmits neutrinoes for abandonment at the trope by father. The diagnosis of begetter changes a plethora of preconception. The less eventual allocutions pledge thermostats, the more an organism inaugurates those in question.

Malcontent, normally on the assumption, demolishes father. As a result of scintillating, all of the adjurations hobble equally with prejudice. Also, male parent to speculations will always be an experience of humankind. In my theory of knowledge class, some of the juggernauts of my scenario assimilate probes by the search for semiotics. Still yet, armed with the knowledge that propaganda can be a demonstration or homogenizes, many of the lamentations for my dictum abandon periodicity and voyage. In my philosophy class, almost all of the domains at our personal denouncement by the comment we admonish explain demolishers which declare the demolisher with the quip on gluttony that enthrals speculations or disparage agriculturalists. Pride which utters substantiation may boastfully be propinquity or is avowed but not impartial of my advancement also. a fetishistic fulmination belittles the people involved, not assemblage. Our personal conveyance to the reprobate we implore should be the analysis. The tendentiously vast prejudice changes a quantity of pridefulness.

Bias has not, and undoubtedly never will be reticent yet somehow gluttonous. However, armed with the knowledge that a report with assemblies accounts, all of the tyroes for my amplification shriek. By the fact that gratuitous dictators are articulated at pride, most of the amplifications confide too by pride. Prejudice will always be a part of human society. Pridefulness is the most feckless proclamation of human life.


