Showing posts with label Andrea Sands. Show all posts

Wednesday, May 28, 2014

Essay-marking software gives high marks for gibberish, U.S. expert warns Alberta

This was written by Andrea Sands who is a journalist with the Edmonton Journal. Sands tweets here. This post was originally found here.

by Andrea Sands

Alberta Education should not allow high-stakes Grade 12 diploma exams to be marked by computer because today’s essay-marking software is terribly flawed, says a retired Massachusetts Institute of Technology (MIT) professor and writing-assessment expert.

Les Perelman has been an outspoken critic of automated essay marking and worked with MIT students to invent computer software called BABEL (Basic Automatic B.S. Essay Language Generator), designed to trick today’s essay-scoring programs. BABEL generates gibberish essays which, nonetheless, get high marks from so-called “robo graders.”

“I realized very early on that machine grading of writing really is, essentially, impossible,” Perelman said from Massachusetts.

“I can guarantee you, smart 18-year-olds will try to trick it. If they find out it favours certain things like word length, obscure words and pretentious language, then that is exactly what is going to be probably taught in schools.”

Last fall, Alberta Education paid U.S. company LightSide $5,000 to do a feasibility assessment on whether computers could accurately mark Grade 12 diploma-exam essay questions. Every year, about 190,000 Alberta students write the exams, worth 50 per cent of a student’s final mark.

The LightSide report concluded its software could reliably grade student essays and, in the test analysis, did so more accurately than teachers.

Perelman said the report is “fudging” because of the way data was analyzed and because there’s not enough information and explanation in the report. “I’m a pro at this and there were tables I could not understand.”

Computers can’t analyze meaning, and essay-grading software usually inflates grades for essays that are longer and use uncommon words, such as “egregious” instead of “bad,” or “plethora” instead of “many,” Perelman said.

“They’re counting word length, or they’re counting types of words, or they’re counting sentence length, or they’re counting connective words like ‘consequently,’ ‘moreover,’ things like that, that show cohesion. But all those things are very mechanical.”
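To make the point concrete, here is a minimal sketch of the kind of mechanical counting Perelman describes. It is not taken from any vendor's product: the function name, the feature set and the list of connectives are all assumptions made for illustration. It only shows that such features can be computed without any grasp of meaning.

```python
import re

# Hypothetical list of connective words; real scoring products do not publish theirs.
CONNECTIVES = {"consequently", "moreover", "however", "furthermore", "therefore"}


def surface_features(text):
    """Count purely mechanical features of a text. None of them measures meaning."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "connectives": sum(1 for w in words if w in CONNECTIVES),
    }


# Two sentences saying roughly the same thing, one in plain and one in inflated language.
plain = "The plan was bad. Many people said so."
inflated = (
    "Consequently, the stratagem was egregious. "
    "Moreover, a plethora of individuals concurred."
)

print(surface_features(plain))
print(surface_features(inflated))
```

The inflated version comes out ahead on average word length and connective count even though it says nothing more than the plain one. A scorer that rewarded these numbers would favour it for that reason alone.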

About nine big companies sell essay-marking programs, including Pearson, ETS, CTB/McGraw-Hill, Measurement Inc. and LightSide, which is a fairly new company.

They are lining up to win “the big prize,” said Perelman — contracts to machine-score the increasing number of tests that will result from the new Common Core State Standards Initiative. Almost all U.S. states are now implementing new standards for kindergarten through Grade 12 in English language arts/literacy and mathematics that will require annual testing.

According to LightSide’s website, the Common Core “will require more writing in every classroom, meaning more time grading and even less time for teachers to work directly with students.”

Perelman does, however, credit LightSide as being one of the more reputable companies. Company founder Elijah Mayfield, who led the Alberta study, has responded to Perelman’s criticisms by trying to improve LightSide’s products and acknowledges the limitations of computerized grading, Perelman said.

“He’s an engineer. Most of the other companies are salesmen. I still think he’s wrong.”

Mayfield, who headed up the Alberta feasibility assessment, said a confidentiality agreement prohibits him from speaking to the Journal about the report.

The U.S. National Council of Teachers of English argues the Common Core initiative is pushing companies, testing agencies and education organizations to use automated essay grading because it’s cheaper than paying people to mark tests.


Alberta Teachers’ Association president Mark Ramsankar said it’s unfair to have students spend hours writing an exam that’s marked by a machine.

“How does a machine look at the symbolism contained in a piece of writing and interpret that as symbolism?” Ramsankar said. “I’d like to see how Shakespeare stacks up in a computer-generated mark.”

Both the ATA and the Canadian Teachers’ Federation oppose machine marking of essays, particularly on diploma examinations that are the culmination of a year’s worth of students’ learning.

“Again, we’re seeing practices that we’re looking at importing from the United States,” said federation president Dianne Woloschuk. “Their education system is in a crisis. Their students are not doing well.”

Alberta Education continues to evaluate whether machine-scoring of essays could be useful here. However, Premier Dave Hancock and Education Minister Jeff Johnson have said there are no plans to pursue it at this time.

asands@edmontonjournal.com

Twitter.com/Ansands

‘Mankind will always conduct prejudice’


English 30-1 students in Alberta study the novel Pride and Prejudice, by Jane Austen, so retired MIT professor Les Perelman entered the key words “pride,” “prejudice” and “father” into his essay generator, BABEL. The program creates mechanically correct but “completely incoherent” essays that have fooled automated-marking programs.

Perelman’s sample was declared “off-topic,” but scored 88 per cent when Perelman ran it through the home-schooling version of IntelliMetric, under the software’s category “challenges of parenthood.”

IntelliMetric technology is used in the United States to score the Graduate Management Admission Test (GMAT), which graduate students take to get into management programs in business schools.

Here is the text from the BABEL-generated essay:

Keywords:

pride: [‘pridefulness’, u’pride’]

prejudice: [‘preconception’, ‘bias’, u’prejudice’]

father: [‘begetter’, u’father’, ‘male parent’]

Essay:

Pridefulness with decency has not, and in all likelihood never will be malevolent, humane, and considerate. Mankind will always conduct prejudice; many for an advance but a few on pulchritude. a quantity of pride lies in the study of reality as well as the area of semantics. Why is pride so efficacious to depreciation? The reply to this query is that male parent is rivetingly and gregariously febrile.

Rationale, usually by appreciation, might feign prejudice. If nearly all of the appendages adjure an explanation of the erratically or idolatrously pagan disparagement, the haphazard preconception can be more falteringly sublimated. Additionally, an orbital is not the only thing simulation reacts; it also spins at male parent. Our personal altruist on the exposure we arrange can surprisingly be an interloper. Be that as it may, knowing that executioner can be the assassination, most of the comments to my accusation civilize irrelevant scrutinizations. In my philosophy class, all of the advancements by our personal allusion of the demarcation we decry accede amplifications which deliberate with analyses but masticate veracity that should inconsistently be a contradiction and occlude circumspections for expositions. Begetter which is mimicking in how much we cavort ousts whiner of our personal inquiry to the apprentice we propagate as well. an accession will enthusiastically be a concurrence on the insinuation, not an intercession. In my experience, none of the reprimands by our personal advocate at the sophist we substantiate contemplate postulation that blusters but append. a abundance of father changes culmination for bias.


As I have learned in my literature class, humanity will always depreciate father. Even though the brain counteracts a gamma ray to contentment, the same pendulum may catalyze two different neutrinoes with the remarkably accumulated culpability. Although the same neuron may receive two different brains, radiation processes orbitals of disenfranchisements on a taunt. The plasma is not the only thing a gamma ray oscillates; it also transmits neutrinoes for abandonment at the trope by father. The diagnosis of begetter changes a plethora of preconception. The less eventual allocutions pledge thermostats, the more an organism inaugurates those in question.

Malcontent, normally on the assumption, demolishes father. As a result of scintillating, all of the adjurations hobble equally with prejudice. Also, male parent to speculations will always be an experience of humankind. In my theory of knowledge class, some of the juggernauts of my scenario assimilate probes by the search for semiotics. Still yet, armed with the knowledge that propaganda can be a demonstration or homogenizes, many of the lamentations for my dictum abandon periodicity and voyage. In my philosophy class, almost all of the domains at our personal denouncement by the comment we admonish explain demolishers which declare the demolisher with the quip on gluttony that enthrals speculations or disparage agriculturalists. Pride which utters substantiation may boastfully be propinquity or is avowed but not impartial of my advancement also. a fetishistic fulmination belittles the people involved, not assemblage. Our personal conveyance to the reprobate we implore should be the analysis. The tendentiously vast prejudice changes a quantity of pridefulness.

Bias has not, and undoubtedly never will be reticent yet somehow gluttonous. However, armed with the knowledge that a report with assemblies accounts, all of the tyroes for my amplification shriek. By the fact that gratuitous dictators are articulated at pride, most of the amplifications confide too by pride. Prejudice will always be a part of human society. Pridefulness is the most feckless proclamation of human life.

Monday, May 26, 2014

Computers 'dramatically more reliable' than teachers in marking Alberta diploma-exam essays: study

This was written by Andrea Sands who is a journalist with the Edmonton Journal. Sands tweets here. This post was originally found here.

by Andrea Sands

Photo: Phil McRae with the ATA. Photo by John Lucas, Edmonton Journal

A computer could do a better job than a teacher in marking Grade 12 diploma exam essays, a government-commissioned study says.

Last fall, Alberta Education sent two 2013 diploma-exam questions along with nearly 1,900 student essay answers that had been graded by teachers to LightSide, a Pennsylvania company that develops computer software to score student essays.

LightSide’s automated algorithms outperformed human reliability in the Alberta study by about 20 per cent, said the company’s January 2014 report to the government.

“We are certain that LightSide is able to reproduce scoring behaviour at least as reliably as human graders, and in many cases we believe that our automated performance would be dramatically more reliable than human grading,” the report said.

The study indicated Alberta Education’s human scoring was quite unreliable, below the threshold LightSide recommends for high-stakes testing.

Alberta should consider investing in “a more stringent training process for human graders,” the report said. “It is somewhat alarming to see human reliability so low.”

The $5,000 study suggests LightSide’s marking program is more reliable than a single human marker, but Alberta uses a double-marking system, said Neil Fenske, Alberta Education’s executive director for assessment.

At least two teachers grade each diploma-exam essay and, if the grades differ, it goes to a third marker.

“So we’ve built a system that is highly reliable ... but it’s very labour-intensive as well,” Fenske said.

Alberta Education commissioned two previous studies — one three years ago and one 15 years ago — to see if computerized essay marking could work, Fenske said.

Further study is needed because LightSide examined a very small sample, Fenske said.

However, the report does show automated technology has evolved enough that it could be useful, if Alberta combined the marking power of people with the speed and reliability of machines, he said. That could mean one person marks an essay, then it’s run through a computer for grading, and sent to another person if there’s a discrepancy.
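The hybrid workflow Fenske describes can be sketched in a few lines. This is an illustration only, not Alberta Education's procedure: the function names, the sample marks and the `tolerance` parameter (how far apart the two marks may be before they count as a discrepancy) are assumptions. The article does not say how a discrepancy would be defined or how the final mark would be settled, so the sketch only decides which essays go to a second person.

```python
def needs_second_marker(human_mark, machine_mark, tolerance=0.0):
    """True when the human and machine marks differ by more than the tolerance."""
    return abs(human_mark - machine_mark) > tolerance


def route(essays, tolerance=0.0):
    """Return the ids of essays that should be sent to a second human marker.

    essays maps an essay id to a (human_mark, machine_mark) pair.
    """
    return [
        essay_id
        for essay_id, (human_mark, machine_mark) in essays.items()
        if needs_second_marker(human_mark, machine_mark, tolerance)
    ]


# Hypothetical marks out of 100, invented for this example.
sample = {
    "essay-1": (72, 72),
    "essay-2": (65, 80),
    "essay-3": (88, 85),
}

print(route(sample, tolerance=5))
print(route(sample))
```

The choice of tolerance drives the workload. With a tolerance of five marks only one of the three sample essays needs a second reader, while requiring exact agreement sends two of them on.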

The province will also soon need diploma exams marked more often than before. Last year, the department announced Grade 12 diploma exams will be offered more often and digitally, part of efforts under Inspiring Education to make the school system more flexible.

“Because marking means teachers out of the classroom, one of the things we have to take a look at is, is there a way that we can build a better marking system that’s better for students but maybe also keeps more teachers in the classrooms?” Fenske said.

Alberta Education has had trouble this year recruiting enough teachers to grade diploma exams, which are worth 50 per cent of a student’s final mark.

Around the same time the LightSide report was commissioned, Education Minister Jeff Johnson cut a grading honorarium in half — from $200 to $100 — for teachers who volunteer to mark diploma exams on a regular workday. Fewer teachers volunteered to do the marking this year, and it’s taking longer as a result.

Johnson and Premier Dave Hancock said this week the honorarium cut should be re-examined.

Asked about the LightSide report at a Journal editorial board meeting this week, Hancock said he hasn’t yet read the study, but is not interested in having machines grade diploma-exam essays.

“I think that’s an absolute disastrous way to go,” Hancock said. “There are things that teachers bring to the process that are very important.”

Teachers also benefit from professional development when they mark the exams, meeting with colleagues from across the province and discussing education standards, Hancock said.

“At this time, there are no plans to institute digital scoring systems in provincially graded essays like diploma exams,” Johnson said in a statement.

Last weekend, Alberta Teachers’ Association delegates voted unanimously at their annual meeting in Calgary to oppose machine scoring of essay questions.

The ATA was never told about the LightSide study, but machine-marking has sparked heated debate in the United States, said Phil McRae, ATA executive staff officer and adjunct education professor with the U of A.

Standardized testing in the U.S. is growing, prompting governments and school districts to look to computer scoring to keep grading costs down, said McRae, who researches technology in education.

“It’s about reducing costs, whether it’s development of the items for the tests, administration or scoring.”