September 28, 2009
Op-Ed Contributor
Reading Incomprehension
By TODD FARLEY
LAST week, Education Secretary Arne Duncan acknowledged standardized tests are flawed measures of student progress. But the problem is not so much the tests themselves — it’s the people scoring them.
Many people remember those tests as lots of multiple-choice questions answered by marking bubbles with a No. 2 pencil, but today’s exams nearly always include the sort of “open ended” items where students fill up the blank pages of a test booklet with their own thoughts and words. On many tests today, a good number of points come from such open-ended items, and that’s where the real trouble begins.
Multiple-choice items are scored by machines, but open-ended items are scored by subjective humans who are prone to errors. I know because I was one of them. In 1994, I was a graduate student looking for part-time work. After a five-minute interview I got the job of scoring fourth-grade, state-wide reading comprehension tests. The for-profit testing company that hired me paid almost $8 an hour, not bad money for me at the time.
One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster that illustrated a rule that was indicated in the text. We would award one point for a poster that included a correct rule and zero for a drawing that did not.
The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?
I was not the only one who was confused. Soon several of my fellow scorers — pretty much people off the street, like me — were debating my poster, some positing that it clearly showed an understanding of bike safety while others argued that it most certainly did not. I realized then — an epiphany confirmed over a decade and a half of experience in the testing industry — that the score any student would earn mostly depended on which temporary employee viewed his response.
A few years later, still a part-time worker, I had a similar experience. For one project our huge group spent weeks scoring ninth-grade movie reviews, each of us reading approximately 30 essays an hour (yes, one every two minutes), for eight hours a day, five days a week. At one point the woman beside me asked my opinion about the essay she was reading, a review of the X-rated movie “Debbie Does Dallas.” The woman thought it deserved a 3 (on a 6-point scale), but she settled on that only after weighing the student’s strong writing skills against the “inappropriate” subject matter. I argued the essay should be given a 6, as the comprehensive analysis of the movie was artfully written and also made me laugh my head off.
All of the 100 or so scorers in the room soon became embroiled in the debate. Eventually we came to the “consensus” that the essay deserved a 6 (“genius”), or 4 (well-written but “naughty”), or a zero (“filth”). The essay was ultimately given a zero.
This kind of arbitrary decision is the rule, not the exception. The years I spent assessing open-ended questions convinced me that large-scale assessment was mostly a mad scramble to score tests, meet deadlines and rake in cash.
The cash, though, wasn’t bad. It was largely for this reason that I eventually became a project director for a private testing company. The scoring standards were still bleak. A couple of years ago I supervised a statewide reading assessment test. My colleague and I were relaxing at a pool because we believed we’d already finished scoring all of the tens of thousands of student responses. Then a call from the home office informed us that a couple of dozen unscored tests had been discovered.
Because our company’s deadline for returning the tests was that day, my colleague and I had to score them even though we were already well into happy hour. We spent the evening listening to a squeaky-voiced secretary read student answers to us over a scratchy speakerphone line, while we made decisions that could affect somebody’s future.
These are the kinds of tests, after all, that can help determine government financing for schools. There is already much debate over whether the progress that Secretary Duncan hopes to measure can be determined by standardized testing at all. But in the meantime, we can give more thought to who scores these tests. We could start by requiring that scoring be done only by professionals who have made a commitment to education — rather than by people like me.
Todd Farley is the author of the forthcoming “Making the Grades: My Misadventures in the Standardized Testing Industry.”
As I get ready to apply to universities, I was shocked by this op-ed. I always thought that short-response and essay questions weren't that accurate, but I never thought they would be so far off. After thinking about it, though, I can believe it; we are all different, and as the article says, "the problem is not so much the tests themselves — it's the people scoring them." After all, standardized multiple-choice questions are scored by a machine, and there is a right or wrong answer; writing is judged according to the person who reads your paper. This statement says it all: "open-ended items are scored by subjective humans who are prone to errors."
It is their opinion, and that is the problem. The article cites a perfect example of how ineffective this is. An essay is like a piece of art: it may appeal to many, and many may dislike it. So really the grade is subject to the grader's point of view and interests. In other words, "the score any student would earn mostly depended on which temporary employee viewed his response."
Here is a specific example from the article: "At one point the woman beside me asked my opinion about the essay she was reading, a review of the X-rated movie 'Debbie Does Dallas.' The woman thought it deserved a 3 (on a 6-point scale), but she settled on that only after weighing the student's strong writing skills against the 'inappropriate' subject matter. I argued the essay should be given a 6, as the comprehensive analysis of the movie was artfully written and also made me laugh my head off." The essay was ultimately given a zero. Different opinions produced different grades for the same essay, so the question remains: which grade did that essay actually deserve?
I believe something must be done: either more specific benchmarks should be created so everyone is graded equally, or these written parts should simply be replaced. These are decisions that could affect someone's future. This is a serious issue that should be handled sincerely. Test takers deserve to be taken into account, and under the current system we can infer they are not taken seriously. The op-ed ends with a strong statement many of us would not expect: "We could start by requiring that scoring be done only by professionals who have made a commitment to education — rather than by people like me." I could have sworn professionals checked these tests; apparently it could be anyone, so what criteria will be used to judge your writing is a mystery.
Thursday, October 8, 2009