Named, ranked and blamed
League tables that measure teachers individually are gaining popularity in the US, but their impact can be catastrophic. One such project resulted in a practitioner taking his own life after a poor rating
One autumn morning, Rigoberto Ruelas didn’t turn up for work. His colleagues at Miramonte Elementary School in Los Angeles were worried; in the 14 years he had taught at the school, he had seldom taken a day off.
Several days later, a police search-and-rescue team taking part in a training exercise in the Angeles National Forest spotted the 39-year-old’s abandoned vehicle. In a nearby ravine, they found his body lying 100 feet below a bridge. The Los Angeles county coroner later ruled that he had taken his own life.
Despite working in a tough, gang-ridden part of LA, Ruelas adored his job. He would tutor his pupils at weekends, and pushed them to aim high and go to college. Speaking after Rigoberto’s funeral, his brother Jose told journalists: “I want him to be remembered as a person that loved his career, he had a passion for his career. He loved the children and that’s why he taught.”
But according to the Los Angeles Times, he was one of the “least effective” maths teachers in the city. Days before Ruelas’ death in September 2010, the newspaper had entered uncharted territory. In the UK, school league tables have - for better or worse - become accepted as part of our education system. But the Los Angeles Times took things to a new level: it published individual ratings for each of the city’s 6,000 elementary school teachers. It used the value-added measure, comparing individual pupils’ progress in test scores to evaluate what effect their teachers had on their learning.
The project’s impact was seismic. In Ruelas’ case, his family said that he had become deeply depressed by his poor rating. After news of his suicide emerged, thousands of teachers turned up at the newspaper’s office in downtown LA, calling for readers to boycott the paper and demanding that the ratings be removed from its website. The banners on display were emblazoned with angry messages including, “We are more than a test score”, “Demoralising teachers hurts students” and “LA Times, how do you help our kids?”
But in spite of the outrage among the teaching community, value-added teacher ratings have not gone away. The scores remain on the Los Angeles Times website; Ruelas’ poor rating in maths - and his “average” effectiveness rating for teaching English - can still be viewed, just like those of the thousands of other teachers in the city.
And it’s not just the press that has shown an interest in the data. Los Angeles Unified School District (LAUSD) now calculates its own teacher scores to evaluate the performance of individuals, and similar approaches are in use in many other school districts, such as Chicago and the District of Columbia.
US secretary of education Arne Duncan has also come out in support of the scores, arguing: “The truth can be hard to swallow, but it can only make us better and stronger and smarter.”
Teacher scores have even made their way to the eastern seaboard. In February 2012, The New York Times published performance data for 18,000 elementary school teachers in the city.
But is rating individual teachers a genuine means of improving education through accountability? Should an employee’s appraisal be kept private and used purely for professional development, or can putting evaluations in the public domain be a real force for educational improvement?
If LA teachers were unhappy about being publicly named and shamed by the Los Angeles Times, they made sure that the two journalists behind the project - Jason Felch and Jason Song - knew what it felt like. “They were burning me and Jason in effigies,” Felch explains. “There were personal attacks on us. Jason Song got more of it because his name is easier to rhyme than Felch.”
The project began in 2009, when stories of underperformance in LA schools prompted the reporters to start looking into how teachers were being held to account by the school district.
“What we realised,” Felch says, “was that there was absolutely no measure of performance. For decades in LAUSD, teachers have essentially been given drive-by evaluations - very quick visits from a principal sitting in the classroom, checking off. Nationally it was the same picture. Teachers all around the country were receiving no feedback on their performance.”
Progress v achievement
While value-added (and contextual value-added) scores may have fallen out of favour in the UK, they have started to become more popular in the US over the past three years, and the approach piqued Felch’s interest.
“Schools were being called failing schools only because they had poor children,” he says. “Value-added was an effort to correct that by bringing in socio-economics, and bringing in growth rather than achievement level.”
By looking at how much progress a pupil makes over a set period of time rather than raw attainment, the theory goes, schools with low socio-economic catchment areas can be judged fairly alongside their neighbours in more affluent areas. The argument, staff at the Los Angeles Times soon realised, could be extended to teachers: they could mine the data to extract the impact of individual teachers in terms of how much value they added to pupils’ education.
After six months of haggling with the school district, the newspaper finally got hold of the figures it wanted by using freedom of information legislation. “No one had ever asked for the data before,” Felch says. “No one had even thought to ask. Even internally here, people were telling us, ‘You’ll never get that. They’ll never give you the data. Even if they do, you’ll never be able to analyse them.’”
But analyse the data they did. The Los Angeles Times hired Richard Buddin, an education policy expert at RAND Corporation, to do the number crunching, before checking his work with several other academics and its own in-house data experts.
While critics of the project were quick to damn Felch and his colleagues as journalists out to take a cheap shot at the teaching profession, he insists that their motives were genuine.
“In the United States, our whole education system is a self-fulfilling social prophecy,” he explains. “Because of our accountability structure with testing, poor kids do poorly, rich kids do great on tests. That makes us think that the schools that these rich kids go to are great schools.
“[The system] is built on this ridiculous fallacy. Yet parents, teachers, the state, resources, all of it is geared towards this fallacy. We saw value-added (scores) as a way to cut through the socio-economics that are skewing the whole picture, and really shine a light among students, teachers and schools that’s not just a reflection of socio-economics.
“[This is done] by comparing students with their own prior behaviour. So if a student comes from an inner-city family - dad’s not in the picture, mom’s on drugs - the assumption of value-added is that that (scenario) is a relative constant in this kid’s life.”
Felch argues that by looking purely at the relative progress made by pupils in successive exams, it is possible to strip out extraneous social factors, meaning that pupils - and teachers - can be compared on a like-with-like basis.
Using this measure of pupils’ relative progress, the Los Angeles Times rated the city’s elementary teachers on their effectiveness in teaching English and maths in terms of how much value they added - ie, whether pupils progressed more rapidly than would have been expected, based on their prior performance. Each teacher was classed as “least effective”, “less effective”, “average”, “more effective” or “most effective”.
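The idea can be illustrated with a deliberately simplified sketch. It is not the model that Buddin built for the newspaper, whose details are not given here. It assumes only that a pupil's expected score is predicted from their prior score by a straight-line fit across all pupils, that a teacher's value added is the average gap between their pupils' actual and expected scores, and that teachers are then split into five equal-sized bands. The function names and the banding rule are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The five labels used in the published ratings, worst to best.
LABELS = ["least effective", "less effective", "average",
          "more effective", "most effective"]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x.

    Assumes the prior scores in xs are not all identical.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def value_added(records):
    """records: one (teacher, prior_score, current_score) tuple per pupil.

    Returns {teacher: average of (actual - expected) current score}.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    gaps = defaultdict(list)
    for teacher, prior, current in records:
        expected = a + b * prior  # what prior performance alone predicts
        gaps[teacher].append(current - expected)
    return {teacher: mean(g) for teacher, g in gaps.items()}


def rate(scores):
    """Assumed banding rule: five equal-sized groups by value-added score."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {teacher: LABELS[i * 5 // n] for i, teacher in enumerate(ranked)}
```

Because the bands in this sketch are relative, a fifth of teachers will always land in the bottom group, however well the whole group teaches.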
As well as posting details about the calculations on its website, the Los Angeles Times also gave teachers the chance to raise their concerns.
“A lot of teachers felt maligned by the data,” Felch says. “One of the things I’m proud of is that we took those complaints seriously and allowed teachers to point out mistakes in the data and things that were unfair.”
He estimates that about 80 per cent of complaints were from teachers who simply thought they deserved a better rating; the remaining 20 per cent were legitimate grievances about errors in the data. As a result, some teachers’ scores were removed.
Teachers hit back
The comments that teachers posted next to their ratings offer an insight into the massive impact the project had on teachers’ lives. Some teachers offer reasons for their low scores, such as retirement, maternity leave or the fact that they didn’t actually teach the classes concerned. Others take the opportunity to express their pain and anger.
Angelica Barraza, a third-grade teacher at Hooper Avenue Elementary, writes: “I’ve seen the disheartening effect of your scoring system on excellent teachers that I have had the privilege of working alongside… One teacher in particular comes to mind. He’s the type of teacher who is first in and works through recess and lunch. A good teacher who was made to feel that his efforts as an educator were meaningless based only on test scores. ‘What more can I do?’ he asked as he reviewed the ratings himself, trying to figure out what led to his poor showing.”
Winnetka Avenue Elementary teacher Lilia Alzate - classed as “least effective” in English and maths - admits that she has been seriously affected. “Your publishing (of) these test scores (has) kept teachers awake at night, including myself. Could it also be that some who have suffered a degree of emotional instability may not have survived your ratings?”
Although Stephanie Logan, who teaches at Seventy-Fifth Street Elementary School, is classed as a “more effective” maths teacher, she is described as one of LA’s “least effective” English teachers. “I feel like I’m being punished for being responsible and not saying ‘no’ when I was asked to take (difficult) students,” she writes. “I feel hurt and humiliated to be rated like this. Should I have refused to take those students in?”
Her colleague James Melin, classed as a “less effective” maths teacher, puts his point across more forcefully. “Listen,” he writes. “I teach in an area of south Los Angeles that most of your readers wouldn’t want to drive through. I work at a job that most of your readers wouldn’t dare undertake because I am so underpaid for what I do. I work for a district that has seen it fit to lay me off the past three years, only to rehire me at the very last second.
“Nobody who matters give a hoot about your rating… I will be receiving ‘thank you’ notes from many of my students when they are in college or are productive adults in society. At that time, my effectiveness as a teacher can be measured.”
Felch acknowledges that the ratings aren’t 100 per cent accurate. “Our confidence in these figures varies,” he says, “and these are not exact figures, they are estimates. That’s the best you can do with this. The data’s strongest at the two extremes. It’s a big bell curve, in the middle it gets squishy.”
When I ask Felch about the impact on teachers, he is unrepentant. “The kneejerk reaction is that this is evil and wrong, and is going to perpetuate all the inequalities,” he explains. “When you understand the goals of value-added, I would think teachers would be excited. For the first time in their careers, they have an opportunity to succeed even if they teach poor kids. Here’s a system that will level the playing field, and try to take out of the equation all the socio-economic stuff they feel that they are blamed for by society.”
But it is not just teachers who have expressed reservations about the scores. Two years ago, Derek Briggs and Ben Domingue of the National Education Policy Center analysed the Los Angeles Times’ ratings. They concluded that the newspaper’s research “was demonstrably inadequate to support the published rankings”. In its next set of teacher scores, the newspaper altered its methodology.
Speaking at the Education International (the global federation of teacher unions) conference in London in January, Lily Eskelsen, vice-president of the National Education Association, added her voice to the debate.
“I wouldn’t mind the ranking so much if it was just used on the sports page where it belongs,” she quipped, adding: “We’re making decisions around bad data… (The Los Angeles Times reporters) put a small disclaimer on their work saying that yes indeed, they know that what they’re about to tell you is not accurate, and then they use that disclaimer as permission to proceed with giving you bad information.”
Schools join in
But it is not just the newspaper that is now making use of the data. Although LAUSD was initially reluctant to put the data in the public domain, it has - perhaps surprisingly - now decided to put together its own value-added scores.
After first publishing value-added data - dubbed Academic Growth over Time (AGT) - at a school level, it started a pilot scheme on individual teachers. Crucially, the teacher-level ratings are not made public, but many - no doubt scarred by their experiences with the Los Angeles Times - are still less than impressed.
Brent Smiley, who teaches social sciences at Lawrence Middle School in Chatsworth, near LA, freely admits that, according to the AGT system, he is “one of the ‘least effective’ teachers in the district”.
When I ask him why, he pauses for dramatic effect. “No matter what I do,” he finally answers, “I can’t get 103 per cent of my kids over the bar.” He bursts out laughing.
Smiley’s problem, he explains, is that the pupils at his school are too good. “The kids I teach are gifted and highly gifted, the school’s a magnet for them. And so last year I had 97.7 per cent of my students reach advanced or proficient. I was only able to go up about 1.5 per cent (from the year before).”
Compared with the set goal of a 6 percentage point increase, Smiley had - through no fault of his own - fallen short.
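The arithmetic behind Smiley's complaint is easy to check. The sketch below assumes that the goal is a flat 6 percentage point rise on the previous year's proficiency rate and that his "about 1.5 per cent" is also measured in percentage points. How AGT actually sets its targets is not spelled out, so the function names and the rule itself are illustrative only.

```python
def required_rate(previous_rate, target_gain=6.0):
    """Proficiency rate (per cent) needed to meet a flat percentage point goal."""
    return previous_rate + target_gain


def is_attainable(previous_rate, target_gain=6.0):
    """A rate above 100 per cent cannot be reached, whatever the teaching."""
    return required_rate(previous_rate, target_gain) <= 100.0


current_rate = 97.7                   # Smiley's figure for last year
previous_rate = current_rate - 1.5    # roughly 96.2 the year before

print(round(required_rate(previous_rate), 1))  # target implied for last year
print(is_attainable(previous_rate))            # whether that target was possible
print(round(required_rate(current_rate), 1))   # target implied for the next year
```

On these assumptions the target was already above 100 per cent, and a further 6 point rise from 97.7 would require 103.7 per cent, which is in line with Smiley's quip.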
The relative nature of the accountability system has created perverse incentives for teachers, Smiley explains. “I would be best served personally to have my students tank the testing every other year. That would mean that one year I’d be the ‘most effective’, the next year I’d be the ‘least effective’, and I’d get ping-pong balled. That is not healthy for anyone, that’s not what I’m doing. I don’t care a damn about my test scores (but) I owe it to the kids to get them to be as proficient as possible.”
So how does Smiley play the system to achieve such good scores for his pupils? “I’ve figured out how to beat the test,” he admits. “It’s just a vocab test they take through social studies, that’s all it is. It’s a piece of cake. We spent five minutes a week on it, and we hit 97.7 per cent.”
But the irony for Smiley is that, having learned how to game the system, he can teach the way he wants to. “By figuring out how to beat their test, it freed me up to go about teaching the right way. But not everyone has the luxury of kids who are at the upper echelon.”
Paradoxically, a new accountability system that aims to ensure that teachers are teaching well is forcing some of them to focus on getting a good score, rather than on providing a good, rounded education for their pupils.
“What value-added models are doing,” Smiley argues, “is trying to give a very simple answer to one of the most complex questions that there is. What they are really trying to do is define teaching as a science. It’s not, it’s an art.”
Attempting to offer a genuinely objective assessment is no mean feat. Professor Alan Smithers, director of the Centre for Education and Employment Research at the University of Buckingham in the UK, says that he is “uneasy” about teachers being evaluated publicly, not least because of the limitations of the data.
“If you are looking at pupils’ test results,” he says, “they depend on the pupils’ abilities, motivations and aspirations to study. Whether or not a child learns is ultimately down to them. (The pupils’ attainment) reflects a whole range of teachers they have had before, not just who they had at a particular age.
“I don’t think it will lead to good teaching. This approach will encourage teachers to develop a box-ticking mentality - teachers will play it safe. This approach would be absolutely terrifying for them, even if it were totally accurate. If you use the data in that way, it will have a massive impact on staff.”
On this issue, LAUSD agrees. Despite further requests from the Los Angeles Times for the data that would allow it to update its teacher ratings once more, the district has steadfastly refused to release any details that would allow teachers to be identified by name. In December, the Los Angeles Times filed a lawsuit to try to force the district to comply. The case has not yet been decided.
What has been decided, however, is a new approach to teacher evaluation in LA. Controversial moves to use teachers’ individual AGT scores in their formal evaluations have been watered down. In January, the LA teachers’ union, United Teachers Los Angeles, voted to go along with an agreement to base teacher evaluations on a combination of three factors: raw test data, school performance and “robust classroom observation”. Although AGT scores won’t be directly used in evaluations, they can be consulted to provide “context” to a teacher’s performance.
This serves to illustrate the limitations of relying solely on test data. As the Los Angeles Times has already admitted, its scores “do not capture everything about a (teacher’s) performance”.
But the most poignant reminder of the limitations of teacher ratings comes from their best-known victim, Rigoberto Ruelas. On the day of his funeral, LAUSD revealed that, in his final evaluation, he had scored the highest grade possible.
But more than two years after his tragic death, Ruelas’ name still appears on the Los Angeles Times’ website. Irrespective of the views of his colleagues and pupils, he remains one of Los Angeles’ “least effective” maths teachers.
Photo credit: AP