The assessment bias trap: what the TAGs taught us

What are the most common types of teacher bias? And how can we stop them impacting judgements in assessments?
15th September 2021, 10:00am

https://www.tes.com/magazine/news/secondary/assessment-bias-trap-what-tags-taught-us

Every assessment that a school conducts is at risk of unconscious bias.

This isn’t a criticism of teachers. The very nature of unconscious bias means it happens without our being aware of it, as Elspeth Kirkman, senior director of health, education and communities at the Behavioural Insights Team (BIT), explains.

“You’re not a bad person for being biased. Bias is a fundamental feature of human psychology.”

Kirkman, who is also co-author of Behavioural Insights and who taught behavioural science at Harvard before joining BIT, elaborates: “Our ability to survive depends on our ability to make judgements. We use simple rules to handle the reams of complex information thrown at us every day. For example, ‘pay attention to things that seem unusual’.”

That rule helps us anticipate danger or handle changes in our environment, but it might also mean we are more likely to notice, and even fear, people who don’t look or behave like us. 

“This means we make different judgements in relation to some people. We also absorb stereotypes from a young age and witness discrimination. It is easy to see how unconscious bias grows stronger.”

Within education, concerns about how unconscious bias may affect marking have no doubt always existed, but perhaps as a somewhat abstract, theoretical issue.

But when it was announced that teacher-assessed grades (TAGs) would be used for the 2020-21 academic year, it put the issue in the full glare of the spotlight.

Gone were the tight controls of external assessment and, instead, teachers were called upon to create, administer, mark and then award grades, in full knowledge of who each student was.

Unsurprisingly, many expressed concerns about the risk of teacher bias. How can we expect impartiality without anonymity?

In fact, so significant were the concerns that the research chair at Ofqual, Paul Newton, raised the issue in a blog post with the matter-of-fact title, Bias in teacher assessment.

“All judgements that we make as humans are susceptible to biases of this sort, without us necessarily even being aware of them. This includes the judgements that teachers make when they assess students,” he wrote.

He acknowledged that grading errors were inevitable in any assessment, but that those caused by teachers’ bias towards students were especially troublesome.

“No errors are good and we do all that we can to eliminate them,” he wrote. “But some errors feel worse than others. Bias that systematically affects one group of students more than others...feels especially pernicious.”

As such, to advise teachers on how to tackle the problem, Ofqual carried out a review of existing third-party research on bias in teacher assessment, identifying the four areas most likely to affect outcomes:

  • Gender bias.
  • Ethnicity bias.
  • Disadvantage bias. 
  • Special educational needs and disability (SEND) bias.

Of all these groups, Newton warned that the most prevalent type of bias in teacher assessment happened with disadvantaged students and SEND students.

After highlighting the risk that these biases could occur in teacher assessments, Ofqual then urged teachers to take steps to try to minimise the chance of biased assessments taking place. 

This included telling teachers to make themselves aware of the bias, to use blind marking and to read the Joint Council for Qualifications guidance on maintaining objectivity.

Was it enough? Sadly, it seems not. 

When exam results were released in August, it was clear that the very groups the guidance said were at risk of bias were, indeed, the same groups that received lower TAG grades than in previous exam years.

Specifically, in Ofqual’s published report, Summer 2021 student-level equalities analysis: GCSE and A level, we can see:

  • GCSE students in England who are eligible for free school meals (FSM) have dropped further behind their peers by around one-tenth of a grade compared with 2019.
  • A-level students in England who are categorised as SEND candidates dropped slightly more than one-tenth of a grade compared to prior-attainment-matched non-SEND candidates.

So was this all down to teacher bias?

Ofqual doesn’t quite say that. In its report, it acknowledges that it could be that those groups most at risk of bias were also the ones most likely to be adversely affected by the pandemic and so, naturally, did not do as well in their assessments.

But Ofqual also notes that those groups were - as it warned - most likely to be negatively impacted by teacher assessment through bias. As such, Ofqual says it is “impossible to disentangle” the two issues and, in doing so, skilfully avoids blaming teacher bias directly.

It should be noted, here, that for the students affected, the outcomes are ones with very real and far-reaching consequences, as Tom McBride, director of evidence at the Early Intervention Foundation, notes.

“Getting good exams results matters for individuals, the economy and society,” he explains. 

“Young people who don’t do well in their GCSEs are far less likely than their peers to find well paid and stable employment. This is particularly true for disadvantaged pupils, who have less social capital than their more affluent peers.”

Of course, many will say TAGs were just a one-off and, with the expected return to exams, the issue will go away. Yes, one year group may have been disadvantaged but the pandemic caused issues so great that a foolproof system was never going to be possible.

However, what TAGs have laid bare is that the issue of bias and assessment is very real, and needs to be acknowledged and tackled.

After all, underestimating a student’s ability has far-reaching consequences. That student might be convinced that a certain subject isn’t for them, or become disillusioned with school, and so set lower standards for themselves and never reach their true potential.

But, given Kirkman’s view that unconscious bias is a natural state, can we really overcome it?

She says we can - and the most important step is acknowledging that the problem exists. “While we can’t control how we process information, we can be aware of it and we can change how we act on it.”

Do this and the discussion can quickly move into the realm of the practical; of how schools can ensure unconscious bias does not impact assessments - and, in doing so, avoid generating a false understanding of a student’s learning level.

We asked several leaders for their suggestions about what could improve the effectiveness of assessment in school and remove the issue of unconscious bias leading to groups of students finding their work “under-marked” by their teachers.

Becky Allen, director of the Centre for Education Improvement Science at UCL Institute of Education

Putting students into streams can create a particularly strong label. Once students are in a stream, teachers may be more likely to assume that all students in the same stream are similar to one another and not differentiate learning for students within it. We found examples within our research of it being very difficult to move between streams because of the implications for timetables and school organisation. So once in a stream (or a set), students are likely to be stuck there.

There is also no reason to expect that students will have an even profile of strengths. A student might achieve well in maths but struggle in English or geography. If they are put in a middle stream, then teachers might have lower expectations of them in maths than if they were in a high stream. This is one reason why we recommend setting over streaming (see The Dos and Don’ts of Attainment Grouping for more information).

Our research, in line with the wider literature, suggests that students in lower sets feel that teachers’ expectations of their learning are lower, and that they are given less challenging work to do. Students in the lowest sets felt babied by their teachers and we found that the well-intentioned “nurturing” approach used by some teachers of low sets might actually hold students back from independent learning. Work by Judith Ireson and Sue Hallam in the 2000s showed that the set you are in can make a difference to the exam results that you achieve. Students with the same prior attainment got higher exam results if they were placed in higher sets for maths, English and science.

There is evidence that labelling of students can have a powerful effect on teacher expectations and on student achievement and self-confidence. Our research on attainment grouping suggests that girls and young people from black and Asian backgrounds are more likely to be placed in a lower maths set than their test results would suggest, and than boys and white British students. There was a slightly different pattern for English, where boys and students from minority ethnic backgrounds were more likely to be placed in lower sets than predicted by their attainment. We found that the set pupils were placed in influenced their self-confidence, with the gap in self-confidence widening between pupils in high and low sets over their first two years of secondary school. We’ve also found evidence of a widening gap in attainment.

 

Adam Robbins, head of science and author of Middle Leadership Mastery

When it comes to removing bias from teacher assessments, I think it is a classic “is the juice worth the squeeze” situation. Heads of department can essentially recreate a system similar to that of universities and the exam boards if they so wish: create tests in secret, have students sit them in an exam venue, collect the papers, anonymise them, get them marked and then moderate 10 per cent. The issue is time. If you were to do this for every essay that needed marking in a Year 11 English class, it would be completely untenable - but it would remove bias.

So we are left with deciding on a sliding scale how far we want to go. I think a big part of the answer is the question: what are the stakes of the work being marked? The lower the stakes, the less of an issue bias is. Therefore, if we assume mocks and terminal assessments are the highest stakes, then those assessments need systems in place to ensure the parity of results. Class-based assessments are lower down but still important. I’d recommend leaders arrange for these assessments to all have standardisation before marking, and then a very light-touch sampling. Then, at the lowest rung of the stakes ladder, you have work where there is no grade attached at all and you are marking formatively - these pieces need nothing at all. 

Another way to tackle the issue of teacher bias is to look at the assessment design. When a teacher knows the class, they may unconsciously create an assessment that suits some students in the class more than others. Therefore, the most important thing is to have the assessments designed by people as far removed from the lessons as possible. Again, the higher the stakes of the assessment, the further removed they need to be. This distance can be temporal if you do not have the structural capacity. So, if the head of department teaches multiple Year 11 classes, then they can build the topic tests in the summer of the year before, so detailed knowledge of the questions diminishes. 

 

Steve Dew, headteacher at Church Cowley St James Primary and director at Assess Progress

“Practically speaking, the only way to truly remove all bias is to mark, moderate and assess ‘unseen’; that is, to mark work where the authorship is unknown to the teacher - and this includes eliminating handwriting recognition bias.

“In schools, this can be done by cross-marking or moderating with a colleague or colleagues - but only if the school in question has enough teaching staff, with enough working hours to cross mark, or a neighbouring school which is friendly enough to oblige. 

“[We implemented] a digital comparative judgement tool because I was looking for a way we could remove bias, and become more valid and reasoned moderators. It means learners’ work is moderated anonymously, and without bias, which is an obvious plus.

“However, another benefit is that it gives us a chance to mark work across other schools, and this has also helped our teachers. The benefit to teachers of being able to see what standards look like in other schools should not be underestimated.”
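Dew’s approach relies on comparative judgement: instead of marking each piece of work against criteria, judges repeatedly answer “which of these two anonymised pieces is better?”, and the pairwise decisions are converted into a ranking, commonly via a Bradley-Terry model. As a rough illustration only - this says nothing about how the Assess Progress tool actually works, and the function and data here are hypothetical - the core ranking step can be sketched like this:

```python
def bradley_terry(items, judgements, iterations=100):
    """Estimate a quality score per piece of work from pairwise
    'which is better?' judgements, using the classic Bradley-Terry
    iterative (minorisation-maximisation) update.

    items      -- anonymised work IDs, e.g. ['W01', 'W02', ...]
    judgements -- list of (winner, loser) pairs from judges
    """
    scores = {item: 1.0 for item in items}
    wins = {item: 0 for item in items}
    for winner, _ in judgements:
        wins[winner] += 1

    for _ in range(iterations):
        new_scores = {}
        for i in items:
            # Sum 1/(s_i + s_j) over every judgement involving item i
            denom = 0.0
            for winner, loser in judgements:
                if i in (winner, loser):
                    other = loser if i == winner else winner
                    denom += 1.0 / (scores[i] + scores[other])
            new_scores[i] = wins[i] / denom if denom else scores[i]
        # Normalise so scores stay on a comparable scale each iteration
        total = sum(new_scores.values())
        scores = {i: s * len(items) / total for i, s in new_scores.items()}
    return scores


# Hypothetical usage: three anonymised scripts, three judge decisions
ranking = bradley_terry(['W01', 'W02', 'W03'],
                        [('W01', 'W02'), ('W02', 'W03'), ('W01', 'W03')])
```

Because judges only ever see anonymised pairs, the teacher’s knowledge of who wrote each script never enters the scoring - which is the bias-removal property the approach trades marking time for.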

