Last month Ofqual published new research on marking consistency in public exams.
The research provoked headlines suggesting that up to 40 per cent of the grades awarded are “wrong”.
But what did the research actually say, and is that claim correct?
Here’s everything you need to know.
What was the research?
Ofqual published a new paper on marking consistency metrics last month.
The research tried to assess the marking consistency in GCSEs, AS and A-levels using data derived from “seed” questions.
So what’s a seed question?
“Seeding” is a method used by exam boards to monitor and quality-assure examiners’ marking.
Before marking begins en masse, the exam boards select a number of answers drawn from real pupil scripts. A “definitive” mark is set for these answers (usually by one or more senior examiners), which is seen as the most appropriate mark for the response.
When rank-and-file examiners are let loose to complete their marking – which they do using a computer – these answers are randomly and invisibly dropped (or “seeded”) into their on-screen marking. If the examiner gives a mark to the seed which is significantly out of kilter (a certain tolerance is allowed) with the definitive mark, then they may be given extra guidance or retraining, or stopped from marking altogether.
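The seeding check described above can be sketched in a few lines of code. This is an illustrative sketch only: exam boards do not publish their systems in this form, and the function name, marks and tolerance below are all invented for the example.

```python
# Illustrative sketch only: the function name, marks and tolerance value
# are invented; exam boards' real quality-assurance systems are not
# published in this form.

def check_seed_marks(examiner_marks, definitive_marks, tolerance=1):
    """Flag seed answers where an examiner's mark strays too far
    from the definitive mark.

    examiner_marks and definitive_marks are parallel lists of integers;
    tolerance is the maximum acceptable difference per seed answer.
    """
    flagged = [
        (given, definitive)
        for given, definitive in zip(examiner_marks, definitive_marks)
        if abs(given - definitive) > tolerance
    ]
    # A real system might trigger retraining or suspension once enough
    # seeds are flagged; here we simply report the out-of-tolerance pairs.
    return flagged

# One seed mark (12 against a definitive 15) falls outside the tolerance.
print(check_seed_marks([18, 12, 7], [17, 15, 7], tolerance=1))
# → [(12, 15)]
```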
Ofqual’s research looked at how examiners’ marks for the seed answers differed from the definitive mark, using data for GCSEs, AS and A-levels gathered from the 2017 exam season. Ofqual was then able to use complex statistics to estimate the probability of candidates receiving the “definitive grade” (the grade they would get assuming they were awarded the definitive mark for every answer).
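The idea of a “probability of receiving the definitive grade” can be illustrated with a toy simulation. This is not Ofqual’s actual method, which used far more sophisticated statistics; it simply assumes each question’s awarded mark wobbles by one mark either side of the definitive mark, with all marks and grade boundaries below invented for the example.

```python
import random

# Toy simulation, not Ofqual's actual method: assume each question's awarded
# mark differs from the definitive mark by -1, 0 or +1 with equal chance,
# then estimate how often the total still lands in the same grade band.
# All marks and boundaries below are invented for illustration.

def grade(total, boundaries):
    """Return the grade as the count of grade boundaries at or below the total."""
    return sum(1 for b in boundaries if total >= b)

def estimate_definitive_grade_probability(definitive_marks, boundaries,
                                          trials=10000, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    definitive_grade = grade(sum(definitive_marks), boundaries)
    hits = sum(
        grade(sum(m + rng.choice([-1, 0, 1]) for m in definitive_marks),
              boundaries) == definitive_grade
        for _ in range(trials)
    )
    return hits / trials

# A script totalling 45, sitting exactly on a grade boundary, has a much
# lower chance of receiving its definitive grade than one mid-band.
p = estimate_definitive_grade_probability([15, 14, 16], [0, 30, 45, 60])
print(f"Estimated probability of the definitive grade: {p:.2f}")
```

The point the simulation makes is the one in the article: even small, arguably legitimate marking variation translates into grade uncertainty for candidates near a boundary.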
What did Ofqual find?
According to Ofqual, “the median probability of receiving the definitive qualification grade varies by qualification and subject” – on a scale where 1 means the definitive grade is certain and 0.1 means a 10 per cent probability of receiving it.
Unsurprisingly, for some subjects the probability of achieving the definitive grade was extremely high (the average probability in maths was 0.96). But in other subjects with essay-style questions requiring longer responses, where there is inevitably a subjective element, it was much lower – for English language and literature the probability was 0.52.
That sounds really worrying – does it mean that nearly half of grades handed out in English are wrong?
Not according to Ofqual. The regulator says that the definitive mark should be seen as a theoretical construct used for a research exercise – it isn’t necessarily the sole “right” mark, and a different mark is not automatically “incorrect” or “wrong”.
In some subjects and for some questions, it might be the case that only the definitive mark is right (for example, a maths question with a single correct answer). But in other subjects and for other questions, it is perfectly possible that a different mark (within a certain tolerance of the definitive mark) could also be justified. For example, a sociology question might carry 25 marks and an answer might have a definitive mark of 18, yet an examiner could justifiably give it 17 or 19.
The point, Ofqual says, is that its analysis doesn’t distinguish between unacceptable error and justifiable inconsistency.
So should teachers be concerned?
Despite the caveats, many people will still be perturbed by the data. Sceptics might think that Ofqual is trying to have its cake and eat it by using the definitive mark to measure consistency, but then saying other marks could also be legitimate when the findings look uncomfortable.
The Headmasters’ and Headmistresses’ Conference said the “extreme” unreliability in humanities subjects raised “grave” implications.
Ofqual points out that the probability of receiving a grade within one grade of the definitive grade is much higher – it was above 0.95 for all the qualifications Ofqual looked at. But the high-stakes nature of our exam system means this will provide cold comfort for a student whose place at college depends on whether they get a 3 or 4 in GCSE English.
It’s important, though, to remember that inconsistency cuts both ways – while some pupils who arguably should have got a 4 will miss out, others who arguably should have got a 3 will get a 4.
Ofqual says that marking consistency between 2013 and 2017 has been stable, and its analysis suggests that marking consistency in England is “not dissimilar” to that seen in other countries.
Why don’t we just move to raw marks?
One way of addressing these concerns would be to scrap grade boundaries altogether and just award pupils raw marks. However, that would arguably sacrifice the immediacy and comprehensibility of a grade.
And even if grade boundaries were abolished, colleges, universities and employers might end up drawing cut-off lines of their own.
Are there other solutions?
Ofqual says there is still room for boards to “incrementally” improve marking consistency.
However, no matter how much training examiners receive or how comprehensive a mark scheme is, we’ll never get examiners to absolutely agree on every mark.