Only a third of examiners are able to grade a paper correctly in a blind test, according to new research.
Cambridge Assessment asked examiners to grade a series of 400 history and physics GCSE papers that had previously been deemed to fall between grades A and C, and to note how confident they felt about their judgments. All marks had been removed from the scripts, and the examiners were given no indication of what standard they were.
Usually, examiners will be able to see the marks that a candidate has been given before deciding which grade to award. Unlike the actual grading process, where examiners are able to talk to each other about grade boundaries, these examiners were not allowed to consult anyone.
Only 40 per cent of examiners' judgments matched the original grades in the history papers. For physics, on average only a quarter of the examiners' awards matched the originals. Only one in five grade A physics papers was graded correctly.
"There seems to be a discrepancy between the standard as set at the award meeting and the judgment of the quality of work by the individual judges," said the researchers.
A spokeswoman for the OCR exam board, which is owned by Cambridge Assessment, questioned whether the findings are a cause for concern. Examiners judging for real will always be able to see the actual marks awarded to candidates, she pointed out. "There is no doubt that within a mark range awarders are able to differentiate scripts very easily and accurately," she said.
In fact, while the research examiners found it slightly easier to rank scripts in relative order of quality, many misjudged which of two scripts was worth more marks.
The examiners found it slightly easier to rank physics papers accurately than those in history. The researchers found this unsurprising, as there is often less ambiguity between right and wrong answers in physics.
But Derek Bell, chief executive of the Association for Science Education, believes that physics can be as difficult to grade as history.
"In order to have common agreement about how the grading system is interpreted, you have to agree the standard you're looking for," he said. "If you don't have that agreement, you're going to have variation. Only through moderation is marker variation minimised."
When examiners were very confident about grading a paper, their decisions tended to be more accurate. The history examiners were more confident about their judgments than the physics examiners, and with good reason, it transpired.
But Sylvia Green, director of research at Cambridge Assessment, pointed out that the research conditions differed considerably from the way in which grades are awarded in actual exams.