Modern educational history was made last August.
A-level results were hit by the first fall in the proportion of candidates achieving the top grade for 21 years. Then, a week later, exactly the same thing happened with GCSEs. Twenty-four years of continuous improvement ground to a halt as the proportions of entries gaining A*, A*-A and A*-C grades all dropped.
Suddenly, the era of steadily improving grades was over. This was no coincidence or statistical blip. Nor was it any reflection of a sudden dip in the quality of teaching and learning. This was a dramatic change in direction that can be expected to continue. It came as a direct result of a concerted attempt to get a grip on "standards" from exams regulator Ofqual, acting partly at the behest of ministers.
The change has huge implications for anyone working in secondary schools, and for students, parents, employers and society as a whole. Yet it did not appear in any party manifesto and its introduction was not discussed by Parliament. It was brought in without public debate, with minimal consultation, little publicity and virtually no awareness among the population at large.
As far as the government is concerned - and anyone else worried about the devaluation of qualifications over the past two decades - the outcome of this crackdown on "grade inflation" has been an unquestionably good thing.
But for heads and teachers there is an alarming new reality. They have jobs that depend on their schools meeting steadily more demanding "floor" targets for GCSEs: targets that are in turn based on the assumption that overall results will continue to rise. They face Ofsted inspections where exam results and the expectation of continuous improvement now play a hugely important role in deciding inspectors' overall verdicts.
Now these teachers abruptly find themselves working in a very different climate, in which the annual overall improvement in results has vanished.
Ofqual recognised the implications of its new regime in a warning letter sent to education secretary Michael Gove last August, just after GCSE results were published.
"In past years, we saw year-on-year increases in national exam results," Glenys Stacey, chief regulator, wrote. "Our (new) approach means that while some schools will see improvement in their exam results, due to comparable outcomes the overall results will not show significant increases.
"So it will be difficult to secure system-level improvements in exam results, which you have said you want to see. And we know that many in the education sector are concerned about this."
Concerned is an understatement. "Gove and Ofqual have to have it drummed into their stubborn heads that, actually, pupils are doing better than in previous years," one GCSE English teacher recently wrote on the TES online forum.
"Teaching has improved. Resources have improved. Pressure on schools to perform has increased. Pupils (in my experience) are now working harder than ever, and are aware of the importance of their grades. If more pupils were not getting a C there would be something seriously wrong.
"How is it that when we win more Olympic gold medals, the athletes receive OBEs and the coaches are praised for their preparation, but when schools improve their results we are accused of cheating, or are told that the exam was too easy?"
In theory, the "comparable outcomes" approach adopted by Ofqual should not present a problem. It does not use a fixed quota system to allocate grades and is supposed to give credit for actual student performance. So a real improvement in that performance should lead to better overall exam results.
In practice it is very difficult to see how Ofqual's system would pick up and acknowledge such an improvement. Surprisingly, Stacey's letter to Gove in August seems to recognise this fundamental flaw, saying: "One consequence of this approach is that it can make it harder for any genuine increases in the performance of students to be fully reflected in the results."
And that is without taking into account the fact that Ofqual does not necessarily appear to observe its own rules for comparable outcomes. Correspondence released after the GCSE English controversy suggests that in practice a crude ban on any rise in grades can be imposed long before actual evidence of candidate performance is considered.
Today there is a real danger, as Stacey's letter acknowledges, that the recognition of genuine improvements will become collateral damage in the war against grade inflation. That has huge implications for student motivation - particularly as GCSEs and A levels are about to be made significantly tougher. It also seems strange that a country trying to develop a "knowledge-based economy" would risk stunting its education with artificial limits on student achievement. Yet England has been moving in that direction, if it is not already there.
The balance shifts
Understanding why involves cutting through an awful lot of exam-related jargon. The situation cannot be traced back to a single decision made by a single person or organisation at any one time. It has evolved in a piecemeal fashion, which perhaps explains why an issue of such national importance has emerged with little attempt to explain it to the public.
Some critics would lay the blame at the door of Gove. But while ministers appear to have played their part in triggering the recent clampdown on rising grades, it is not that simple.
The roots of what schools are experiencing today go back at least a decade. In 2001-02, exam boards and regulators were discussing how to ensure standards remained constant as reformed A levels were sat for the first time.
It was recognised that candidates' performances could dip unfairly simply because they were in the first year of such a big change. So the decision was taken to prioritise "comparable outcomes" over "comparable performance". They would, in other words, ensure that students' grades or "outcomes" were not disadvantaged because they were unlucky enough to take their exams in a year of change. The boards would therefore sacrifice, to an extent, the idea that a particular level of performance should lead to exactly the same grade as it had done in previous years.
Those allocating grades would look beyond examiners' views of how a cohort of students had performed in that year's exams. They would also examine how a cohort ought to have performed according to their results from previous sets of exams. A-level grades could be benchmarked to a cohort's GCSE performances and GCSE grades to national curriculum test results.
That, in essence, is the approach that has come to be known as "comparable outcomes": combining examiners' judgements about actual performance in exams with statistical evidence, allowing comparisons with previous years, when deciding where grade boundaries should lie.
It should be stressed that this takes place at overall cohort rather than candidate level. An individual student's A-level chances would never be limited, for example, simply because they had performed badly in their GCSEs.
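For readers who find the idea easier to follow in concrete terms, the cohort-level logic can be sketched in a few lines of Python. This is an illustration only, not Ofqual's or any exam board's actual procedure, which was never a precise formula. The function names, the single grade boundary, the invented marks and the way the tolerance is applied are all assumptions made for the example.

```python
def share_at_or_above(marks, boundary):
    """Proportion of the cohort scoring at or above a grade boundary."""
    return sum(m >= boundary for m in marks) / len(marks)


def set_boundary(marks, examiner_boundary, predicted_share, tolerance):
    """Pick a grade boundary for a whole cohort (hypothetical rule).

    examiner_boundary: the mark examiners judge to deserve the grade.
    predicted_share:   share of the cohort expected to reach the grade,
                       based on its results in earlier exams.
    tolerance:         how far the outcome may stray from the prediction.
    """
    acceptable = [
        b for b in range(0, max(marks) + 2)
        if abs(share_at_or_above(marks, b) - predicted_share) <= tolerance
    ]
    if not acceptable:
        # Assumption: fall back on examiner judgement if no boundary fits.
        return examiner_boundary
    # Stay as close to the examiners' view as the statistics allow.
    return min(acceptable, key=lambda b: abs(b - examiner_boundary))


# Invented cohort: 50 candidates with marks from 30 to 79.
marks = list(range(30, 80))

# Examiners say 50 marks deserves the grade (60% would get it),
# but prior attainment predicts only 50% should.
tight = set_boundary(marks, examiner_boundary=50, predicted_share=0.5, tolerance=0.03)
loose = set_boundary(marks, examiner_boundary=50, predicted_share=0.5, tolerance=0.15)
print(tight, loose)  # 54 50
```

With a narrow tolerance the statistics win and the boundary is pushed up to 54 marks; with a wide one the examiners' judgement stands. No individual candidate's earlier results are consulted, only the cohort's.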
Over the past decade comparable outcomes has become common practice for exam boards, and not just when the qualifications system is being changed. But until very recently it remained a largely technical matter for those running the exams system that few outside this small group of experts either understood or were aware of.
It is also important to understand that comparable outcomes was used as a broad approach rather than a precise mathematical formula. The relative emphasis placed on a particular year's performance and statistical evidence of prior achievement could vary; there was a trade-off between the two.
It was a change in emphasis towards the statistics, encouraged by Ofqual, that persuaded Tim Oates to break cover and talk to TES about the issue in early 2010. As head of research at Cambridge Assessment - owners of the OCR exam board - Oates was concerned that there had been a "huge shift" in that direction over the previous two to three years.
In some subjects the regulator was "really pushing us towards 100 per cent reliance on statistics" and there was a danger of "grades in the wrong place" for some students, he warned. Oates spoke out because he wanted a public debate on the issue.
"If you are a young person and you are working really, really hard and you think that what happens on that exam paper really counts, it is quite wrong that the system behind the scenes doesn't actually pay much attention to what you have done," he told TES at the time.
But the open debate Oates had hoped to spark did not take place. By the time the term "comparable outcomes" finally appeared in a mainstream national newspaper in August 2012 things had moved on drastically. The whole approach had been toughened up and had become a firmly entrenched part of the exam system, with formalised rules laid out by Ofqual.
A policy document setting out those rules was seen by a national newspaper in the run-up to the publication of results, sparking that first "comparable outcomes" mention and a flurry of August headlines about exam boards being told to "fix results".
Professor Alan Smithers, from the University of Buckingham, argued that the policy amounted to "a new form of norm referencing" - the old approach to school exams that resulted in fixed percentages of candidates being awarded certain grades (see below).
In reality it had not gone that far. But it was a formal step away from pure "criterion referencing" - in which grades are awarded solely on the basis of candidates' performance in that year's exams - and that did not please schools.
Kevin Stannard, director of learning at the Girls' Day School Trust, declared days before the results were released that grade inflation was a "systemic feature of criterion-based exams".
"So if there isn't a record percentage of pupils getting top grades again this year, it suggests something quite disturbing: the system isn't so much broken as corrupt," he warned.
That kind of reaction perhaps explains the coyness of exam boards when they announced that GCSE and A-level grade "inflation" had indeed been halted. They put this down to changes in cohort. When journalists asked about the use of comparable outcomes they were told it was nothing new and had been used for years.
'Comparable outcomes' gains momentum
The boards' answer was technically correct but also misleading as the approach had only recently been formalised and given increased emphasis by Ofqual. According to Stacey, the watchdog "started using" comparable outcomes for AS levels only in 2010, with the approach applied to new A levels and GCSEs a year later.
By May 2012 Ofqual was boasting that the approach had already slowed down A-level grade inflation and exam boards would therefore "continue to prioritise comparable outcomes". They did so and later that summer overall grade inflation promptly disappeared.
TES understands that key members of Gove's team at the Department for Education had little, if any, knowledge of the technical clampdown on grade inflation that was under way until after the results were published last summer and the controversy over GCSE English grades blew up.
However, TES has been told that when Stacey was appointed to the Ofqual job in 2011, the DfE made a point of emphasising to her the importance of "maintaining standards".
So while a strict application of comparable outcomes may have been enough to end grade inflation on its own, there was also government pressure to do something about it. Within the space of a year Stacey went from dismissing grade inflation as an unhelpful term to declaring that it had undermined confidence in GCSEs and A levels for at least a decade.
But none of that should have mattered as far as recognising students' achievements was concerned because Ofqual's rules on comparable outcomes clearly state that the technique should be used only when there has been "no substantial improvement in the quality of teaching and learning". So if teaching and learning has genuinely improved then comparable outcomes will not be used and grades will be allowed to rise.
That is the theory anyway. In practice things played out very differently last summer. Correspondence released in the wake of the GCSE English grading scandal shows that Ofqual issued a blanket ban on grades rising in the subject nearly a full month before it was told by examiners that there was "compelling evidence" suggesting they should rise.
Emails also reveal that the regulator was warned more than five months before results day that there were problems with the statistical predictions being used to calculate comparable outcomes for the subject. That was perhaps unsurprising, as they were based on the results of national tests that GCSE candidates had taken five years earlier while they were still in primary school.
Three boards eventually deemed the statistical approach unreliable and four out of the five exam boards had difficulties producing GCSE English results that satisfied Ofqual's demands.
More fundamentally, it is unclear how exam boards or others were supposed to demonstrate that there had been "substantial improvement in the quality of teaching and learning".
When asked this question by TES, Stacey admitted it was "a real test of comparable outcomes". "If there is a significant increase in achievement can comparable outcomes recognise it?" she said.
Stacey points to examples from 2012 where grades were allowed to rise in line with examiners' judgements.
"I come back to the fact that in 37 GCSEs last year they did go above tolerance (the limits to grade increases set by Ofqual). The sort of evidence you are looking for there is if examiners are saying that the standard is at (a higher) level. That is very persuasive."
But the GCSE English affair shows that Ofqual has been perfectly prepared to overrule this evidence and, indeed, insist that grades should not rise before it has even received examiners' professional judgements.
Moreover, as far as the High Court judges who ruled on schools' legal challenge to the GCSE English grades are concerned, nothing will ever be good enough to justify improvements in results.
Their judgment notes that for comparable outcomes to apply, there must be "no substantial improvement or drop in the quality of teaching ... or learning at a national level". "For a core subject like English, where a large number of candidates each year take the examination and teaching methods vary little year on year, these conditions will in practice be met," it concludes.
So that's it then. There's no hope. Gove's ambition that better teaching will lead to more students passing his new demanding GCSEs is for the birds?
Maybe. But there are also rays of optimism. Stacey does not quite sign up to the judges' view. In "big subjects" such as English "you generally don't see seismic changes in achievement from one year to the next", she says. "But it doesn't mean that you won't get little step changes. We need to be sure you can recognise those if and when they happen."
The question, though, is how can you recognise them? Since examiners' judgements are so easily overruled, what other measure is there that can indicate when rises in grades are justified?
The answer could lie in the national sample tests planned as an independent gauge of secondary school standards. Stacey describes the proposal, made by the government in February after lobbying from Ofqual, as "a joy to my ears". And she tells TES that she thinks it has the "potential" to be used to check exam standards as part of the comparable outcomes process.
"We have got a win-win going on there if it is designed well," she says. "But it is early days." And, as ever, the devil will be in the detail.
In the meantime, with comparable outcomes here to stay, England's education system can be expected to enter a period of stagnation. Grades may hold their value but, without any independent measure of standards, the chances of more students achieving them have been drastically reduced. If the accountability system is not quickly changed to reflect that fact, the consequences for many state schools appear bleak.
A return to the norm
In the run-up to the last general election, when a Conservative majority still seemed possible, the party's education team turned their attention to working out how they could restore the "gold standard" to school exams.
And for a brief period, TES understands, ministers-to-be considered a beautifully simple, tried-and-tested solution to what they saw as years of damaging A-level and GCSE grade inflation.
They could end this devaluation at a stroke, it was suggested, by returning to a system that operated successfully in England's O and A levels for nearly a quarter of a century until 1987. Known as "norm referencing", it ensured that grade inflation would never be a problem because it guaranteed that the same, fixed percentages of students achieved the same grades every year. Every year the top 10 per cent of candidates would receive an A, and the next 15 per cent a B and so on.
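The mechanics are simple enough to sketch in a few lines of Python. The example below is a rough illustration rather than a reconstruction of how O and A levels were actually graded: it uses only the two quotas quoted above (10 per cent for an A, the next 15 per cent for a B), lumps everyone else together as "below B", and the candidate names and marks are invented.

```python
from collections import Counter


def norm_reference(marks, quotas):
    """Grade candidates purely by rank.

    marks:  dict of candidate -> mark.
    quotas: ordered list of (grade, percentage of the cohort).
    """
    ranked = sorted(marks, key=marks.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative_pct = 0
    for grade, pct in quotas:
        cumulative_pct += pct
        end = cumulative_pct * n // 100  # integer cut-off in the ranking
        for candidate in ranked[start:end]:
            grades[candidate] = grade
        start = end
    for candidate in ranked[start:]:
        grades[candidate] = "below B"  # remaining grades not modelled here
    return grades


quotas = [("A", 10), ("B", 15)]

# Invented cohorts: in the second year every candidate scores 10 marks more.
year_one = {f"candidate_{i}": 40 + i for i in range(20)}
year_two = {name: mark + 10 for name, mark in year_one.items()}

counts_one = Counter(norm_reference(year_one, quotas).values())
counts_two = Counter(norm_reference(year_two, quotas).values())
print(counts_one == counts_two)  # True
```

Every candidate in the second cohort does better, yet the number of As and Bs is identical.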
But while the idea may have been simple, it was also crude and would have placed a cap on school and student improvement. Standards, as measured by exam results, would have been instantly frozen in time - not an ideal fit for a party that is today championing the "Aspiration nation".
So the Conservatives swiftly dropped any serious consideration of a widespread return to norm referencing.
In power as education secretary, Michael Gove has floated the idea of using the approach on a very limited basis when awarding A* grades at A level. But his focus as far as restoring the gold standard is concerned has been on toughening up exams rather than rationing grades.
"I am opposed to norm-referencing," Gove said when discussing his GCSE reform plans earlier this year.
He made it clear that he wants the new, more demanding qualifications to be able to recognise the rising standards he expects to come from improved teaching.