Are the GCSEs a reliable measure of education?

GCSEs dominate the education of young people, but are they the result of rigorous research or a hangover of tradition? Chris Parr tells you all you need to know

Why do GCSEs look and work the way they do? Let me tell you a story…

A child wakes on Christmas Day and helps her Dad cook the turkey for dinner.

“First you chop the turkey in half and put one half in the oven,” says Dad. “The other half is for Boxing Day and we will cook that tomorrow.”

The child thinks about it, then she asks: “Why don’t we cook the whole turkey today and save us a job tomorrow?”

The father considers this excellent point for a while. Eventually, he replies: “Well, that’s how your grandma told me to do it, so let’s ask her.”

They ask the question and Grandma answers matter-of-factly: “Well dear, when I was young, our oven wasn’t big enough to fit the whole turkey in.”

The point of the story is that routine – “we’ve always done it like that” – can often be the enemy of common sense.

Where there is an absence of evidence and research, ill-informed assumption and tradition will fill the void.

So how much are GCSEs a victim of tradition and assumption? Can you really trust them?


When were the first GCSEs sat?

The first General Certificate of Education courses were introduced in 1951 and were divided into O levels for 16-year-olds and A levels for 18-year-olds. Certificates of Secondary Education (CSEs) were introduced from 1965 to cater for pupils deemed less academically able.

In 1988, O levels and CSEs were both replaced by GCSEs, and since then, every year, hundreds of thousands of 16-year-olds have gathered in examination halls to take their tests.

The qualifications have evolved, of course, but in general terms the majority have kept a similar format: several papers of written questions, all sat within a few weeks, which are then marked and graded.

Why do the GCSEs look and work like they do?

“It is a lot to do with tradition – and yes, to an extent, this is just how things have evolved,” explains Rob Coe, former professor in the School of Education and former director of the Centre for Evaluation and Monitoring at Durham University.

“Although to be fair, if you look around the world where there are different traditions, or you look at other types of academic examination processes, they are mostly pretty similar to the kind of thing we do in GCSE.”

The first public examinations for schools were introduced in the mid-19th century, and were set by the universities of Cambridge and Oxford in response to requests from independent and grammar schools, which wanted to see at what level their pupils were working. They were sat by a minority of the student population, and only by boys.

According to Cambridge Assessment, exams back then were not too different to how they are now, with pupils sitting tests (mostly in local village halls) in “English language and literature, history, geography, geology, Greek, Latin, French, German, physical sciences, political economy and English law, zoology, mathematics, chemistry, arithmetic, drawing, music and religious knowledge (unless their parents objected)”.

Cambridge Assessment also details how the exams were unashamedly tests of factual knowledge: they were essentially a memorisation test.

It might seem that, since then, we have not come that far in terms of the tests themselves – that we are still cooking half the turkey – but Jo-Anne Baird, professor of educational assessment and director of the Department of Education at the University of Oxford, believes such an assumption would be unfair.

“I think people would be surprised at the amount of research that goes into designing GCSEs and into how they operate in practice,” she says.

“For decades, academics have been publishing on these topics, but exam boards and [exams regulator] Ofqual also conduct a large volume of work on a range of issues; a lot of careful thought goes into the examinations in this country.

“It is worth noting that GCSE-style exams, with their extended essays and short-answer questions, are often looked at as very innovative by assessment researchers around the world.”

While you might think the exams are just simple question-and-answer formats, GCSEs actually contain a range of assessment techniques (see the sections below for more on question choices), and Baird believes that part of their strength is how they have been able to subtly adapt, to “broaden the curriculum, since they were first introduced”.  

That said, she believes the over-reliance on exams as a test of education is becoming more problematic.

“Not everything can be assessed easily by formal examining techniques,” she says. “The context of assessment has changed over time with the introduction of performance tables and higher pressure upon young people to attain qualifications.”

The increasingly high-stakes nature of GCSE examination, and the impact this has on both teachers and pupils, is an area where some feel the research has some catching up to do (see further exploration of wellbeing issues later in this article).

Has it always been a straight fight between linear and modular exams?

There have, of course, been structural changes to GCSEs – most recently in 2017, when a new grading system was introduced and the amount of coursework was reduced.

Perhaps more significantly, the rules governing GCSEs changed from 2012: modular courses, which had been commonplace since 2009 and saw students examined on individual units of learning as they went, were replaced by linear qualifications, with all examination taking place at the end of the course.

Speaking in 2011, schools minister Nick Gibb said that he wanted to “break the constant treadmill of exams and retakes throughout students’ GCSE courses”.

“School shouldn’t be a dreary trudge from one test to the next,” he said. “Sitting and passing modules has become the be-all and end-all, instead of achieving a real, lasting understanding and love of a subject. Students shouldn’t be continually cramming to pass the next exam or re-sitting the same test again and again simply to boost their mark – then forgetting it all by moving onto the next module immediately.”

When they were first introduced in the late 1980s, the vast majority of GCSEs were not modular, with science courses being the exception.

Before 2009, only sciences, modern foreign languages and maths were available as modular GCSE courses.

To what extent, though, was the move back to linearity based on solid research? And do we really know, for example, whether it is better to assess knowledge in smaller, more frequent exams or in one larger paper at the end of a course?

Michelle Meadows is executive director of strategy, risk and research at Ofqual, which designs and regulates GCSE examinations in England.

“I think Ofqual's work is incredibly research-based, but there are still areas where, of course, we want to know more,” she says.

“We looked at modular versus linear GCSEs, for example, which is quite an important structural question. You would think, given that these exams have been around since 1988, there would be a really good evidence base on the impact of these two approaches on teaching and learning standards.

“What we found, though, was that this was actually an area where the evidence was weaker, and we sought to fill that gap by doing an evaluation that looked at the issue from a range of perspectives.”

The research, Examination Reform: Impact of Linear and Modular Examinations at GCSE, was carried out by Ofqual and the University of Oxford’s Centre for Educational Assessment, and was published in April this year. It looked at the effects of structural reform on grading outcomes, teachers’ changes of practice, and the impact of the reform for different groups of pupils.

“We looked at the impact of modular versus linear on different social groups – so does it impact disadvantaged students, does it impact on boys more than girls? – and the answer to those questions was no, which is fascinating,” says Meadows.

“We also looked at things like the impact of linear and modular structures on standard setting, and the economic impact – the cost of running these qualifications.”

As part of the project, Ofqual also did a review of existing literature on the two assessment approaches.

“We were expecting to have a really big research literature to draw on, but actually, it was pretty poor, and most of what was there were qualitative studies where people had collected the views and opinions of various groups rather than hard data,” says Meadows.

Her team concluded that GCSEs “are probably too small to be modular”.

Do we know whether the questions are actually effective for assessment?

One of the repercussions of a more linear approach to GCSE assessment is a pressing need to ensure that the exam questions themselves are up to the task. Tweaking the system to one that encourages more high-stakes testing at the end of a course must surely influence the type of questions that students should be asked?

Coe is not convinced that this has been fully acknowledged.

“I think that how exams have been designed, and the typical ways in which questions are written and mark schemes are written, come from an age when exams did not have such high stakes,” he says. “Now that we do have such high stakes, and the curriculum is so driven by GCSE examination content and style, then we could do better with the way we frame those questions.”

Indeed, he has some strong concerns about the way that GCSE questions are drawn up.

“The reality is, the type of people who write GCSE exams, they are in a kind of cottage industry, really,” Coe says. “They are good people, they are well-intentioned and many have been doing it a long time, but they are not part of a core of expertise with a proper training model.”

He estimates that there are “a couple of thousand” people working on producing exam papers for GCSEs in England, meaning it is “not a well-focused activity”.

Two years ago, he did some basic assessment theory training for senior examiners at one exam board, and says he encountered problematic gaps in their knowledge.

“We were doing stuff about how you write questions, and much of it was well known to them, but quite a lot of it wasn't to a lot of people. Pretty basic stuff, too, about how you actually make sure a question does in fact ask the things you want it to ask, and how you make sure the mark scheme works – things like that,” says Coe. 

Ofqual declined to comment on this specific point. But Tim Oates, group director of assessment, research and development at Cambridge Assessment, which operates the OCR examination board, disagrees with Coe.

“We do so much research on the quality of what we call assessment items, but essentially the questions,” he says. “We want good questions, which tap deeply into the knowledge and understanding which is required, and predict later performance.

“That's important because GCSE should be a good predictor of performance in the same subject at A level, and A level should be a good predictor of performance in the next stage of higher education.”

He believes that, by and large, the teams that develop questions in England are producing assessments that “contain the kind of stuff essential for learning in the subject, and for progressing to the next stage of learning”.

He agrees, though, that there are some “issues” with consistency of marking – even where the questions themselves have been well formulated.

“In some subjects – history is a good case…English literature, psychology – there can be quite substantial errors in interpretation [by the student]. You want the questions to be able to give adequate space for them to give creative answers, which synthesise understanding from different parts of the discipline. But that can give rise to problems of consistency of marking, so you constantly have to balance those things.”

Exam boards, he says, are very aware of that and “try to balance them appropriately”.

“We monitor it statistically, we look in great qualitative detail at the nature of the responses that the students are giving; we'd like to know that when you look at the scripts awarded the top grades, you think ‘that's the kind of thing that should get you a top grade’. And we take a lot of post-examination scrutiny to make sure that that is the case.”

Do we know if grouping exams at the end of a year is the best option for learning?

The literature review carried out during the Ofqual/Oxford research identified a number of papers that help to explain why examinations are so often set up with assessment at the end.

“A perennial concern is that examinations in general produce short-term learning goals, which induces instrumental motivation in students, and thwarts deep learning and long-term retention,” the report concludes. “These effects are considered to be more severe with modular examinations.”

It adds, though, that modularisation can allow students to master a topic before moving on to the next, and is better aligned with a “testing when ready” philosophy.

Some 20 papers were looked at for the study, with the researchers concluding that, while modular courses allowed students to master topics before moving on to the next, they offered less subject coherence overall.

Linear courses, meanwhile, were found to offer better long-term retention of information; foster depth of learning; create better development of subject-specific skills; and ultimately lead to a better understanding of the subject in question.

However, one concept that the research literature does identify as a potentially negative effect of high-stakes external testing is the “washback” effect: the extent to which the exams themselves dictate teaching and learning, rather than the pursuit of knowledge.

As a result, the need to design GCSE questions that would always favour those with a thorough understanding of a topic above those who had been “taught to the test” is a big area of concern.

“In many cases, GCSE papers can be quite predictable, so teachers teach particular types of questions,” says Coe. “So we need to make it more variable and less predictable.

“Of course, everyone will hate that – nobody likes it when questions come up that are unpredictable. In fact, there are some well-documented instances…where a question has come up and people were outraged.”

The concern, though, is that if questions are not unpredictable, GCSE teaching becomes “a bit of a formula...and sometimes you get the feeling that somebody who really knew a lot about a topic wouldn't do really well on the exam”, Coe concludes.

Oates agrees that the urge to focus purely on exam performance is one of the risks of the way in which GCSEs are structured, but says the evidence is clear that it is not the best way to achieve the largest number of high grades.

“You can get highly restricted learning programmes in schools and colleges; that is a danger of people working very narrowly to their perception of the requirements of the exam,” he says. “But the best way to get a high grade in any GCSE right now is to teach the whole of the domain, the whole of the specification, in a rich and deep fashion. If you do that, you get a good grade.”

GCSE predictability and its impact on washback are areas that Meadows has studied closely, both at Ofqual and in her earlier academic career.

“We distinguish in research between good predictability and bad predictability,” she explains. “With good predictability, what you want is a student to come into the exam hall feeling they know broadly what is going to be expected of them. They know the kinds of question they are likely to be asked, but not the exact question they are going to get asked.”

Good assessment, she says, “shouldn’t put people in a situation where they can’t show what they can do”.

“There might be some scenarios where it would be good to throw people off their guard, but that is not the purpose of GCSEs,” Meadows says. “With GCSEs, we want people to come in and know broadly what is expected.”

Bad predictability, however, which can result in so-called teaching to the test, “undermines validity”.

“There is research literature on this, and in our regulation we are really keen to strike that balance between good and bad predictability,” she says.

“Exam boards have to have a sampling plan, so if you have all the questions you could possibly ask about GCSE history, for example, they have to have a plan about how they are going to sample that content over time to allow valid assessment that isn’t unduly predictable. That approach that we take is very much informed by the research.”

Does grouping exams at the end of school increase anxiety and stress?

While GCSE exams should not be designed to deliberately catch pupils out, they are nonetheless likely to affect pupils’ mental wellbeing to some extent. Examinations of any kind, and high-stakes end-of-course assessments in particular, are inherently stressful.

To what extent, then, does the GCSE exam period – which can see pupils sitting demanding examinations in nine or 10 subjects in a confined period of time – compound this?

David Putwain works in the School of Education at Liverpool John Moores University, and his research looks at how psychological factors influence, and in turn are influenced by, learning and achievement.

“When you look at stress, anxiety or wellbeing, you have a balance between the assessment type and the student,” he says.

While some students will be good at “reappraising” or thinking stressful situations through in a way that allows them to balance their emotions and see different perspectives, others will be less able to do so.

“Some people kind of go for a more ‘head in the sand’ type approach or they just kind of get caught up in the stress…and the more they ruminate on it, the more they worry about it, and they are unhelpful ways of dealing with it.”

Stress itself is not necessarily a bad thing, Putwain says.

“GCSEs are really pressured, high-stakes exams,” he explains. “How you perform can determine a lot of access you have in the future – whether you get into college to do an apprenticeship or study A levels.

“That pressure for some people is a really good thing. For some people, pressure is a real motivator, and you know, it gives them that proverbial kick up the backside.”

For others, though, the stress can be debilitating – and having such a high concentration of examinations in one relatively short period of time can exacerbate the problem for these students, Putwain says.

“There are big individual differences in how people respond to stress, and that makes it difficult to try and pinpoint factors to do with the actual assessment type,” he explains.

“But the more high-stakes exams become, the harder they become. And the more content you have to cram in, the more cognitive demands that the exams make, the more likely you are to swing towards that negative anxiety reaction rather than the positive, motivating stress.”

Oates says that Cambridge Assessment has looked in detail at the effects of a concentrated exam season on pupil wellbeing.

“There are merits of having a continuous assessment process where people get assessments out of the way, so they don't face the summer season, which is heavily dominated by examinations,” he says.

“We think a certain level of stress is important, though, because it enhances performance – we know that from the research. Too much stress and your performance drops off, so this is about having adequate preparation for your examinations.”

GCSE preparation, Oates believes, should be taking place throughout the “whole of secondary schooling”.

“If you treat the examination as something where you have an opportunity to demonstrate what you know, as opposed to a very, very high-stakes single performance in which the school's future as well as your own is at stake, then that's when the stress can become too high, and is not being managed well by the school or the family, and can adversely affect people's performance,” he says.

How do others do it?

It is often said that pupils in England sit more exams than others globally, and that the country is something of an anomaly on large-scale testing of 16-year-olds.

“People say, oh, we're the most tested system in the world, but it is not true,” Oates insists. “They say only we have got high-stakes examinations at 16, but that's not true either.”

He commissioned a piece of work called GCSE Red Herrings, which he says “showed…what we knew already, which was that all developed countries have high-stakes assessment at the age of 16”.

“Most of it is teacher assessment around the world,” he adds. “But we know that teacher assessment is not as fair as external examinations. We know that some kids have really dysfunctional relationships with their teachers, and that is really stressful.”

The risk, he says, is that teachers don't always recognise what pupils can do.

Sweden is one country where far more emphasis is placed on teacher assessment rather than examination. While there are some mandatory national subject tests towards the end of upper secondary schooling, Swedish pupils receive a graded “learning certificate” that is also based on course and project grades, coursework, and, quite significantly, teacher assessment.

Its approach has attracted much praise. Sweden has one of the most egalitarian education systems in the world; it is perceived to be more child-centred and less driven by targets; and it empowers teachers by giving them more sway over assessment outcomes.

However, a 2016 report from the Organisation for Economic Co-operation and Development raised concerns about consistency of assessment as a result of the emphasis on teacher-led grading.

“The current reporting of outcomes in Year 9 at the end of compulsory school and at the upper secondary school heavily relies on the reliability of the grades awarded by teachers,” it found. “An area of concern is equivalence of student grades (reliability) across schools.”

Oates says that Sweden’s reliance on teacher assessment over the past 20 years has resulted in “rampant grade inflation”.

“We know this is the case because Swedish economists have done the work. The line of the grades given by teachers to students has been going up extremely quickly – there has been a very sharp upward line, consistent over those 20 years,” he notes.

However, while domestic results have rocketed, international studies such as the Programme for International Student Assessment and the Trends in International Mathematics and Science Study have indicated deteriorating performance in recent years. In fact, the results are “diametrically opposed” to Sweden’s national findings, Oates points out.

“Now Sweden knows that it has a major crisis on its hands,” he says. “It has not maintained standards in the education system.”

So how reliable are GCSEs?

There is little doubt that tradition has had a huge influence on the structure and assessment techniques seen in modern-day GCSEs. What is less clear is the extent to which academic research has led to a more effective examination system.

There is certainly evidence to suggest that the linear, high-stakes approach can have some positive effects on learning; but equally, the literature suggests it can have a very limiting impact on teaching and can also place huge mental pressure on the pupils sitting the exams.

Overall, though, Coe believes the system is a good one. 

“I think it is very easy to look at a country’s system, and be aware of all its shortcomings,” he concludes. “But I do think, fundamentally, the GCSE system is not broken – it is a good system, certainly as good as any that we could easily change to.”

Chris Parr is a freelance writer
