My conclusion is that exams are confusing

Schools aren’t teaching to the test out of malice, but because too few people understand what assessments are meant to measure

19th December 2014, 12:00am

Daisy Christodoulou

The debate about exams often polarises around two views. One is that exams are inimical to the spirit of education, cannot measure its outputs and should not be used to judge children or schools. The other is that exams are the only way to improve a profession frightened of the cold blast of accountability.

I would like to propose an alternative point of view: that exams offer valuable information, but the way they are currently used is hugely distorting. I would argue that one of the main reasons for this is that our test-based accountability system has incentivised teaching that doesn’t actually lead to genuine educational gains, encouraging methods that can loosely be described as “teaching to the test”.

Daniel Koretz, a professor of assessment at Harvard University in the US, has written extensively about this problem. He monitored some of the changes in school behaviour after the introduction of the No Child Left Behind Act in 2001, a law that introduced systems of test-based accountability similar to the ones we have in England. One of Koretz’s central points is that “test scores reflect a small sample of behaviour and are valuable only insofar as they support conclusions about the larger domains of interest”.

In explaining the difference between test scores and “larger domains of interest”, Koretz uses opinion polls as an analogy. These polls ask a sample of people about their views on an issue and, on that basis, draw inferences about the views of the entire population. Likewise, tests sample a small part of what a pupil knows and can do, and the results are used to infer what she knows and can do across the whole domain. This means that the test scores are not actually important in and of themselves. They only matter in that they allow you to make a judgement about the entire domain.

This point may seem obvious, but too many school improvement strategies do not take it into account. The sample and the domain are too often conflated: improvements on the exam sample, however they have been achieved, are automatically seen as improvements in the domain. But if instruction has been geared towards the types of questions found on the test, that inevitably weakens the inferences we can make about performance in the wider domain, which is the only thing about the test score that actually matters.

Compare and contrast

For Koretz, the “acid test” is whether gains can be generalised to other assessments. Consider these two approaches to teaching. One is to spend a portion of class time studying material that is not on the exam syllabus, as a means of deepening understanding. The other is to spend the same portion of class time analysing past papers and mark schemes to understand the typical recurring questions and the best strategies for answering them. Both approaches will probably lead to an improvement in exam scores, but the latter is likely to produce gains that show up only on that particular exam. If we rely on this approach, test scores will go up, but they won’t mean what we want them to mean.

What evidence is there that schools are teaching to the sample rather than the domain? Koretz has undertaken research in the US which found that score gains on high-stakes exams were not sustained when pupils were given different assessments of the same domain. In England, there is no directly comparable research. However, we have seen that rising test scores in high-stakes exams at the end of key stages 2 and 4 have not been matched by better performance on other measures, such as the Programme for International Student Assessment (Pisa) and the Trends in International Maths and Science Study (Timss). A number of factors could explain this, but it is plausible that teaching to the test is one of them.

A possible solution is better test design. Narrow syllabuses and predictable papers increase the returns to teaching to the test. However, as Koretz is at pains to point out, tests will only ever be samples, and there is no such thing as the perfect test to teach to.

Another idea is for exam boards to do more to establish the validity of their tests, analysing whether performance on their qualifications really reflects performance on the domain, or just on the test sample.

The government may argue that schools have deliberately set out to game the system and that it is perfectly possible to achieve good exam results through high-quality teaching. Schools could respond that the government and exam boards have often praised and even promoted these strategies, and that high-stakes exams make teaching to the test inevitable. I would argue that in both cases the problems have come about less through deliberate cynicism than through a misunderstanding of educational assessment. If you confuse the sample and the domain, teaching to the test will be welcomed rather than frowned upon.

The biggest improvements might come from better training in some of the concepts of educational assessment, in particular this fundamental point about the sample and the domain. This would allow policymakers to devise better systems and teachers to critique them from a position of strength. And if those on either side of the divide had a shared understanding of these concepts and problems, it could even restore some of the trust that has been lost.

Daisy Christodoulou is an educationalist and research and development director at UK academy chain Ark. Her book Seven Myths About Education is published by Routledge