The staff at Gloucester Road Primary School in Cheltenham knew there was something wrong with the way that they were collecting data.
The government had abolished the old national curriculum “levels” as a way of recording pupil progress. But the new data system that the school was using to record progress still spoke the same language. And that language was largely incomprehensible.
“I heard one of my staff say, ‘That’s a Year 3 Emerging,’ and I thought, ‘This is still that arbitrary labelling of children.’ It was box-ticking,” explains headteacher Gayle Fletcher.
So the school sought out a more adaptable system that worked to improve outcomes, rather than attempt to define them.
“[The new system] shows me what’s been taught, at what depth, and then we focus on what gaps need plugging,” Fletcher says. “The system is the servant, not the master. It is there in the background doing what we’ve asked it to.”
Fletcher now thinks that the school has finally got things the right way around as far as data is concerned.
“We use our data to improve our children’s learning,” she explains. “We are not using the children to improve the data.”
Gloucester Road is a happy exception to a depressing rule. It is estimated that three-quarters of England’s primaries are still pointlessly persisting with assessment systems that emulate the old levels, ignore the aims of the current national curriculum and fail to give accurate feedback that can be used to improve student outcomes.
This, of course, is only the start of the problems with dysfunctional data facing our schools today. And a reminder of just how damaging this can be came on Monday, when a cross-party committee of MPs warned that primary testing could harm pupil and teacher wellbeing (bit.ly/TestingWarnings).
The data isn't working
On the eve of Sats week and in the run-up to GCSEs, it is clear that what should be a boon to education is all too often a burden: the data is not working.
In too many cases the use of data actively works against what schools should be doing – providing a good all-round education for all pupils.
Hours are being spent this summer conducting assessments of primary pupils’ writing ability – assessments that both Ofsted and the Department for Education have admitted are not robust enough to base judgements on.
And Progress 8 – the most complicated headline secondary performance measure yet – allows the results of small numbers of pupils to distort a school’s overall score to the extent that heads have warned that some secondaries could deliberately “lose” pupils.
Meanwhile, there are increasing doubts about the reliability of the primary test results that provide the foundation for the Progress 8 system, with a growing acknowledgement within the system that under-pressure primaries are using every trick in the book to finesse them.
These are just the most obvious symptoms, the surface problems, of a schools system that has become so completely led by numbers, thresholds, percentages and performance measures, so in thrall to data, that it knows no other way.
It is hard to imagine a time when our schools were not dominated by tables, targets and testing. But it wasn’t so long ago that what happened inside schools was regarded as a “secret garden”.
For her entire tenure as prime minister, Margaret Thatcher – who sowed the seeds of today’s school hyper-accountability – actually presided over a system in which no school performance data was published at all.
It wasn’t until 1991 that the first national primary Sats were held. And the first national school GCSE league tables based on official data did not follow until a year later in 1992, the same year that Ofsted was established.
Until then state schools had been the fiefdoms of teachers, with the only direct pressure coming from the local authorities responsible for them.
But when data reporting did come in, it was immediately linked to high-stakes accountability, with the potential to trigger sackings or school closures. Low results meant real consequences, regardless of any wider socioeconomic problems that schools might face.
And so, from the very beginning, the official data on our schools has always been vulnerable to distortion.
In secondaries, obscure vocational qualifications of dubious worth exploded in popularity, as they allowed schools to acquire league table scores worth the equivalent of four “good” GCSEs in a fraction of the teaching time.
At primary, rumours of schools being rather “flexible” with the truth and exam protocols have long been rife.
The benefits from the data have been questionable, too. For example, there is little evidence to back up the idea that performance data has made any impact on the link between disadvantage and poor secondary school results.
A glance down the list of the schools propping up the GCSE league tables still amounts to a tour of England’s most deprived communities.
Early signs of success
The use of national performance data in the primary sector initially appeared to have greater success.
Firstly, it shed unforgiving light on existing school standards. The first key stage 2 Sats, taken under the Conservatives in 1995, showed that fewer than half of pupils were reaching the “expected” standard in numeracy and literacy.
By 2000, under Labour, the use of primary data had helped to drive some seemingly impressive improvements.
But results plateaued – and not long afterwards parallel data that had not been subjected to the strain of high-stakes accountability raised serious questions over how real that progress had been (bit.ly/PrimaryProgress).
Yet those setbacks did not signal a retreat from the use of data in schools or from the persistent belief among those in power that it held the key to securing real improvements for pupils. If anything, the infatuation with data grew.
In 2007, the Department for Children, Schools and Families’ director-general for schools, Ralph Tabberer, invited Tes into Sanctuary Buildings to view the government’s latest attempt at evidence-based policy.
“The Bridge”, as it was known, amounted to an oversized computer screen showing a grid of tiny multi-coloured squares relating to about 15 performance indicators, covering everything from exam results to truancy levels.
Known by some as “the matrix”, it was crude by today’s standards, but included details for each of England’s 150 or so local authorities on colour-coded indicators: red, amber or green, according to the progress being made, or light blue, mauve or purple according to the level of risk.
A click would shift to a view of detailed figures for each of England’s 22,918 state schools, and then to individualised data on England’s 8.2 million pupils.
It was all possible because the tests and teacher assessments that took place throughout every pupil’s school career had given England what Tabberer proudly described as “probably the best data set on school performance in the world”.
The same approach to tracking pupil progress is now used by schools all over the country. But in order for it to work, the data it is based on must be reliable and comparable. And there are strong arguments that it is not.
The dangers of assuming that reliability exists were thrown into sharp relief by last year’s chaotic Sats season. Confusion engulfed the key stage 2 writing assessments – part of the floor standards that primaries must meet in order to avoid forced academisation (see “With the new system, there really is no comparison”, bit.ly/TesComparison).
These assessments are designed purely as performance measures for the primaries that conduct them. Yet Ofsted, the inspectorate that judges these schools, has already said it will advise its inspectors not to base any conclusions on the assessments.
'Not robust enough'
Even the Department for Education, which dreamed up the assessments, has tacitly admitted that they are not robust enough to justify interventions in schools where results are low.
And Rebecca Allen, director of Education Datalab, believes there may be much more deep-rooted problems with Sats than an initial muddle over the interpretation of last year’s new system.
“My concerns on KS2 are around the management of the tests themselves, particularly for the quite large numbers of students who have either readers or scribes,” she says. “I just hear too many stories that lead me to believe that the tests are not always being administered in a consistent way.”
Because only some schools employ these tactics, the ability to use the test results to compare schools accurately is immediately lost.
And when taken on their own as measures of an individual pupil’s genuine progress, Sats also have limited usefulness, owing to the extensive test preparation that primaries have gone through to ensure that they get the right results. Many secondaries will end up conducting their own tests of newly arrived Year 7s to obtain an assessment that they can rely on.
At secondary, there have been problems, too. Governments do not always ignore problems with school performance data. But sometimes the solution makes things worse.
Contextual value added was an attempt to address one of the biggest flaws of the original school league tables – the fact that they take no account of the differences between pupils in ability and background and are, as often as not, merely a measure of a school’s intake rather than the quality of its teaching.
CVA tried to solve this by measuring pupils’ GCSE performance against their prior attainment and socioeconomic background. But this very complex league table indicator – which became a major factor in Ofsted judgements after its introduction in 2006 – had big flaws of its own.
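The logic of a value-added measure can be shown with a toy calculation. The sketch below is purely illustrative, not the DfE’s actual CVA specification: it predicts each (hypothetical) pupil’s GCSE score from prior key stage 2 attainment with a least-squares line and treats the residual as “value added”; the real CVA also adjusted for socioeconomic variables such as free school meals, care status and ethnicity.

```python
# Illustrative sketch only, not the DfE's actual CVA model.
# Synthetic pupils: GCSE score depends on KS2 score plus random noise.
import random

random.seed(1)

ks2 = [random.gauss(100, 15) for _ in range(1000)]          # prior attainment
gcse = [0.8 * x + random.gauss(0, 10) for x in ks2]         # later outcome

# Ordinary least-squares fit of GCSE on KS2
n = len(ks2)
mx, my = sum(ks2) / n, sum(gcse) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(ks2, gcse))
         / sum((x - mx) ** 2 for x in ks2))
intercept = my - slope * mx

# A pupil's "value added" is actual minus predicted; a school's score
# would then be the mean residual across its own pupils.
residuals = [y - (intercept + slope * x) for x, y in zip(ks2, gcse)]
print(abs(sum(residuals) / n) < 1e-9)  # True: residuals average to zero nationally
```

Note that the residuals average to zero across the whole cohort by construction, so a school can only look good on such a measure by beating the national prediction, not by improving in absolute terms.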
Professor Stephen Gorard showed how the data used to compile CVA was riddled with errors that made it “useless” for comparing secondaries and no more than a “voodoo science”. He called for the end of the measure, concluding that it was a “nonsense” that would end up in a court case if the government continued to use it to judge schools.
CVA was abolished in 2011 by the coalition government. But this was because the government argued that it was “morally wrong to have an attainment measure that entrenches low aspirations for children because of their background”. It was unclear whether ministers also understood that the margins for statistical error in CVA rendered it meaningless.
Making the same 'mistake'
Today Gorard, now at Durham University, fears the same mistake is being made with Progress 8 – the latest attempt at a “fairer” secondary measure.
“I support many of the objectives underlying it, as I did with CVA, [but] I don’t think it’s ready to be used in public policy,” he says. “I wouldn’t like to see children’s lives, schools’ futures, people’s careers, promotion prospects, etc, depend on something that at the moment we can’t be sure is working.”
He says that the error and missing data inevitably involved in measures like CVA and Progress 8 – which combine multiple indicators across hundreds of thousands of pupils – “accumulates like grit in a factory machine” and stays there, rendering the indicator meaningless.
For example, only 85 per cent of pupil records had complete sets of data for five of the crucial CVA variables – free school meals, pupils in care, special needs, gender and ethnicity.
With CVA, he says, the differences between school performance that it suggested – the “school effect” – were actually “made up almost entirely of the error component in the original figures”.
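Gorard’s point can be illustrated with a simple simulation. This is a sketch under assumed numbers, not his analysis: if genuine between-school differences are small relative to pupil-level measurement error, then a school’s observed score is mostly noise, and league-table positions barely replicate from one cohort to the next.

```python
# Illustrative simulation, not Gorard's actual analysis. Assumed values:
# small true between-school differences, much larger pupil-level error.
import random

random.seed(42)

TRUE_EFFECT_SD = 0.5     # assumed genuine "school effect" spread
ERROR_SD = 5.0           # assumed pupil-level measurement error
PUPILS_PER_SCHOOL = 30
N_SCHOOLS = 100

true_effects = [random.gauss(0, TRUE_EFFECT_SD) for _ in range(N_SCHOOLS)]

def observed_scores(effects):
    """One cohort: each school's mean of noisy pupil-level results."""
    return [e + sum(random.gauss(0, ERROR_SD) for _ in range(PUPILS_PER_SCHOOL))
              / PUPILS_PER_SCHOOL
            for e in effects]

year1 = observed_scores(true_effects)   # this year's league table
year2 = observed_scores(true_effects)   # next year's, same true effects

# How many of year 1's "top 20" schools are still top 20 in year 2?
top1 = set(sorted(range(N_SCHOOLS), key=lambda i: year1[i], reverse=True)[:20])
top2 = set(sorted(range(N_SCHOOLS), key=lambda i: year2[i], reverse=True)[:20])
print(len(top1 & top2))
```

With these assumed parameters the noise in a 30-pupil school mean is roughly twice the size of the true effect, so the overlap between successive “top 20” lists tends to sit only modestly above the 4 schools expected by pure chance.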
Duncan Baldwin, the Association of School and College Leaders’ deputy policy director, broadly supports Progress 8 but acknowledges “concerns” about the KS2 data that it is based on. “That data is derived from two days’ worth of a Year 6 pupil’s life,” he says.
Allen also has concerns: “I am really worried about us having accountability measures that are based on baselines that are of unknowable variability in how schools are categorising students.”
“I think the data is too fragile for us to make high-stakes judgements on schools,” she warns. “And that’s what we do and headteachers lose their jobs based on this data.”
Gorard points out that CVA and Progress 8 share another serious limitation: both compare a pupil’s progress relative to that achieved by other pupils in the same year, and are therefore relative rather than absolute measures.
“It is not possible, even if magically all schools did really well, for them all to make progress according to this measure,” he says. “Progress 8 is a zero-sum measure.”
It is impossible for all schools to get good Progress 8 scores because it is impossible for all schools to be above average.
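The zero-sum arithmetic is easy to verify. The numbers below are hypothetical and this is not the official Progress 8 formula, but any measure defined as “each school’s result minus the cohort average” must sum to zero, and uniform improvement leaves every relative score unchanged:

```python
# Hypothetical school results; a relative measure in the spirit of
# Progress 8, not the official formula.
scores = [42.0, 55.0, 61.0, 70.0]
average = sum(scores) / len(scores)          # 57.0
progress = [s - average for s in scores]     # score relative to the average
print(progress)       # [-15.0, -2.0, 4.0, 13.0]
print(sum(progress))  # 0.0 - the measure is zero-sum by construction

# Even if every school improves by 10 points, the relative scores
# are identical: no school has "made progress" on this measure.
better = [s + 10 for s in scores]
new_avg = sum(better) / len(better)
print([s - new_avg for s in better])  # [-15.0, -2.0, 4.0, 13.0] again
```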
And that knowledge can only increase the pressure on schools – and school leaders – to use every possible means to ensure they achieve a good score.
There have already been warnings that some schools may use tactics such as “off-rolling” vulnerable pupils (“Schools may try to ‘lose’ vulnerable pupils”, bit.ly/TesLosePupils).
A distortion of the data is not always the fault of high-stakes tests, though. The experience of the primary sector following the abolition of the old system of national curriculum levels suggests it is quite possible for schools to drown themselves in doubtful data, even when there’s no official demand for it.
Heads all over England are constructing elaborate internal performance data systems that do not feed into official accountability measures at all. School data consultant and Tes columnist James Pembroke says that in most cases they are measuring the wrong thing anyway and making unnecessary extra work for themselves.
He estimates that three-quarters of primaries today use pupil tracking systems that needlessly recreate levels by focusing on material covered in lessons, rather than the depth to which pupils have understood it.
“Levels were ditched because they were having a negative impact on learning – they labelled children and lowered aspirations,” he says. “[The government] said, ‘We will get rid of levels,’ they opened the cage and said [to schools], ‘Look, you’re free.’ And a lot of schools just stayed in the cage.”
'Look for an impact on learning'
Pembroke argues that schools record “too much” data and advises: “If it has no impact on learning, then don’t do it.”
He blames the local authorities and regional schools commissioners for “wasting time” by asking for unnecessary data.
“The great irony is that so many of these things are put in place supposedly for the purposes of school improvement,” he says. “But all of these things take time away from teaching. And the only way to improve a school is to teach kids.”
However, he credits Ofsted for recently adopting a more enlightened approach. And in other positive signs, the DfE appears to at least partially recognise the unreliability of some of the schools data it has created. KS2 writing assessments, for example, will not contribute to Progress 8.
But if there is a glimmer of light domestically, problems with international school data are only just beginning. The Programme for International Student Assessment (Pisa) has effectively introduced a global league table system to education that is already influencing policy, as national leaders bring in reforms aimed solely at climbing the Pisa table.
Pisa uses what data it has to maximum effect – producing hundreds and hundreds of pages of analysis and tables. But, as with national school performance data in England, questions are being raised over whether the international data is robust enough to support such wide-ranging conclusions.
In March, Tes was able to secure a belated admission from the officials running Pisa that significant drops in the maths results among the most-able pupils in the top-performing nations could actually just be down to a change in the tests used (bit.ly/PisaChanges).
So what’s the solution? Completely ditching data because it is flawed and going back to deciding schools policy on the basis of limited, expensive inspection evidence, combined with hunches, seems just as unsatisfactory.
As Allen says: “For those people who say, ‘We need to stop testing children at all,’ I guess my retort would be that we have been there.
“I went through the education system in the 1980s and 1990s. Let’s not kid ourselves – I think there were far more instances of poor-quality teaching and dysfunctional schools failing entire communities.
“We are in a bit of a better place now and I think, in part, publication of data on performance has raised the quality of schooling at the bottom end.
“But it is a really tricky balancing act and we have to do the best job we can to create measures that don’t lead schools to have these perverse incentives.”
But is that really possible? As soon as you link pupil data to high-stakes accountability, you immediately create a huge incentive for schools to excel by any means necessary.
As Professor Rob Coe, director of the Centre for Evaluation and Monitoring at Durham University, told the Commons Education Select Committee this year: “The consequences influence the assessments.”
If education is to get the best from data then schools and teachers should not be made slaves to it. It should be used in conjunction with human judgement by people with the freedom to admit when the numbers get things wrong.
The problem is that the more time, money and importance that politicians and, on a micro level, school leaders invest in this data, the harder that becomes.
William Stewart is news editor of Tes. He tweets @wstewarttes