Since the Programme for International Student Assessment (Pisa) was carried out for the first time in 2000, it has come to dominate education policy worldwide. Every three years, as new scores are released, a burst of media attention turns to celebrating countries that perform surprisingly well in the league table and shaming those that perform surprisingly poorly.
As the frenzy has taken hold, policymakers have begun benchmarking the success and failings of education systems against changes in Pisa scores over time. For example, in 2012, Polish education policy was catapulted to stardom after the country was shown to have continuously improved its scores since the first round. Conversely, Sweden was held up as a cautionary tale worldwide as it tumbled down the league table over the same period.
However, little attention has been paid to the fact that alterations in test administration across rounds risk contaminating the results. If such alterations affect performance, valid comparisons with previous rounds may be all but impossible, since one is no longer comparing "like for like".
Perhaps the most substantial change to the Pisa test administration was the recent move from paper-based to computer-based assessment in 2015. In previous rounds, all pupils completed the tests using paper and pen. In Pisa 2015, pupils in most countries instead sat the test on a computer.
Following the release of Pisa 2015 scores, conspicuous drops in countries’ performances led to speculation about whether the shift had affected the headline results. For example, Hong Kong plunged by 32 points in science between 2012 and 2015, while South Korea’s mathematics scores decreased by 30 points and Japan’s reading scores by 22 points. The fall was not restricted to East Asia: for instance, science scores in Germany, Ireland, and Poland fell by 15-24 points as well.
These are huge declines – equivalent to between 50 and 100 per cent of one school year’s worth of learning – and it is simply not plausible that they reflect bona fide changes in pupil knowledge. Digging deeper into this issue, I found a striking positive relationship between countries’ average ICT usage in mathematics lessons and changes in mathematics performance between Pisa 2012 and 2015. It appeared as if pupils with more ICT familiarity had benefited from the change to computer-based assessment, while pupils with less ICT familiarity had lost out.
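The cross-country relationship described here can be sketched as a simple correlation computation. The figures below are invented placeholders, not actual Pisa data; only the method – a Pearson correlation between average ICT use in mathematics lessons and the 2012-to-2015 score change – mirrors the kind of analysis described.

```python
import math

# Hypothetical illustration only: invented values, NOT real Pisa data.
# Each position is one country: (average ICT-use index in maths lessons,
# change in Pisa maths score between 2012 and 2015, in points).
ict_use      = [0.9, 0.7, 0.5, 0.4, 0.2]
score_change = [  3,  -2,  -9, -15, -28]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(ict_use, score_change)
print(f"correlation between ICT use and score change: r = {r:.2f}")
```

A strongly positive r on real data would look like the pattern reported above – but, as the article stresses, such a correlation alone cannot establish that assessment mode caused the score changes.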
Yet correlation is not causation. There could have been other reasons behind the pattern and I, therefore, refrained from publicising it. Today, however, the Centre for Education Economics publishes the first paper ever to show beyond reasonable doubt that the move to computer-based assessment did indeed affect Pisa score comparability.
The paper – authored by Professor John Jerrim – makes use of the Pisa field trial in which pupils were randomly assigned to complete the same questions on a computer or using paper and pen. We can, therefore, be sure that any differences reflect the causal effect of computers alone.
Analysing data from Ireland, Germany, and Sweden, the paper shows that pupils completing the computer-based test performed considerably worse than pupils completing the paper-based test. The differences are most stark in Germany (up to 26 Pisa points), followed by Ireland (up to 18 Pisa points) and Sweden (up to 15 Pisa points). Interestingly, however, there is little evidence of systematic gender differences in the impact of computers.
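The logic of the field-trial design can be sketched as follows. Everything here is a hypothetical simulation with invented numbers – including the assumed size of the mode effect – and it only illustrates why random assignment lets a simple difference in group means be read as the causal effect of test mode, free of selection bias.

```python
import random
import statistics

random.seed(42)

# Hypothetical simulation, not real field-trial data: each pupil has an
# underlying ability score on a Pisa-like scale, and sitting the test on
# a computer is assumed to lower the observed score by a fixed amount.
TRUE_MODE_EFFECT = -20.0  # assumed penalty, in Pisa points

pupils = [random.gauss(500, 90) for _ in range(10_000)]

paper, computer = [], []
for ability in pupils:
    # Random assignment: test mode is independent of ability, so the two
    # groups are comparable on everything except the mode itself.
    if random.random() < 0.5:
        paper.append(ability)
    else:
        computer.append(ability + TRUE_MODE_EFFECT)

# With randomisation, the raw difference in means estimates the causal
# effect of taking the test on a computer rather than on paper.
estimate = statistics.mean(computer) - statistics.mean(paper)
print(f"estimated mode effect: {estimate:.1f} Pisa points")
```

Because assignment is random, the estimate recovers the assumed effect up to sampling noise – exactly the property that lets the field-trial paper attribute the paper-versus-computer gap to the mode change alone.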
Release field-trial data
Importantly, the method officially used to account for differences between computer- and paper-based assessments in Pisa 2015 does not iron out the differences, although there is heterogeneity in this respect: while pupils in Germany and Ireland still perform 19 and 11 points lower in science respectively when applying this method, no statistically significant effects remain in Sweden.
Since we cannot extrapolate the findings to other nations, I urge all countries to release their field-trial data to researchers as soon as possible, as this would allow us to get a more complete picture of how the change has affected Pisa scores worldwide.
Still, the paper’s findings indicate that the shift to computers by itself has affected some countries more than others in the latest Pisa round. Policymakers should, therefore, be careful in drawing conclusions regarding relative system performance from recent changes in scores. The risk is simply too great that any such conclusions will be inaccurate.
Gabriel Heller Sahlgren is research director at the Centre for Education Economics and affiliated research fellow at the Research Institute of Industrial Economics