New research has cast further doubt on the reliability of influential Pisa results following a switch from paper to computer-based testing of pupils.
Field trials carried out in 2014 by the Organisation for Economic Co-operation and Development (OECD), which runs the Programme for International Student Assessment (Pisa), revealed that students scored less well on average when given the same questions on screen rather than on paper – with scores equivalent to about six months less schooling.
The OECD devised a method of “adjusting” students’ scores to account for this difference when 58 countries switched from paper to computer-based tests for the 2015 assessment.
But the Centre for Education Economics (CfEE) has investigated this adjustment using data from pupils in three countries involved in the field trials; it says that while the adjustment has been “beneficial”, it “does not overcome all the potential challenges of switching to computer-based tests”.
John Jerrim, a researcher at the University College London Institute of Education, looked at the data from more than 3,000 15-year-olds in Germany, Ireland and Sweden, who took part in the pilot study.
“Taking a test on computer is very different to the standard procedure of taking a test using paper and pencil,” said Professor Jerrim. “Yet the OECD has provided scant evidence on the impact this is likely to have had upon the Pisa 2015 results.
“Could this have driven some of the more surprising findings from the Pisa 2015 study, such as Scotland’s plummeting performance on reading and science compared with 2012; the significant drop in science performance in Ireland and Germany compared with 2012; or the significant decline in several East Asian countries' mathematics scores? I certainly don’t think we can currently rule out such possibilities.”
It was also unclear why pupils did less well on computers – whether it was down to how pupils read on paper versus on screen, differences in computer skills, or test-taking strategies.
The latest findings follow last year's admission from Andreas Schleicher, the official in charge of Pisa, that seemingly dramatic changes in performance for top-ranked countries shown by its “comparable data” could, in fact, be explained by the switch to computer-based tests.
“Further analysis is needed to establish the causes of decline in the share of top performers in some of the highest-performing countries,” he said in March.
Mr Schleicher said that although the study had ensured that, “on average”, pupils taking paper- and computer-based tests scored the same, that might not be true for some groups of high-performing pupils.
“It remains possible that a particular group of students – such as students scoring [high marks] in mathematics on paper in Korea and Hong Kong – found it more difficult than [students with the same marks] in the remaining countries to perform at the same level on the computer-delivered tasks,” he said.
“Such country-by-mode differences require further investigation to understand whether they reflect differences in computer familiarity, or different effort put into a paper test compared with a computer test.”
In today's paper, “A digital divide? Randomised evidence on the impact of computer-based assessment in Pisa”, the CfEE says governments should “carefully reflect” on how comparable the results are both to other countries and to their previous Pisa assessments.
CfEE founder and chair James Croft said: “It is vital that there is clarity around the methodology of these assessments, as governments clearly rely on them when setting education policy. We hope that by publishing this paper today, governments across the world will carefully reflect upon how comparable the 2015 results are both to other countries and to those from previous Pisa assessments.”
Pisa tests 15-year-olds in science, reading and maths, and the results are closely scrutinised by governments; this scrutiny has led to policy changes, such as the introduction of East Asian-style maths teaching in England.
The latest results from the influential international rankings were published in December 2016. They revealed that scores in the UK dropped in science, maths and reading compared with tests taken in 2012.
Despite the drop, changes in other countries’ scores meant the UK rose up the rankings in science to 15th place; it rose one place to 22nd in reading; and slipped from 26th to 27th place in maths. Fifty-eight countries took the tests on computer, while 14 kept paper tests.
Previous research from Professor Jerrim showed that ditching the pen-and-paper test for on-screen assessment widened the gap between boys’ and girls’ maths scores by the equivalent of two months’ educational progress in two-thirds of the countries and states that took part in Pisa.
Yuri Belfali, head of early childhood and schools division at OECD, said that the effect of changes in the way the assessments were carried out had already been explained and discussed within the Pisa report. The research from the CfEE could not be used to draw conclusions about how different countries were affected, she added.
She said: "Overall, the main results of the [CfEE] paper – particularly those that correctly account for the non-equivalence of some test items across modes – are in line with what the OECD has published on this issue.
"The OECD has explained and discussed all the methodology changes made for Pisa 2015, including mode effects. However, we should read Professor John Jerrim's paper and its conclusions with caution, since the Pisa field trial on which they are based was not designed to support mode-effect analysis at the country level.
"In light of the large statistical uncertainty associated with country-specific results, and of the non-representative nature of Pisa field-trial samples, conclusions about the influence of the mode of assessment on individual countries’ trends should not be drawn from this research."