The Scottish Survey of Achievement, introduced in 2005 as a superior alternative to "unreliable" 5-14 data, could also be unsound, claims an influential research group.
Analysis by the National Foundation for Educational Research suggests that the random sampling approach of the SSA throws up a whole new array of problems.
So concerned is the NFER that it has warned Ed Balls, the schools secretary in England, not to replace the English testing system of SATs with a sampling system too akin to the Scottish survey. It is, it says, very costly and contains disparities between teachers' assessments and pupils' SSA test results.
There have been "issues with non-participation" in the SSA and proposals have been mooted to give more feedback to participating schools to "incentivise" them to take part, the NFER reveals.
An English version, it suggests, should therefore make it compulsory for schools to participate, rather than voluntary as in Scotland.
The research body made its comments in a submission to Mr Balls's expert group, tasked with finding a national monitoring system to replace the much-criticised SATs tests in England. It found that some pupils taking part in the SSA had done less well than expected, probably because of the "low stakes" nature of the test, which contributed to a lack of motivation. Nevertheless, it argued that this was a price worth paying in England if it meant that teachers would no longer have to teach to the test.
The researchers also report that disparities have been found between teachers' assessment and the pupils' SSA test results: "The levels are set using primarily professional judgment, which may call the reliability of the results into question."
The NFER's findings mark the first time such concerns have been aired publicly, although some authorities, notably Glasgow, have argued that the SSA does not provide sufficient or useful information at local authority or national level.
Maureen McKenna, Glasgow's head of education services, told The TESS last year (September 6) she felt the SSA model lacked accountability. That, coupled with the need to replace the outdated 5-14 tests, had prompted the council to investigate means of setting up its own diagnostic testing system for individual pupils.
The SSA replaced the Assessment of Achievement Programme (AAP), which had been running since 1983. Tests are held in P3, P5, P7 and S2, using a sample of 40,000 children, or 5 per cent of pupils.
Subjects assessed on the 5-14 levels are English, maths, science and social science. Since the introduction of the SSA, teacher assessments have been collected for the same sample of pupils as sit the tests. The survey is made up of written tests, classroom investigations (overseen by a team of field officers), practicals (again overseen by field officers), extended writing (marked by teachers and moderated externally), teacher assessment (collected in advance of the tests) and pupil and teacher questionnaires.
The process, which is supported by a team of field officers who are nominated by local authorities, is described by the NFER as "highly valid" but nonetheless a "high-cost system".
A spokeswoman for the Scottish Government said it was reviewing the SSA to ensure it reflected A Curriculum for Excellence, particularly around literacy and numeracy. "But the fundamental model of low stakes testing of a sample of pupils using a range of types of assessment and material drawn from across the range of the subject will continue," she said.
The Government also defended the costs attached, saying the approach offered good value for money based on the range of evidence it provided. "The sample approach is considerably cheaper than delivering equivalent national assessments to all pupils," she added. "The SSA assessments are different from teachers' judgments and it is not unexpected for the results to be different. Teacher judgments are based on a range of evidence gathered by the teacher over a period of time while the SSA assessments are snapshots at a point in time. Pupils will not always perform in the same way in a one-off test as they have done in class. By gathering both, we gain a deeper understanding of attainment."
The NFER report also examined the Assessment of Performance Unit national monitoring system used in England in the 1970s and 1980s before national testing was introduced, and the international comparative studies Timss and PISA.
Potential problems uncovered by the researchers included increased teacher workload; controversy over the way findings were analysed; difficulties in keeping the measures constant, relevant and reliable over long periods of time; misinterpretation of results by the media and public; low participation rates among some groups of pupils; high costs; and goalposts being moved once monitoring had already begun.