The umpire strikes back
The chief inspector of schools, Chris Woodhead, has asserted that the national curriculum tests are unreliable. This is demonstrably not the case.
He makes three specific criticisms: first, he doesn't think the tests are the right ones - he would prefer standardised tests in literacy and numeracy; second, the tests have changed so that like cannot be compared with like; and, third, they are being administered in a "creative" way by some schools.
These criticisms show a misunderstanding of the nature of testing and the issues involved, and appear to have been made without either a basis in evidence or any real understanding of the concept of reliability.
Let us take these in turn. First, the charge that the tests are not the right ones. One has to ask if Mr Woodhead has looked at the tests. He wants "standardised tests" instead. Actually, the current national curriculum tests are standardised in terms of administration procedures, development processes and in the presentation of results. They are taken in formal sessions with teachers invigilating - that is, the administration of the test is standardised.
The questions and the tests have been through a year-long development process, being tried out on large groups of sample pupils. Feedback and statistical analyses are used to refine and standardise the questions and marking instructions. This is the development process of a standardised test.
Raw scores on the tests are presented in two ways - first, converted to national curriculum levels, so that they give useful information on what children can do and what their next steps should be - and second, for the primary-age pupils, converted to age-adjusted standardised scores. To all intents and purposes, these are standardised tests.
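To illustrate the second conversion: an age-adjusted standardised score places a raw mark on a common scale, conventionally with a mean of 100 and a standard deviation of 15. This sketch shows the basic arithmetic; the raw marks are invented for illustration and do not come from any actual test.

```python
# Illustrative sketch: converting raw marks to standardised scores on
# the conventional scale (mean 100, standard deviation 15). The raw
# marks below are invented, not real test data.
from statistics import mean, pstdev

def standardise(raw_scores):
    """Map raw scores to a scale with mean 100 and SD 15."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    return [round(100 + 15 * (x - m) / sd) for x in raw_scores]

raw = [12, 18, 25, 31, 34]   # invented raw marks
print(standardise(raw))      # scores centred on 100
```

A full age adjustment would also norm each pupil's score against pupils of the same age in months, but the scaling step is the same.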
The relationship between reliability, validity and the time taken in testing is a balance: increase one and you alter the others. Mr Woodhead wants to make the tests more reliable. The tests are currently designed to reflect the national curriculum in a way that is both valid (achieving a proper representation of what is being measured) and reliable (that is, consistent and accurate) way.
As with any assessment, the tests could be made more reliable. This could be done by having more questions and covering more ground - but testing would then take more time. The tests could be made more reliable by changing their nature, for example to multiple-choice. But this would produce tests which were less valid. For example, it is more valid to test writing by getting children to write than by asking them multiple-choice questions about writing.
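The link between test length and reliability is not just intuition; psychometrics quantifies it with the Spearman-Brown prophecy formula (a standard result, not one the article cites). Lengthening a test by a factor k with comparable questions raises reliability r to kr / (1 + (k-1)r), so gains come at a steeply rising cost in testing time.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor k with comparable (parallel) questions.
def spearman_brown(reliability, k):
    return k * reliability / (1 + (k - 1) * reliability)

# Doubling a test whose reliability is 0.80:
print(spearman_brown(0.80, 2))   # roughly 0.89
```

Note the diminishing returns: doubling the test here buys less than 0.1 of extra reliability, which is exactly the balance the article describes.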
Mr Woodhead needs to say what he is proposing. Does he want to make the tests take longer or does he want to make them less valid? It is easy to make global criticisms but less easy to face the balances which must be struck. As they stand, the tests are generally as reliable as published standardised tests.
Second, Mr Woodhead complains that the tests have changed so it is impossible to compare like with like. In this, he is, unusually, echoing academic critics who advise that standards cannot be measured over time, and that whether things are getting better or worse can never be known.
If this view is accepted, it is at least as true of GCSE and the A-level "gold standard". To say that difficult things are impossible is a counsel of despair.
In fact, the test developers involved in national curriculum assessment go to great lengths to ensure the consistency of criteria for the award of levels from one year to the next. During the development of the tests, a group of the same children do both the previous year's and current year's tests. The results can then be statistically equated. (This is the same technique widely used to maintain the consistency of the respected American standardised tests.) In addition, script-scrutiny exercises are undertaken during which expert judges must decide on the equivalence of the demand and the responses in the two tests. Using these methods does everything possible to ensure comparability of the tests year-on-year.
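One common way to equate two forms statistically when the same pupils take both is linear (mean-sigma) equating: match the means and standard deviations of the two sets of scores. This is a minimal sketch of that technique, with invented scores; it is offered as an illustration of the general method, not as the developers' actual procedure.

```python
# Minimal sketch of linear (mean-sigma) equating: putting this year's
# test onto last year's scale when the same pupils sat both. The score
# data are invented for illustration.
from statistics import mean, pstdev

def linear_equate(new_scores, old_scores):
    """Return a function mapping a new-form score to the old form's scale."""
    m_new, s_new = mean(new_scores), pstdev(new_scores)
    m_old, s_old = mean(old_scores), pstdev(old_scores)
    return lambda x: m_old + s_old * (x - m_new) / s_new

old = [40, 45, 50, 55, 60]   # same pupils, last year's test
new = [35, 42, 49, 56, 63]   # same pupils, this year's test
to_old_scale = linear_equate(new, old)
print(to_old_scale(49))      # this year's mean maps to last year's mean
```

More sophisticated equating methods exist (equipercentile, IRT-based), but the principle is the same: the common group of pupils anchors the two forms to one scale.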
An extra consideration is that the tests have had to evolve to meet changing demands. This can be for the best of reasons. Hence, mental arithmetic was introduced into the specification of the mathematics tests after Mr Woodhead and others advocated its importance. When this happens, the same procedures described above are used to maintain comparability as far as possible.
Again the question for Mr Woodhead is, what would he propose instead? The repeated use of a single test? This would age and become familiar in schools, and give no possibility of altering the curriculum or what is tested. Or not to use any tests at all?
Perhaps what is needed is a judgment made by OFSTED inspectors spending 10 minutes with each child in every school. That would require approximately 100,000 hours of expensive inspector contract time, and needs to be set against the relatively modest costs of the current system.
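The 100,000-hour figure can be checked with back-of-envelope arithmetic, assuming a year-group of roughly 600,000 pupils (that cohort size is my assumption; the article does not state it).

```python
# Back-of-envelope check of the 100,000-hour figure, assuming a single
# year-group of roughly 600,000 pupils (an assumed cohort size, not a
# figure given in the article).
pupils = 600_000
minutes_per_pupil = 10
total_hours = pupils * minutes_per_pupil / 60
print(total_hours)   # 100000.0
```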
Finally, Mr Woodhead alleges that the tests are being administered in a "creative" way in schools. If true, this is a wholesale condemnation of the integrity of the teaching profession, and the chief inspector should be ensuring that these cheats are found and punished.
If, as is more probable, and almost admitted by Mr Woodhead in his interview on the BBC Today programme, he was basing this on sporadic anecdotal evidence, then the validity of his own statements is questionable. In either case, he should produce evidence for such a charge against a whole profession.
The chief inspector is rightly concerned about reliability, but this should extend to the evidence used for judgments. Not for the first time, he has made broad generalisations from little evidence.
A chief inspector who has become a true convert to the need for reliability would be presenting evidence that the inspection system is reliable and produces replicable and consistent results. Mr Woodhead has yet to do this for his own glass house. He should not be hurling stones at neighbouring glass houses without more careful collection of evidence and transparency in its evaluation. That is what making reliable judgments is all about.
Chris Whetton is assistant director of the National Foundation for Educational Research. He is head of the foundation's department of assessment and measurement which, under contract to the Qualifications and Curriculum Authority, is responsible for the development of several national curriculum tests.