The conflict between inspectors' judgments of some "failing" schools and those schools' test results, when analysed according to the new benchmarking arrangements, comes as no surprise (page 3). Take a set of fallible inspections that rely on the intuition of a mixed selection of contractors, compare their findings with the results of imperfect tests administered under a variety of conditions, and at the same time oversimplify the wide range of social contexts in which schools operate: what is surprising is that the discrepancies occur in only one failing school in six.
The Office for Standards in Education has for at least three years had the means of systematically analysing the likely effect of pupil intake on each school's results. Back in 1995 a state-of-the-art analysis by the London Institute of Education for OFSTED provided the means of equipping inspectors with the best available estimate of how each school's intake affected its performance.
Like any statistical device, this analysis is only as reliable as the information it is based on. But OFSTED certainly had a shot in its locker against schools which are merely coasting.
But in its locker is where the information stayed. Though the backroom girls at OFSTED were ready to roll out a comparison of like with like in April 1996, the chief inspector prohibited its use - lest pupil backgrounds be used as an excuse for poor results. So inspectors continue to guess at the effect of schools' intakes and some are clearly making unjust judgments in schools achieving wonders against the odds.
It was left to the Qualifications and Curriculum Authority to put test results into some sort of social context, in order to make school target setting more palatable. The resulting benchmarks, unhappily, fall more into the better-than-nothing category than the state-of-the-art.
By assuming that all schools with 50 per cent of their pupils on free school meals are like each other, or, worse, comparable with those with 80 or 90 per cent, these benchmarks run the risk of simplifying the influence of pupil background to the point of absurdity.
The imprecision of tests and benchmarks alike means that we can't say whether it is OFSTED or the QCA that has got it wrong. But surely it is not too late for OFSTED to unveil its superior analysis, and use it to throw some light on how many inspections did get things right - or wrong.