THE first steps towards writing this summer's English tests were taken as long ago as 1998 when test developers began a trawl of libraries, classrooms and bookshops - searching for stories, poems or articles on which the tests could be based.
They look for a theme that might capture pupils' imagination - for example, the 1999 KS2 English reading test featured passages about spiders. During the first year, questions are trialled in a sample of schools as part of the first pre-test of the final 2000 papers. After further work, the final test is prepared a full year before it is needed, allowing time for a second, large-scale pre-test. The KS2 tests - taken for real by 600,000 11-year-olds last week - were trialled in April 1999 by around 1,300 pupils.
The contents of the tests and mark schemes are then finalised and formally handed over to QCA for publication.
But that's the easy bit. The tricky part is ensuring that the pass mark is set at the same standard every year. This is done using a combination of four different methods (see box below), as test developers decided that no single method could ever be reliable enough to withstand public scrutiny. Two methods use empirical statistics, but the others are more controversial as they rely on the professional judgments of teachers and officials - including the Angoff method, devised in the United States as a way of assessing what makes a good doctor.
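In outline, the Angoff method asks each judge to estimate, for every question, the probability that a borderline - "just passing" - pupil would answer it correctly; summing those estimates gives a proposed pass mark. The sketch below illustrates the arithmetic with invented figures (the judges' estimates and the five-item test are hypothetical, not taken from any real QCA exercise):

```python
# Illustrative sketch of the Angoff standard-setting calculation.
# Each inner list holds one judge's estimates of the probability that
# a borderline pupil answers each of five items correctly (hypothetical data).
judges = [
    [0.9, 0.7, 0.5, 0.4, 0.3],
    [0.8, 0.8, 0.6, 0.3, 0.2],
    [0.9, 0.6, 0.5, 0.5, 0.4],
]

def angoff_cut_score(estimates):
    """Average each item's probability across judges, then sum over items."""
    n_judges = len(estimates)
    n_items = len(estimates[0])
    item_means = [
        sum(judge[i] for judge in estimates) / n_judges
        for i in range(n_items)
    ]
    return sum(item_means)

# The proposed pass mark: the expected score of a borderline pupil.
print(round(angoff_cut_score(judges), 2))  # → 2.8
```

The result - here 2.8 marks out of 5 - is only a starting point; as the article notes, it rests entirely on the judges' mental picture of a typical borderline pupil.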
Surprisingly, there is no formula for combining the different methods. Instead, the QCA, senior markers and test developers get together to thrash out which evidence should be given the most weight.
The participants will also know how real pupils performed in the actual tests, based on a sample of 30,000 scripts.
They also do a "reality check" - discussing what their decision would mean for Mr Blunkett's targets. This policy was singled out for criticism by the Rose inquiry team, who said they were concerned that officials had even discussed whether the proportion they were proposing to pass would "seem plausible".
The test developers themselves acknowledge that all four methods have drawbacks: the anchor test has not been used at the same time each year and no longer resembles the real tests as well as it once did; the Angoff procedure relies exclusively on teachers' knowledge of the typical pupil but does not give them access to real scripts; while the senior markers' judgments may include unintentional bias.
However, the issue of standards is not solely a matter of procedure - unless the public has confidence in the process, both the concept of standards and the tests themselves are undermined.
The QCA officials responsible for the tests would like teachers and parents to be more aware of how national test standards are set - particularly how and why professional judgments are so important.