METHOD 1 uses the results of the second pre-test to ensure that standards remain the same year-on-year. The 1,300 pupils who sat the second pre-test in April 1999 also took their "real" tests in May 1999. Their marks on the two tests are compared to check that the standard of the tests is the same and to help set the draft mark thresholds for each of the national curriculum levels.
METHOD 2 also uses statistics to equate standards year-on-year but uses an anchor test - one that is identical year-on-year.
METHOD 3, which uses professional judgments, is a version of the procedure developed by William Angoff in the 1970s to assess doctors. Colleagues are asked to picture an imaginary doctor with the characteristics a good doctor needs and then to measure the real doctor against those standards.
Each year a dozen teachers spend two days at the QCA where they study that year's reading test and its mark scheme. They are asked to imagine a typical borderline student and go through the test, question by question, predicting how that student would score. This is repeated by every teacher for each level boundary. These judgments are aggregated by officials after the meeting and a mean score calculated for each level boundary.
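The aggregation step can be illustrated with a short sketch. It assumes that each teacher's question-by-question predictions are summed into a total for the test and that the totals are then averaged across teachers; the article says only that a mean is calculated, so the exact rule officials use may differ. The function name and the figures are hypothetical and are not QCA data.

```python
from statistics import mean


def angoff_threshold(predictions):
    """Return a draft threshold for one level boundary.

    predictions holds one list per teacher; each list gives the marks
    that teacher expects a borderline student to score, question by
    question. Assumed rule: sum per teacher, then average the totals.
    """
    totals = [sum(teacher) for teacher in predictions]
    return mean(totals)


# Hypothetical judgments: three teachers, four questions, one boundary.
level_boundary_predictions = [
    [1, 2, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 0, 1],
]

draft_threshold = angoff_threshold(level_boundary_predictions)
print(f"Draft threshold: {draft_threshold:.1f} marks")
```

In practice the same calculation would be repeated for every level boundary, with a dozen teachers rather than three.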
METHOD 4 involves "expert" judges and takes place after the test has been sat - this year's script-scrutiny meeting takes place early next month. Senior markers examine a sample of scripts to check how far the assumptions behind the draft level thresholds, which were set using the first three methods, are confirmed by pupils' performance in the actual test.