At the end of a six-year research project, the team from the National Foundation for Educational Research are in a position to reflect on these developments from a unique perspective. We have been involved since the original trials of SATs through their evolution into "national curriculum tests".
The history of the project is the story of an attempt to do different things at the same time, some of them incompatible, and we were constantly caught up in these tensions. One of them came up again and again, and many of the changes over the years can be traced back to it. It is the tension between authenticity and manageability.
Authenticity is the important matter of how well the tests or assessment tasks reflect the curriculum they are assessing. If they cover only a limited part of the curriculum, the information they give is correspondingly limited.
The working groups that drew up the national curriculum set out to encapsulate the essential points of good practice in their subjects. In all three core subjects, English, maths and science, these are broad and complex and include not just knowledge but skills, understandings and processes.
So the aim of creating assessments to reflect the whole of this curriculum was ambitious, but this was the job given to the research team in the earliest years of the project. We took it on with what now seems naive optimism. The 1991 tasks for six to seven-year-olds included the "floating and sinking" investigation in science, the "making a game" activity in mathematics, and the individual reading conference in English. They were attempts to mirror the way the full curriculum was taught in the classroom.
The outcome is well known. However authentic they might have been, the tasks took up too much time and placed too heavy a workload on teachers to be acceptable over the half-term period allowed. In the view of the teaching profession, they were unmanageable.
The following years saw a succession of attempts to make them more manageable. Tasks to assess the practical skills and processes of science and maths were dropped. The number of attainment targets covered was successively reduced. But this improved manageability was accompanied by a reduction in authenticity. The assessments covered less and less of the curriculum and became less and less like it.
Testing of different parts of the curriculum developed in very different ways under the pressure towards manageability. In science, there were no compulsory tasks in scientific investigation after 1991. However, the 1991 SATs had raised the profile of this aspect of science, and helped some teachers to understand the nature of investigations better. While in one sense its removal was disappointing, it gave us the chance to develop optional materials for teachers to use in their own assessments. This was a more authentic reflection of the nature of scientific investigation, and proved one of the most rewarding and successful aspects of the work.
Mathematics assessment took a different course. From covering a broad range of subject matter in 1991, it narrowed progressively until only "number" was included in 1994. Then the coverage broadened again, but within a written test rather than a more flexible task.
In reading, the tension between the "authentic" reading conference - with teachers listening to children read and analysing their mistakes - and the "manageable" written test was strongest. The present version represents a stand-off between these two demands, with both approaches available and attracting support.
Over the years, the shorter, more manageable tasks and tests have ceased to be authentic in the sense that we were striving for at the outset. They no longer attempt to cover the breadth of the curriculum subjects, to reflect normal classroom activities, or to assess all the understandings, practical skills and processes. In the later years of the project, our task was to capture as much as possible of these understandings and processes in carefully designed questions for written tests. This requires skill and ingenuity, but is less innovative and exciting than trying to sample an authentic classroom experience and assess it.
Since Sir Ron Dearing's review in 1994, it has been clear that full authenticity is to be the province of teacher assessment. In their day-to-day assessments, teachers have time to look at practical skills, observe processes and record what children do in a range of contexts. They can achieve far more than could ever be done in a single assessment task. It follows that, to find out how a child is doing in the whole national curriculum, the teacher's range of authentic assessments must be included alongside the formal tests, and that is the system we have now.
Reflecting on the six years in which so many changes have taken place, the overwhelming impression is not of absolute success or failure, but of the need to balance different considerations, all of them important. The assessment system must not lead to a narrowing of the curriculum; but nor must it impose impossible workloads on teachers. We have learned a great deal about the different ways in which this balance can be struck.
Underlying all this, however, is a fundamental issue about the importance of assessment to education. We are trying to find out what very young children understand about a very broad and complex curriculum. No one should ever have expected this to be an easy matter.
Dr Marian Sainsbury is a senior research officer at the NFER. SATs - the Inside Story is available from NFER Publications. Tel: 01753 574123. The project was conducted under contract to the School Curriculum and Assessment Authority and the School Examinations and Assessment Council.