John Gray explains why we need a multi-level approach to evaluate exam results.
British researchers have known for over a decade why they need a multi-level approach to value-added research. Now, thanks to the breakthroughs achieved by statisticians such as Harvey Goldstein of London University, problems that once defeated the largest computers can be solved on desktop machines.
Take a very small local education authority with 600 pupils in five schools. The schools know the GCSE performances of individual pupils in Year 11 and they have measures of their prior attainment at intake.
The LEA has three questions. What is the relationship between GCSE and pupils' prior attainment? Are some schools more or less "effective" than others? Do some do better with one kind of pupil than another?
Step 1 involves finding the overall relationship between prior attainment and performance at GCSE. Each dot on the first graph, right, represents one pupil. The slanting black line (the "line of best fit") describes the relationship between the two measures. On average, the better a pupil was performing at intake, the better they did at GCSE.
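For readers who want to try Step 1, here is a minimal sketch of fitting a line of best fit by ordinary least squares. The pupil scores are invented for illustration, and the function name `fit_line` and the score scales are assumptions, not part of any particular package.

```python
import random

def fit_line(xs, ys):
    """Least-squares line: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented data: 600 pupils, each with an intake score and a GCSE score.
rng = random.Random(0)
intake = [rng.uniform(80, 120) for _ in range(600)]
gcse = [10 + 0.4 * x + rng.gauss(0, 5) for x in intake]

intercept, slope = fit_line(intake, gcse)
print(f"GCSE score = {intercept:.1f} + {slope:.2f} x intake score")
```

A positive slope corresponds to the pattern in the first graph: higher attainment at intake goes with higher GCSE performance.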
Step 2 establishes the "line" for each school in turn. There are 150 pupils in School A; the line for these pupils is drawn on to the second graph. It is identical to the overall line in the first graph. In School B the line is a little higher at every point than School A's: at every level of prior attainment, pupils at School B do a little better than those at School A. Pupils at School C do a little worse.
The lines for Schools D and E are drawn on the third and fourth graphs to make their positions clearer. They too could have been drawn on the second graph.
With School D it appears that initially low attainers do rather better than similar pupils in the average school (A); conversely, higher attainers appear to do somewhat worse. In School E it is the other way round; higher-attainers do better while lower-attainers do worse.
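Step 2 can be sketched in the same way by fitting a separate line to each school's pupils. The data below are simulated. Only the 150 pupils in School A and the total of 600 come from the example above; the other school sizes, the intercepts and the slopes are assumptions chosen to mimic the patterns described.

```python
import random
from collections import defaultdict

def fit_line(xs, ys):
    """Least-squares line: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Assumed set-up: (number of pupils, intercept, slope) for each school.
schools = {
    "A": (150, 10, 0.40),  # matches the overall line
    "B": (150, 12, 0.40),  # a little higher at every point
    "C": (150, 8, 0.40),   # a little lower at every point
    "D": (80, 16, 0.34),   # flatter: low attainers do relatively better
    "E": (70, 4, 0.46),    # steeper: high attainers do relatively better
}

rng = random.Random(1)
pupils = []  # one (school, intake, gcse) row per pupil
for name, (size, a, b) in schools.items():
    for _ in range(size):
        x = rng.uniform(80, 120)
        pupils.append((name, x, a + b * x + rng.gauss(0, 5)))

# Group the pupils by school, then fit one line per school.
by_school = defaultdict(list)
for name, x, y in pupils:
    by_school[name].append((x, y))

lines = {}
for name, rows in sorted(by_school.items()):
    xs, ys = zip(*rows)
    lines[name] = fit_line(xs, ys)
    a, b = lines[name]
    print(f"School {name}: n={len(rows)}, intercept={a:.1f}, slope={b:.2f}")
```

Because each school's line rests on fewer pupils than the overall line, the fitted intercepts and slopes will wobble from sample to sample, and more so for the smaller schools.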
Most researchers know how to take these two steps. But can we really be sure that the lines for schools B and C are significantly different from that of School A? And are there any statistical reasons, such as the smaller numbers of pupils on which their lines are based, which could account for the different slopes of the lines in Schools D and E?
Statisticians have researched most of the factors which determine the way the lines are fitted. The multi-level programme takes many of these into account. In Step 3 the programme compares the line for each school with the overall line, bearing in mind all the factors it knows could affect that line's position. It then decides whether to alter the school's line; sometimes it does.
With the lines for Schools B and C it would try to decide whether there really was a statistically significant difference between them and the average (represented by School A).
With School D the programme might decide that, taking everything into account, the line should be just a little steeper; with School E that the line was just a bit too steep. This process of comparing the results of individual schools ("iterating") continues until the programme has reached the "best" solution.
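The kind of adjustment described for Schools D and E can be illustrated with a simple precision-weighted "shrinkage" rule, in which each school's slope is pulled towards the overall slope, and pulled harder when the school's own estimate is less certain. This is a sketch of the general idea only, not the procedure any particular multi-level programme uses: the function `shrink` is hypothetical, every number is invented, and the variances that a real programme would estimate from the data by iterating are simply assumed here.

```python
def shrink(estimate, overall, sampling_var, between_var):
    """Pull a school's estimate towards the overall value.

    The weight is the share of the total variance put down to real
    differences between schools; a noisier estimate gets a smaller
    weight and so moves closer to the overall value.
    """
    weight = between_var / (between_var + sampling_var)
    return overall + weight * (estimate - overall)

# All figures below are invented for illustration.
overall_slope = 0.40
between_var = 0.004  # assumed variance of the true slopes across schools

schools = {
    "D": {"slope": 0.25, "sampling_var": 0.004},  # flatter than average
    "E": {"slope": 0.55, "sampling_var": 0.004},  # steeper than average
}

shrunk = {
    name: shrink(s["slope"], overall_slope, s["sampling_var"], between_var)
    for name, s in schools.items()
}

for name, value in shrunk.items():
    print(f"School {name}: raw slope {schools[name]['slope']:.2f}, "
          f"adjusted slope {value:.3f}")
```

Under these assumptions School D's adjusted line comes out a little steeper than its raw line and School E's a little less steep, which is the direction of change described above.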
The multi-level approach tells us how well a school is doing relative to the average and other schools. People sometimes argue that simpler techniques yield similar results. Sometimes they do. The trouble is you can't tell until you have compared the approaches. This means that one ends up doing multi-level analysis anyway. Given what is at stake, schools deserve the very best evaluations we can provide.
John Gray is director of research at Homerton College, Cambridge.