Arguments that exam standards are slipping can just as easily be turned on their head. John F Bell argues it is unwise to make simplistic comparisons
On June 7, 1951, some nervous sixth-formers completed the last paper of the first A-level maths exam produced by the University of Cambridge Local Examinations Syndicate.
Little did these guinea pigs, the products of world-renowned public schools and academically rigorous grammar schools, suspect that they were taking the examination that was to become the "jewel in the crown" of the English education system and that they were setting a gold standard that would survive for at least 50 years.
The examination contained questions on statistics (see illustration). The candidates were asked to calculate the correlation and comment on the result. To receive a pass mark on the paper, candidates would have had to obtain good answers to just three other questions.
Forty-two years later, candidates, many of them from under-funded comprehensives and taught by "trendy" modern methods, entered for the Midland Examining Group's GCSE examination in statistics. Candidates expected to achieve grades A-D encountered the question (see illustration) on the higher-tier paper. They were expected to plot the data, calculate the regression line, analyse the residuals and comment on them. The GCSE question was one of 15 compulsory questions.
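As a rough modern illustration of what that GCSE question demanded, the sketch below fits a least-squares line and computes the residuals from first principles. The actual data appeared in the illustration, which is not reproduced here, so the numbers below are invented purely for demonstration.

```python
def regression_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Invented data for demonstration only (not the exam's data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = regression_line(xs, ys)

# Residuals: observed value minus the value predicted by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
```

Commenting on the residuals, as the question asked, means checking that they are small and show no obvious pattern; by construction they sum to zero for a least-squares fit.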
Clearly the GCSE candidates were expected to do more in less time and for a smaller share of the paper. Does this mean that GCSE statistics in 1993 is equivalent to A-level maths? Had 1993 candidates reached the "gold standard" before starting A-level? Was the 1993 A-level a "platinum standard"? No sensible person would answer yes to these questions. The suggestion is patently unreasonable.
The exams are not equivalent, for several reasons. The GCSE candidates were expected to use an electronic calculator in this exam. Obviously, in 1951, candidates would have had to calculate the correlation coefficient by hand, and to get full marks on the paper they would have had to do so in less than 18 minutes.
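The hand calculation those 1951 candidates faced is mechanical but laborious. Written out in modern form, the product-moment correlation coefficient amounts to the following sketch; the data here are invented for demonstration.

```python
def pearson_r(xs, ys):
    """Product-moment correlation coefficient, computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The three sums a 1951 candidate would have tabulated by hand.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented data: an exactly linear relationship gives r = 1.
r = pearson_r([1, 2, 3], [2, 4, 6])
```

Every product and square in those sums had to be worked and totalled by hand in 1951, which is why the 18-minute constraint mattered.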
Statistics was still a relatively new subject in 1951. Older maths teachers in those days would have completed their formal maths education before many common statistical techniques had come into general usage.
There is also the question of the representativeness of the questions. The other A-level questions would still be considered very demanding. The range of maths covered was broad. The GCSE candidates followed a course that only covered statistics. It is doubtful that there are many GCSE candidates who would, for example, be able to "Use MacLaurin's series to expand tan x in ascending powers of x as far as the term in x". This question is only part of one 1951 A-level question.
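For the record, the expansion that question asked for is a standard Maclaurin series: dividing the series for sin x by the series for cos x (or differentiating tan x repeatedly at x = 0) gives

```latex
\tan x = x + \frac{x^{3}}{3} + \frac{2x^{5}}{15} + \cdots
```

and the 1951 candidate was expected to produce such terms from first principles, as only part of one question.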
If it is easy to reject the idea that finding similarities between old A-level questions and recent GCSE questions is evidence of an improvement in standards, what about the reverse situation? What should be concluded when a current A-level question is found on an old GCSE or equivalent exam?
The press would argue that this is evidence of a failing examination system. However, the arguments about the structure of the examination, the breadth of the syllabus and the time spent teaching the content of questions still apply.
The strategy of comparing individual questions is far too simplistic. The comparison of standards of two exams requires consideration of the whole examination, the mark scheme and the performance of the candidates on the questions.
This is a demanding task that relies on the skill of highly experienced examiners. Last year the UCLES carried out a study of A-level standards in maths which involved nine highly competent examiners from other boards.
They spent two days comparing 1986 scripts with 1995 scripts, working through pairs of scripts: for each pair, they had to nominate one script as the better. The judgement was a forced choice; ties were not allowed.
With this methodology, a judge compares the quality of one script with that of another, rather than against an internalised mental standard for a grade. This has two advantages. Firstly, concrete comparisons between two scripts are made, removing the uncertainties associated with a notional standard. Secondly, differences between the notional standards of judges cancel out, so the methodology controls for variability in judges' internal standards.
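The article does not spell out how the judgements were aggregated, but one simple way to summarise such forced-choice data is to count how often each year's scripts are nominated as the better of a pair. The sketch below uses invented judgements and is an illustration, not the UCLES analysis itself.

```python
from collections import Counter

# Invented forced-choice judgements: each entry records which year's script
# was nominated as the better of a pair (ties were not permitted).
judgements = ["1986", "1995", "1986", "1986", "1995", "1986"]

wins = Counter(judgements)
total = sum(wins.values())

# A share well above 0.5 would suggest the 1986 scripts were judged stronger,
# i.e. evidence of a decline; a share near 0.5 suggests comparable standards.
share_1986 = wins["1986"] / total
```

In practice such data would be tested for statistical significance rather than read off directly, but the win-share conveys the basic logic of the comparison.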
The results of their comparisons indicated that there had been no decline in standards over the lifetime of one syllabus, and only a very slight decline in comparison with its immediate predecessor.
The issue of standards over time is complex and requires careful consideration of many issues. Despite their appeal, simplistic comparisons are clearly inappropriate. Serious research in this area is complex and requires skill in interpreting the available evidence.
The author is a research officer at the UCLES, but writes in a personal capacity. Further details of this research can be found in the paper Standards in A-level Maths 1986-1996, available on the Education-line site at http://www.leeds.ac.uk/educol/documents/000000333.htm, or in volume 8, No 2, of the British Journal of Curriculum and Assessment, or directly from the author.