Inevitably, and in the face of the global pandemic, this week has seen the cancellation of the 2021 examinations for Higher and Advanced Higher qualifications, following the October decision to cancel the diet for National 5. Perhaps also inevitable has been the strong reaction to this decision in various quarters. For some, the cancellation of exams is seen as a dilution of academic standards. Some have even gone as far as to suggest that no exam means no qualification. This latter assertion is, of course, nonsense; it conflates the qualification itself with the method of assessment.
The issue of whether exams represent the most rigorous method of assessment is more credible, albeit contested. Views range from seeing exams as the gold standard of assessment to the position that exams mainly test how well the candidates are coached to sit them. Proponents of the exam system argue that the exams are an equitable assessment tool, testing all students under exactly the same conditions on the same questions; exams assess how well students work under pressure and without any outside help.
Conversely, critics of the exam system argue that exams are neither a valid nor a reliable assessment tool: an examination, conducted on a certain day and on a specific chunk of knowledge, does not easily assess a broad range of knowledge, competences and skills; and there are many extraneous circumstances that might interfere with and affect performance in a test on a given day, with possible lifelong consequences for the examinees.
We suggest that an exam-versus-coursework dichotomy is unhelpful in any case, and misses the essential point that different forms of assessment test different things. Unlike terminal assessment, continuous assessment, such as coursework, reflects learning in the context of the classroom and requires students to demonstrate a consistent effort over a long period of time. There is evidence that different assessment types do not have similar reliability across different groups of learners: for example, girls tend to perform better in coursework, while boys tend to perform better in standardised tests. There is also evidence that the results of standardised tests do not reflect academic ability well and are poorer predictors of graduate grades for ethnic minority students. Overall, some research comparing different types of assessment in higher education concludes that continuous assessment gives practically the same pass/fail rates as a final exam.
A second set of criticisms of the decision to cancel exams – with which we have a great deal of sympathy – is reflected in the observation by Professor Lindsay Paterson, of the University of Edinburgh, that exams are being replaced by exams. This has been made inevitable to some extent by the guidance from the Scottish Qualifications Authority (SQA), which has sought to reduce teacher workload and account for missed study time by removing coursework. As we pointed out in our Rapid Review of National Qualifications Experience 2020, this has the effect of reducing the evidence base for continuous assessment, making it more likely that schools will fall back on formal tests.
Subsequent guidance for National 5 seems to reinforce this tendency. We believe that the narrowing of assessment to formal testing under controlled conditions shows a lack of imagination and limited understanding across the system as to what constitutes assessment. Exploring new approaches offers good opportunities to broaden understanding and develop assessment literacy. We emphasise here that when we recommended the development of validated assessments for National 5, in the rapid review, we most certainly did not have in mind a series of externally set but internally assessed pencil-and-paper tests.
The emerging practices based around formal tests will have detrimental effects: on the workload of the teachers who will end up assessing these additional tests, when study leave no longer exists to free up time; and on the students who will be tested to within an inch of their lives. Moreover, there are serious concerns about the validity of these tests; as SQA points out, such tests have good predictive ability for exam performance, but one can seriously question their validity for assessing student achievement holistically.
So, what do we advocate? It is first necessary to point to several dangers inherent in any system of assessment. First, local variation across different local authorities seems to be a stumbling block in the development of any system of moderation for a national qualification. The key principle to be adopted here should in our view be "developed nationally, but applied locally".
Second, workload should be a key consideration; the potential for a system to become over-complex and burdensome is considerable. Our interactions with teachers around Scotland suggest that many existing moderation practices are very bureaucratic and time-consuming, and moreover in areas that are arguably less urgent in the current context, eg, assessment against Broad General Education (BGE) Curriculum for Excellence (CfE) levels. Consideration should be given to scaling these back to make space and time for senior phase assessment and moderation. We therefore suggest that moderation should be underpinned by clear principles of transparency and proportionality.
Third, due attention also needs to be given to the equity, equality and children’s rights implications of the system. Development should be underpinned from the outset by clear impact assessments for equalities and children’s rights, and we suggest engagement at an early stage of development with the Children and Young People's Commissioner Scotland (CYPCS) and the Equality and Human Rights Commission (EHRC).
Finally, isolation of teachers in small schools or departments can have an impact on the support available for assessment, and may preclude effective moderation. Consideration should be given to creating communities of assessors across and between schools.
We believe that the system can learn from existing practices in universities and further education colleges. There are also historical precedents of proportionate and effective moderation systems from elsewhere in the UK (eg, GCSE and GNVQ – General National Vocational Qualifications – in England and Wales). Typically, rigorous systems include the components listed below. We note again that inter-school working is beneficial, to ensure that teams of markers have a critical mass for peer support.
- The development of validated assessments: These do not have to be produced externally; indeed, the process of developing assessments is a useful one for those subsequently making assessment judgements. A typical process would be for assessments to be designed locally and validated by external verifiers, ensuring both local relevance and a degree of standardisation.
- Sense-making: Typically, this would involve small teams of markers (prior to the commencement of the assessment process) assessing a small number of items, discussing their judgements and agreeing a standard.
- Support: The use of "marking buddies" is useful, especially for new markers or those who are isolated in small departments. Problematic assessment decisions can be referred for a second opinion.
- Internal verification: Typically, this involves a single person with appropriate subject expertise (from the school or a partner school) cross-marking a small sample of assessments from each marker in a particular subject. Where issues are identified, a closer look can be taken (ie, more sampling) and cohort grades adjusted if necessary. Internal verification should then be signed off by an examinations officer or other designated person (eg, headteacher) for each centre. We emphasise here that this is not a process of evidencing each and every assessment decision; instead, it is a system based on trust that the majority of assessors do a good job, and therefore its purpose is to undertake light-touch sampling across a full range of grades, to identify problems and affirm good practice.
- External verification: Ideally, verifiers should be practising teachers or lecturers with a current role in the subject in question, engendering a sense that this is a peer-led system rather than a top-down one. This could be established in one of two ways: as a peer-led system where schools appoint externals for their subjects from other schools (analogous to the university external examiner system); or through central recruitment (eg, by local authorities or SQA) of a cohort of experienced markers, each serving a set number of schools in their subject. External verifiers should have access to the whole set of assessments, from which they choose the sample. We have found in our university external moderation experience that institutions often try to choose which scripts are moderated, but we believe that this provides an insufficient degree of oversight and leaves open the possibility of abuses.
- Checking: This might involve statistical analysis of national patterns of attainment against historical trends, to identify anomalies for further checking. This should not be a purely statistical exercise, as was the case in 2020, but should involve qualitative investigation of anomalous patterns at centre and subject level, which may require adjustment. In the case of inconsistency at subject level between historical patterns and the current grade distribution, it may be necessary to examine the grade distribution in detail (ie, As, Bs, Cs and Ds separately) and to identify the source of the inconsistencies (eg, marking that is too generous, leading to too many As, and/or marking that is too harsh, resulting in too many fails, while the middle range might be absolutely fine). Patterns of inconsistency should be identified and dealt with, but any adjustment should be justified (eg, "learning outcomes for an A weren't achieved, so this should be a B", not "the distribution is different from historical averages, so X number of As will be changed to Bs"). This sort of analysis will also provide useful information about centres where there are declining and/or improving trajectories in attainment.
The pandemic has exposed weaknesses in Scotland's current system of awarding qualifications – particularly an overreliance on exams and formal testing, and the fragmented "ladder of qualifications" approach. The forthcoming OECD (Organisation for Economic Cooperation and Development) review provides a further opportunity to rethink the way we conduct high-stakes assessment.
As International Council of Education Advisers member Andy Hargreaves stated this week on Twitter: "Scotland cancels final exams for 2021. An opportunity to rethink assessment strategy in perpetuity, perhaps?" Let’s grasp it with both hands.
Professor Mark Priestley and Dr Marina Shapira are lecturers and researchers at the University of Stirling. They both worked on the review of the 2020 SQA results, which Professor Priestley led. This is a version of a post that originally appeared on his blog.