Ofqual has launched a competition today for artificial intelligence experts to design a system to mark exams.
In a blog published today, the regulator said it would explore the potential of artificial intelligence (AI) and how it could be used to mark GCSE English language scripts.
The regulator said that "we want to understand whether there might be a role for AI in marking," noting that AI could be used to identify marking inconsistencies.
Ofqual said it wanted to explore whether AI could be "effective in spotting an erroneous mark from an otherwise good and consistent marker", and that mistakes were inevitable given that "marking can be a very demanding task".
To understand AI's potential, the regulator said it would use an "AI competition" to research how technology could be used to improve marking.
"Essentially, we will get several thousand student responses to an essay (a particular GCSE English language essay question), and these will be marked by human markers," Ofqual said.
Organisations with expertise in the field would then use these marks, which represent the best human judgement of the exam scripts, as the data with which to train their AI systems.
"It is important that the training examples use the ‘best’ human judgement because AI systems are only as good as the data put into them. Therefore, in our study, we will use the most senior markers and each essay will be marked multiple times to ensure the marks do not reflect error," the regulator said.
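As a rough illustration of how repeated marking can reduce the influence of a single erroneous mark, the sketch below combines several marks for one essay into a single training label. Ofqual has not said how it will combine the marks; the use of the median, the function name `consensus_mark`, the `max_spread` threshold and the sample marks are all assumptions made for this example.

```python
from statistics import median


def consensus_mark(marks, max_spread=4):
    """Combine repeated marks for one essay into a single label.

    Hypothetical helper: the median and the max_spread threshold are
    illustrative assumptions, not Ofqual's stated method.
    Returns (consensus, needs_review).
    """
    if not marks:
        raise ValueError("at least one mark is required")
    # A wide gap between the highest and lowest mark suggests the
    # markers disagree and the essay may need another look.
    spread = max(marks) - min(marks)
    return median(marks), spread > max_spread


# Invented sample data: three senior markers per essay.
essay_marks = {
    "essay_001": [28, 30, 29],
    "essay_002": [12, 25, 14],
}

training_labels = {
    essay_id: consensus_mark(marks) for essay_id, marks in essay_marks.items()
}
```

The median is used here because one outlying mark, such as the 25 in the second essay, moves it far less than it would move the mean.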
Competing AI organisations will use data from these scripts to develop different programs, which can then be tested on a new set of exam papers.
"We can test these AI systems on another set of essays (for which we know the marks, but the AI systems do not). We very much hope this competition will help stimulate and identify the very best practice in this field," Beth Black wrote in a blog for the regulator.
"The results from this competition will help us undertake further subsequent research work – for example, modelling the impact of AI as a second marker or as a marker monitoring system."
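The test described by the regulator, in which AI systems mark essays whose human marks are known but withheld, could be scored along the following lines. This is a minimal sketch only: Ofqual has not published its evaluation measures, and the function name `evaluate`, the `tolerance` parameter, the three statistics and the sample marks are assumptions made for this example.

```python
def evaluate(ai_marks, human_marks, tolerance=1):
    """Compare AI marks with withheld human marks for the same essays.

    Hypothetical scoring: the measures below are illustrative choices,
    not the measures Ofqual will use.
    """
    if not ai_marks or len(ai_marks) != len(human_marks):
        raise ValueError("mark lists must be non-empty and the same length")
    # Absolute difference between the AI mark and the human mark per essay.
    diffs = [abs(a - h) for a, h in zip(ai_marks, human_marks)]
    n = len(diffs)
    return {
        "mean_abs_error": sum(diffs) / n,
        "exact_agreement": sum(d == 0 for d in diffs) / n,
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
    }


# Invented sample data: four held-out essays.
human = [20, 15, 30, 8]
ai = [20, 17, 29, 8]
report = evaluate(ai, human)
```

The same per-essay differences could also support the marker monitoring idea mentioned above, by flagging any script where the human mark and the AI mark are unusually far apart.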