How shall I compare thee...?

There are growing calls for writing assessment in primary to be replaced with comparative judgement, a method that promises not only more accurate judgements of writing ability but also a reduction in teachers’ workload. Warwick Mansell asks if it really is the solution teachers are crying out for.

7th April 2017, 1:00am

Warwick Mansell

Teachers and teaching assistants at Old Hill Primary in Sandwell, West Midlands, were gathered together, talking about what constituted good writing. Usually, discussions about writing would focus on the technicalities of language and revolve around close scrutiny of a list of specifications, but on that day they centred on a much more holistic, perhaps more natural, appreciation of whether a child could be considered a “good writer”.

It was, says interim headteacher Craig Westby, a refreshing contrast to the current grind of teacher assessment in primary, which, for him and many others in the profession, has often come to resemble a soul-destroying, creativity-stifling list of boxes to tick.

“The assessments we were making were about enjoying children’s writing: not having to look for needles in haystacks in terms of whether a child was using hyphens, say, but just discussing whether they wrote well. And, I know it sounds trite, but discussing this has brought us closer together as a staff,” he says.

Welcome to the world of comparative judgement (CJ) - an assessment system that promises radical changes to the way that pupils’ work is evaluated, particularly in more open-ended subjects such as writing.

Having quietly been investigated behind the scenes among academics, on and off, for nearly a century, it is now being talked about as a possible mainstream feature in schools throughout England in the coming years.

Indeed, in recent months it has not only been trialled at a number of schools, including Old Hill, but the technique has also gained some influential champions. Leading academic Dr Becky Allen, of Education Datalab, told MPs in January that the case for using it in assessing children’s writing was potentially “compelling”. Meanwhile, Nick Gibb, the schools minister, has said that it constitutes a “potential future” for one of the more onerous aspects of teachers’ professional lives. And last week it was even mentioned in a Department for Education consultation on the future of assessment.

Among its claimed benefits are improvements in teaching quality and even that holy grail for hard-pressed professionals: a drastic reduction in marking workload.

So what exactly is CJ, and can it really live up to the hype?

The origins of the technique date back to 1927, when the psychologist Louis Thurstone, of the University of Chicago, published two papers setting out a new way of measuring various qualities, such as whether a particular shade of grey was darker or lighter.

Individuals could compare two items, he suggested, and decide which was, for example, heavier or darker. They would then move on to another two, and then another two, until many objects could be put in rank order. From the start, Thurstone could see a use for this technique in the educational field, in comparing the quality of pupils’ work.

Over subsequent decades, researchers investigated what became known as the “Thurstone pairs” technique and its potential application in exam grading. But it is only now that the method is being more widely seen as an alternative to more conventional marking approaches in English schools, with potentially big implications for teachers.

Judging two scripts together

Under CJ, a teacher takes two pupils’ scripts in, say, writing. In its purest form, they then simply decide which one they think is better. This is then fed into computer software and the teacher compares another two scripts. With other teachers going through the same process, it very soon becomes possible for the algorithm to rank many scripts in order of quality, as judged by the teachers. In trials, the process has been significantly quicker than current marking and moderation of written work.
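
As a rough illustration of how software might turn a pile of "this one is better" decisions into a rank order, the sketch below fits a Bradley-Terry model, a standard statistical model for paired comparisons. The article does not say which model No More Marking's software uses, so the choice of model, the function name `bradley_terry_scores`, the `prior` smoothing term and the script labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bradley_terry_scores(judgements, iterations=500, prior=0.5):
    """Estimate a quality score for each script from (winner, loser) pairs.

    Illustrative only: fits a Bradley-Terry model with the standard
    iterative update. `prior` adds a virtual opponent of fixed strength 1
    that every script has notionally beaten and lost to `prior` times,
    which keeps scores finite for scripts that win (or lose) every time.
    """
    scripts = sorted({script for pair in judgements for script in pair})
    wins = defaultdict(float)
    meetings = defaultdict(lambda: defaultdict(int))
    for winner, loser in judgements:
        wins[winner] += 1
        meetings[winner][loser] += 1
        meetings[loser][winner] += 1

    strength = {script: 1.0 for script in scripts}
    for _ in range(iterations):
        updated = {}
        for script in scripts:
            # Comparisons against the virtual opponent (strength 1).
            denominator = 2 * prior / (strength[script] + 1.0)
            for other, count in meetings[script].items():
                denominator += count / (strength[script] + strength[other])
            updated[script] = (wins[script] + prior) / denominator
        strength = updated

    # Log-strengths are easier to read as a scale than raw strengths.
    return {script: math.log(value) for script, value in strength.items()}


# Hypothetical judgements: each tuple is (preferred script, other script).
judgements = [("A", "B"), ("A", "B"), ("A", "C"),
              ("B", "C"), ("B", "C"), ("C", "B")]
scores = bradley_terry_scores(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Note that judges do not need to agree on every pair: script C beats script B once here, and the model simply treats that as evidence that the two are closer in quality than A and C.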

The scripts’ precise rankings need not be shared with anyone other than the computer, but they can be used to produce a categorisation of pupils’ work into various levels of quality, as decided on by the teachers using just that one, very basic, criterion: which piece of work did they think was better?
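
One plausible way to move from scores to categories is to judge a few reference scripts of known standard alongside the pupils' work and use their scores as cut-offs, which is the exemplar approach Christodoulou describes later in this article. The sketch below assumes each script already has a numeric score; the function name `award_bands`, the example scores and the "below EXS" label are hypothetical.

```python
def award_bands(pupil_scores, exemplar_scores, lowest_band="below EXS"):
    """Give each pupil the highest band whose exemplar score they reach.

    `exemplar_scores` maps a band name to the score of a script that was
    pre-judged to sit at the threshold of that band (an assumption about
    how exemplars would be chosen).
    """
    thresholds = sorted(exemplar_scores.items(), key=lambda item: item[1])
    bands = {}
    for pupil, score in pupil_scores.items():
        band = lowest_band
        for name, cutoff in thresholds:
            if score >= cutoff:
                band = name
        bands[pupil] = band
    return bands


# Hypothetical scores on the same scale for pupils and exemplars.
pupil_scores = {"pupil_1": -1.2, "pupil_2": 0.3, "pupil_3": 1.5}
exemplar_scores = {"EXS": 0.0, "GDS": 1.0}
bands = award_bands(pupil_scores, exemplar_scores)
print(bands)
```

Only the band would need to be reported; the underlying rank order can stay inside the software.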

For teachers who are accustomed to the current writing assessment method in primary, such an approach may look suspiciously “light-touch”. For under the national curriculum’s “interim assessment frameworks”, teachers have been conditioned to take a tick-box approach to judging writing. They have been presented with a set of requirements, including grammar, spelling and punctuation, with pupils having to show that they have mastered them all to be deemed to have performed well in writing overall.

Such a prescriptive approach might appear more “accurate” than CJ’s more intuitive approach, but advocates of the latter claim that’s not the case.

One company proving very influential in the move to take CJ nearer to the mainstream of English education is No More Marking - a no-doubt seductive title, if ever there was one, for hard-pressed classroom professionals. The business was set up by Dr Chris Wheadon, a former senior psychometrician at the exam board AQA. It runs a software system through which teachers’ CJs are processed, and was recently used by Daisy Christodoulou, head of assessment at Ark Schools, for trials in classrooms across the academy chain.

Christodoulou says CJ offers a much better alternative to the frameworks, with big implications not just for the way pupils are marked, but for how they are taught.

“The interim assessment frameworks offer precise information about the range of punctuation pupils must use, and the words they must be able to spell. A pupil, for example, must use cohesive devices such as fronted adverbials in their writing,” she explains. “However, what these frameworks can’t do is define the difference between pupils who use such devices well, and pupils who use them poorly, or even completely inappropriately.”

In a blog post on the subject, and also in a talk at the ResearchEd conference last year, Christodoulou gave an example of two pupils’ scripts. One, she says, most people would see as much more sophisticated in its writing, yet it would have scored less well under the frameworks than the piece that teachers judged less sophisticated.

She says that one of the advantages of CJ is that it allows teachers to make judgements that “work with the grain of the mind, not against it”.

“Instead of asking teachers to make absolute judgements against unhelpful rubrics, CJ requires teachers to, well, make comparative judgements instead,” she explains.

‘Freeing up teaching’

Christodoulou argues that moving to CJ will “free up teaching in quite a profound way”, as staff focus not on trying to conjure compositions that tick the boxes of the mark scheme, but on getting children to write vividly.

But can teachers really agree on a single dimension of what constitutes good writing? And is it possible to come up with one simple judgement on a piece of work, without recourse to any set of characteristics for which a teacher might be looking in a script?

Mary James, a former professor of education at the University of Cambridge and a former English teacher, believes it is. “There is evidence that teachers are very good at ranking: they can make judgements without recourse to tick-lists of criteria,” she says.

James Bowen, director of middle leaders’ union NAHT Edge, was originally more cynical: he had serious doubts as to whether it would be possible to reach judgements on quality without a detailed set of rules.

But since he took part in a trial day for No More Marking’s system, those concerns have largely disappeared.

“You find that very quickly you can make that judgement between two scripts, and there is a surprisingly high level of agreement between markers,” he explains.

Christodoulou points to seemingly very high figures for “reliability”: markers will agree on which script is the better out of a pair almost nine out of 10 times, a statistic which apparently compares well with conventional marking to criteria or rubrics.
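
The article does not say how the "nine out of 10" figure was calculated. As one simple reading of it, the sketch below takes every pair of scripts that was seen by more than one judge and counts how often two judges picked the same script. The function name `agreement_rate` and the sample decisions are hypothetical, and published CJ studies may report reliability in other ways.

```python
from collections import defaultdict
from itertools import combinations


def agreement_rate(decisions):
    """Share of judge-to-judge comparisons that picked the same script.

    `decisions` is a list of (judge, script_a, script_b, winner) tuples.
    Returns None if no pair of scripts was seen by two or more judges.
    """
    winners_by_pair = defaultdict(list)
    for _judge, script_a, script_b, winner in decisions:
        winners_by_pair[frozenset((script_a, script_b))].append(winner)

    agreed = total = 0
    for winners in winners_by_pair.values():
        for first, second in combinations(winners, 2):
            total += 1
            agreed += first == second
    return agreed / total if total else None


# Hypothetical decisions from three judges.
decisions = [
    ("judge_a", "s1", "s2", "s1"),
    ("judge_b", "s1", "s2", "s1"),
    ("judge_c", "s2", "s1", "s2"),
    ("judge_a", "s3", "s4", "s3"),
    ("judge_b", "s3", "s4", "s3"),
]
rate = agreement_rate(decisions)
print(rate)  # 0.5
```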

And there does seem to be a growing push from the profession for CJ to be adopted more extensively.

Most observers say this is driven by the widespread unpopularity of the interim frameworks, and the tantalising potential that CJ seems to hold - both as an assessment of children’s writing that tallies better with teachers’ ideals than a tick-box list of criteria, and as a tool that could help to cut teachers’ ever-growing workloads.

No More Marking worked with Ark to conduct a small pilot of CJ in five schools last summer - with encouraging results, says Christodoulou. It is now in the middle of a much larger trial involving around 250 schools across England.

Meanwhile, blogger and Tes columnist Michael Tidd, deputy head of Edgewood Primary School in Nottinghamshire, says: “If we really want good writing to be judged ‘in the round’, then we cannot rely on simplistic and narrow criteria. Rather we have to look at work more holistically - and CJ can achieve that.”

Cutting workload

Bowen says that primary school leaders can certainly see the advantages in terms of staff workload.

“The traditional method of writing moderation is a long process: you have a list of criteria, you make a judgement and then you get someone else in the school to look at it as well,” he explains. “With CJ, in the session I took part in they had 30 to 40 of us making judgements in quite a short space of time. That is very attractive.”

There is evidence that the government has been listening. Last week, its consultation on changes to primary assessment floated the possibility of reforms to the writing frameworks from as soon as September.

Comparative judgement was not put forward as an immediate replacement, but the paper says that the DfE will gather evidence and even trial CJ, which, it says, “may facilitate more rounded judgements and help to increase inter-school reliability”.

However, CJ is not universally seen as the way forward. Dr Christian Bokhove, lecturer in mathematics education at the University of Southampton, has concerns.

“There is a desperation out there for an improvement on what has happened to writing assessment at key stage 2, but that rings alarm bells for me,” he explains. “I fear that CJ is being presented as a silver bullet that can solve all our problems. But these silver bullets do not exist.”

Joshua McGrane, a research fellow at the Oxford University Centre for Educational Assessment, who is conducting a review of primary writing assessment for Oxford University Press, adds that this is a field that is relatively under-researched.

“The greatest strength of CJ is its use of intuitive judgements by teachers,” he reveals. “This is supported by numerous research studies, which indicate that judges are typically highly reliable in making these judgements.

“However, to date, CJ has been a fairly niche research area in education... As a result, minimal research has been carried out to validate the method, including better understanding the decision-making processes of judges... The scarcity of such research has not always been reflected in the large claims that have been made about the CJ approach in England.”

Meanwhile, an exam board source says that, while the technique has “many admirers” within the boards, it is not being considered for use in GCSEs and A levels - and could also face serious challenges to widespread use in primary.

Teachers would need training in forming a common understanding of what they are looking for in a good piece of writing, he says, which might be at odds with CJ’s apparent core reliance on teacher intuition.

Another concern is whether the method would withstand the inevitable pressures that would be placed upon it were it ever to form part of England’s high-stakes accountability regime; for example, if results were used to inform schools’ published data for KS2 writing.

At a recent parliamentary hearing, Dr Robert Coe, professor of education at Durham University, said: “The big question about CJ is how it operates in a high-stakes environment. We have seen it work in experimental situations [and] it is quite an exciting prospect. My worry is that we would introduce it as a solve-all…and find that some of those same problems are there because it is the high stakes rather than the assessment that drives the problems.”

Trials ‘a great success’

Westby understands these reservations but thinks that, on balance, CJ is a better system than the one currently used. His school trialled CJ with 180 pupils in Years 1 to 6 last November and he says that it has been a great success. However, the issue of using the system for school accountability still looms large.

He says the school is using the method as a way of judging pupils’ writing for termly progress checks, rather than, initially, for end-of-key-stage data. He argues that not just the interim assessment framework but the new national curriculum as a whole is at risk of overemphasising spelling, punctuation and grammar. CJ is a way of counterbalancing this, though the school would not be neglecting the technical aspects.

“I think not enough credence is given to attributes such as writing flow and creativity, with [the curriculum] very much focused on other aspects,” Westby says. “If you follow that, as you judge writing all the way through the school, you have got a distorted view of how good at it the children were. We don’t think that is fit for purpose, so we looked at CJ and we have felt it gives us a more balanced view. I think it’s the best way of judging writing at the moment.”

Perhaps surprisingly, Westby adds that the biggest gain has not been directly related to assessment itself, but rather to professional development, as all classroom staff have taken part.

“We did not just use teachers to do the judging, but teaching assistants, too,” he reveals. “Everyone is using children’s writing for their professional development. There is not normally much opportunity for a Year 1 teaching assistant to look at a Year 6’s writing scripts, but there is now - and that can only be a good thing.”

However, he does admit that using the system in a high-stakes context might leave CJ with the same problem facing any form of teacher assessment: how to ensure that teachers, whose schools’ results would hinge on pupils’ marks, do not provide too much help to pupils by telling them what to write.

“We all know that not every school is completely honest in terms of the amount of support they give [to children in teacher assessment],” he says. “Until that changes, it does not matter what assessment tools you use, you will still not get an honest picture.”

Bowen agrees that caveats do exist about whether the system is ready for widespread take-up. And so, like many who have looked at CJ, he recommends a relatively cautious welcome. Yes, the benefits could be substantial, but we simply don’t yet know how real they might prove to be.

“A lot of schools are happy to say, ‘We will trial CJ,’ and if, in, say, three to four years’ time, there is a view that it can be an alternative [to current end-of-key-stage assessment], then so be it,” he says. “[But] the worst thing to do now would be to say, ‘We have got an existing assessment system that doesn’t work well, so let’s just jump on this.’”

Warwick Mansell is a freelance education journalist and author of Education by Numbers: the tyranny of testing (Methuen 2007). He tweets @warwickmansell

How does comparative judgement work in practice?

Daisy Christodoulou, head of assessment at Ark Schools, explains the process:

“At Ark Schools, we’ve done a number of trials of comparative judgement, and have worked with Dr Chris Wheadon, of No More Marking - a company that provides schools with a free online CJ engine.

“In practice, the process works as follows: you take the set of scripts that have been written by your pupils and scan them into a digital format. Most photocopiers can scan a set of paper scripts and turn them into a series of PDF files.

“You then upload these PDFs to the CJ engine - in our case, the No More Marking website. You then put in the email addresses of your judges. They will then be emailed a link; when they click on it, they will see a pair of essays on their screen.

“They have to decide which essay is the better piece of writing by clicking on it. Once they’ve made that decision, it takes them to the next pair of essays, where they make another judgement. And so on. The pairs of essays are selected at random. Each judge has to do a number of judgements, and the exact number depends on how many essays you want to judge, and how many judges you have.

“Once all the judges have finished, the algorithm that sits behind the system crunches all the decisions, and gives each essay a score.

“What kind of score can be awarded? To a certain extent, that depends on how you set up the task. If you want to award pupils a GCSE grade, or a primary grade like EXS or GDS, then you can do so in a couple of ways.

“One way is to use exemplar scripts that have been pre-judged as being at a certain standard. If you scan these scripts in as well as your pupils’ scripts, then you can use their results to help award grades to your pupils.

“Another way of awarding grades is to participate in a larger task. You still scan in and upload your scripts, but at the same time, lots of teachers from other schools do the same. You still make the same number of judgements as before, but this time, you are not just judging your own pupils: you are being given scripts from pupils in other schools.

“This way, you are getting results that are based on a national sample of pupils, not just your own.”
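
To make the arithmetic behind "the exact number depends on how many essays you want to judge, and how many judges you have" concrete, here is a minimal sketch of how a session might be planned. It assumes a target number of judgements per script (`judgements_per_script`, a hypothetical figure not taken from the article) and picks pairs purely at random, as the process above describes; the function name `plan_session` and the script and judge labels are also made up for illustration.

```python
import math
import random


def plan_session(scripts, judges, judgements_per_script=10, seed=None):
    """Share randomly chosen pairs of scripts among judges.

    Each judgement involves two scripts, so the total number of
    judgements needed is scripts * target / 2, split evenly (rounding
    up) across the judges.
    """
    rng = random.Random(seed)
    total = math.ceil(len(scripts) * judgements_per_script / 2)
    per_judge = math.ceil(total / len(judges))
    return {
        judge: [tuple(rng.sample(scripts, 2)) for _ in range(per_judge)]
        for judge in judges
    }


scripts = [f"script_{n:02d}" for n in range(1, 31)]
judges = ["judge_a", "judge_b", "judge_c", "judge_d", "judge_e"]
workload = plan_session(scripts, judges, seed=1)
print(len(workload["judge_a"]))  # 30
```

With 30 scripts, five judges and a target of 10 judgements per script, each judge makes 30 decisions. Because the pairs are random, each script is seen roughly, not exactly, 10 times.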
