Dylan Wiliam’s vision for fair and accurate assessment

As teachers, students and policymakers across the country question the viability of the current assessment system, world-respected assessment expert Dylan Wiliam talks to Tes about what he thinks the alternative should be
17th June 2022, 6:00am

https://www.tes.com/magazine/teaching-learning/secondary/dylan-wiliam-exams-assessment-fair-accurate

The past two years have brought huge disruption to our assessment system, prompting more people to question how we do things. But how much has the pandemic actually taught us about how to make the system fairer and more effective?

According to Professor Dylan Wiliam, not a lot. Such a view from an international expert on the topic may seem pessimistic, but Wiliam, who is emeritus professor of educational assessment at the UCL Institute of Education, sees the past two years as a missed opportunity.

Tes sat down with him to talk about this and other key tenets of assessment that he believes every teacher should know.

The pandemic has brought about big changes in assessment. Have we learned anything about what works and what doesn’t?  
The past two years have told us nothing about what a good system might look like.  

The problem was that we changed the standards [students were able to sit a scaled-back version of an exam paper], and this was a mistake. We wouldn’t want, for example, to change the driving test to be easier to pass. There’s a standard, it’s meaningful, and we should have kept it at that.

If we had used teacher-assessed grades but otherwise kept GCSE examinations exactly the way they were, while taking into account extenuating circumstances, we would have had something to compare to. Instead, we have nothing.

Some people have argued that the pandemic makes a case for scrapping exams in favour of teacher assessment. Where do you stand on that?
We know from a lot of studies from all over the world that high-stakes assessments improve student performance. Importantly, it’s not just the skills being tested that are improved; the presence of high-stakes assessments improves performance overall.

However, in every single case, the negatives outweigh the positives. This is because students and teachers focus on the things that are going to get tested, but tests never assess all the things that are important. 

For example, in the national curriculum for English, we have four domains: reading, writing, speaking and listening. In exams, reading gets more attention than writing and the other two don’t get anything at all. 

So, what’s the teacher to do? If the teacher is under pressure to improve results, she shouldn’t spend time on speaking and listening, because those skills aren’t assessed. It’s the age-old motto: what you test is what you get. 

Overall, though, I’m in favour of high-stakes assessments, because they make the system fairer. People often say that we should get rid of tests, but that will cause other problems. 

What kind of problems?
Teacher assessments, for instance, are biased in all kinds of strange ways. 

Let me give you an example: in one study, researchers took photographs of children, and used Photoshop to make two versions. In one photo, the child was thin, and in the other, the child was probably about a stone and a half overweight. 

The researchers attached one of these photographs to the student’s piece of work and gave it to people to assess. The people who thought they were assessing the work of the overweight child gave it a lower score. 

We also know that teachers give higher scores to students they like and to students who share the same cultural background as them.

So, if both exams and teacher assessment are imperfect models, what’s the solution?
Combining teacher assessment and external assessment is, in my opinion, the best option.  

Most assessment people would agree that you need more than a single source of evidence, and the advantage of teacher assessment is that teachers can collect far more evidence than anyone can in a one-and-a-half- or two-hour examination.

But we know that there are problems to overcome here, too, with bias and so on. That’s why a combination of teacher assessment and external assessment is needed.  

How would that combination approach work in practice? Are there any other countries that are already doing it?
South Australia used to have quite an interesting system. Teachers would give a grade for coursework, and then the students would all do the same test. 

In moderation meetings, the school would then compare the average scores on the two components and ask how similar they were. If the test scores were lower than the coursework grades, that suggested the teacher was being a bit lenient; if they were higher, that the teacher was being a bit severe.

They also checked the spread of scores (the standard deviations) and the correlation between the two components. The grades were then adjusted accordingly, which was a good way of bringing the two into line.

The difficulty is that you have to decide whether to combine the results, or separate them. I think it’s better to combine them - otherwise, employers and universities can choose the grade they believe is more reliable. 

Can separating out the grades ever work? 
In Sweden it does. There, high school grades are awarded by teachers, but alongside this, the Swedish Scholastic Aptitude Test (SweSAT) is open to all students. You are then automatically considered for university on whichever result gives you the best chance of getting in.

If you have a strong social system, one that doesn’t allow people to prefer one grade over the other but instead uses whichever grade puts the student in the best position, you can report grades separately.

Ultimately, though, I think it’s better for the two to be combined, because each source of evidence can mitigate the weakness of the other. The standardised assessment gives you the latest and best information, while teacher assessment gives you much fuller information. 

Both are limited, but together they can provide a fairer picture. So, we really need to find ways of combining the two.

‘I’m in favour of high-stakes assessments. They make the system fairer’

OK, let’s talk about how that combination could work in practice. How much weight should be given to teacher-assessed grades versus external exams?
The weights could be different in different subjects. In a more vocational subject, it makes sense to give more weight to the assessment that judges the relevant skills. But there shouldn’t be a hard and fast rule.

And how often should assessments take place? 
It needs to be throughout the year: if you only assess in the summer, then you get the latest and best information, but it’s not very reliable, because you can’t collect that much information.

However, if you look at something like modular science GCSE, students do tests throughout the year. The difficulty here is: how do you compare performance in a module that’s tested at the end of Year 10 with performance in a module halfway through Year 11? Presumably, students got better in that time. 

So, it’s still not perfect. The key to making this approach work lies in the trade-offs: you have to be clear about them and say, yes, we’re making this trade-off, because we think this is more important than that.

Is it possible to test too much throughout the year?
Yes. There are some methods of assessment that I believe are never worth it; continuous assessment is one of them. In America, we have this, and I believe it is a terrible idea.

High school students get a grade every week or two, and the score they get at the end of high school - their high school grade point average - is the average of all those grades.

But this means students don’t take any risks. They won’t even take courses in which they might get a B instead of an A, because they don’t want to end up with a 3.9 rather than a 4.0 grade point average. 
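The arithmetic behind that fear is easy to see if, as assumed here, a GPA is a simple unweighted mean on the common 4-point scale (A = 4, B = 3 and so on). Real US schools vary, and many weight grades by credits or course level. Across ten courses, a single B is enough to turn a 4.0 into a 3.9, and because every grade stays in the average, that B never goes away.

```python
# Assumed unweighted 4-point scale; many US schools weight by credits or course level.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(grades: list[str]) -> float:
    """Unweighted grade point average: the mean of the grade points."""
    if not grades:
        raise ValueError("no grades to average")
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


all_as = ["A"] * 10
one_b = ["A"] * 9 + ["B"]
print(gpa(all_as))  # 4.0
print(gpa(one_b))   # 3.9
```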

We need our assessment systems to be both distributed, collecting information across the whole course, and synoptic. Assessment should require students to synthesise everything they’ve learned over the course; we need a big judgement about the whole field of study, rather than allowing students to learn stuff and forget it.

Is it better for students if they don’t know when assessment is happening?
No, they definitely need to know. This is important for two reasons. The first is a matter of basic fairness; the second is about meaningfulness.

Let me give you an example from when I was teaching. I taught a boy called Leicester, who was the captain of the under-13s football team. In a maths lesson on probability, I asked him whether he called heads or tails during the coin toss at the beginning of a football game. 

He told me heads, and when I asked why, he said it was because heads came up more often. I then asked what the probability of it landing on heads was, and he replied 50 per cent. He knew that in a maths classroom he needed to play this game of 50/50, but he actually believed that the coin landed on heads more often.

If you did a “stealth assessment”, and eavesdropped on him talking about probability outside of the maths classroom, you would conclude that he didn’t know that heads and tails are equally likely.

Telling students they are being assessed is just a fundamental requirement of fairness and validity. Students also need a chance to take risks; if you have to play safe in everything you do, because there’s a chance it will contribute towards a score, students will become automatons. 

We have talked a bit about the effect of teacher assessment on students, but there are implications for teachers, too, specifically around workload. Is there any way to mitigate those?
The workload issue has always come from trying to make teacher assessment look like exam assessment, trying to bring the same rigour to that process. That’s why it’s bureaucratic and over the top.

I think teachers should keep whatever records they find most useful for teaching, and if that’s a workload issue, well, I don’t care. You should be doing this, you should be finding out what your students are learning, and you should be keeping records that help you teach better. 

Schools should then be able to propose how they’re going to derive the teacher-assessed component. We can put a system in place that means it shouldn’t actually require any extra workload. 

What might that system look like?
Let me give a concrete example, using science as the subject.  

Schools would be told to send the names of the students taking GCSEs to the exam board in February. In May, the exam board would send each student a customised examination.

One might get a test on biology, while another gets one on chemistry or physics. Four might get a test on collaborative working, while two may be tested on working practically. 

The important point is this: the teacher cannot know who is going to get which test in advance. Therefore, the only way to teach to the test is to teach each child everything. 

Those exams are sent back to the exam board, which, instead of attaching grades to individual children, tells you that in this class you have two grade 9s, three 8s, six 7s and so on.

It’s then up to the teacher to decide which student gets which grade, using all the assessments they’ve done throughout the year, as part of their normal work, to make that judgement.
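One way to picture this final step: the exam board supplies only the class’s grade counts, and the teacher hands those grades out in the order given by their own year-long evidence. The sketch assumes the teacher can put the class into a single strongest-to-weakest rank order; Wiliam doesn’t say how ties or borderline cases would be handled. The function name, student names and counts are hypothetical.

```python
def allocate_grades(ranking: list[str], grade_counts: dict[int, int]) -> dict[str, int]:
    """Assign the exam board's class grade distribution to individual students.

    ranking: students ordered from strongest to weakest, based on the
             teacher's own assessments across the year.
    grade_counts: how many of each grade the class earned on the external
                  tests, e.g. {9: 2, 8: 3, 7: 6}.
    """
    if len(set(ranking)) != len(ranking):
        raise ValueError("each student must appear once in the ranking")
    if sum(grade_counts.values()) != len(ranking):
        raise ValueError("grade counts must cover every student exactly once")
    # Expand the distribution into a list of grades, highest grade first.
    grades = [g for g in sorted(grade_counts, reverse=True) for _ in range(grade_counts[g])]
    # The strongest student gets the highest grade, and so on down the list.
    return dict(zip(ranking, grades))


ranking = ["Asha", "Ben", "Chloe", "Dev", "Ella"]  # hypothetical class
print(allocate_grades(ranking, {9: 1, 8: 2, 7: 2}))
# → {'Asha': 9, 'Ben': 8, 'Chloe': 8, 'Dev': 7, 'Ella': 7}
```

The design point is that the external tests fix how many of each grade the class receives, while the teacher’s evidence only decides who receives them.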

Within that system, what role do you see exam boards playing?
They’d offer external support to teachers around calibrating judgements. 

Teachers will have questions: they may think they had five 9s in the class, but actually, according to the standardised assessments, there are two 9s. Or they may think there aren’t any 9s in their class, but the results say there are three.

The exam board could then report on the national picture, and help teachers to decide if they’re being too soft, or too harsh. 

Would there be just one national exam board, then?
I don’t see the argument for more than one exam board. 

The reason we can have competition for oil, gas and phone services is that these things are commodities and, therefore, as a consumer, you can choose whichever supplier works to your advantage.

What’s the advantage of having more than one exam board? Maybe a few things about flexibility, about allowing evolution, but I think they’re outweighed by the negatives. 

Whether it’s done by a government agency, or by a private contractor is a separate issue. Many countries have a national testing body, and I don’t have a problem with that, provided it’s free from government interference. 

In England, it probably wouldn’t be.

Why do you say that?
Our assessment system is not fit for purpose because politicians are determined to interfere with the process to make it serve political ends, rather than actually thinking about the students. 

Everybody wants the same thing: information about what students have learned in school. But when political considerations are imposed on that, it distorts the process considerably. 

‘I don’t see the argument for more than one exam board’

Is there a way for us to change this?
The big problem in England is that people want to use exam results as a way of judging school quality, even though most of the variation in a school’s GCSE results is caused by how much the students knew when they started at that school. 

The government doesn’t consider contextual value added anymore, but when it did, the school’s contribution accounted for only 7 per cent of the variation in GCSE results.

We need to take assessment out of the political arena and establish an independent body whose main task is to maintain standards over time.

Ofqual, of course, knows how to do this, but currently its role is to supervise the exam boards’ maintenance of standards.

Ultimately, though, the exam system that you’re working with is the exam system that you’re working with. We can wish it was different, but that’s a policy decision, and I don’t think England is going to change anything very rapidly. 

What can teachers do to improve how they work within the system that we have, then? 
Given that schools, teachers and students are evaluated on these exam results, one of the things you can do is use summative tests in formative ways. By that, I mean using the exams as examples of what students need to do on the day. 

I’m not saying we should start this in Year 9, but certainly in Year 11, once you’ve laid a sound foundation of knowledge, practising past papers is probably the best way to get a good score. Most teachers do this already. 

Is there any particular technique that you’d recommend when using past papers?
One technique I like is getting students to do a past paper under test conditions. The teacher then collects the papers, but doesn’t mark them. 

The next day, the teacher puts students into groups of four, and gives the papers back, with a new blank test paper. As a group, students then need to construct the best response. They can compare their answers and see who’s got the best one. 

We know that students often tell each other things that are not correct, so the teacher then needs to do a whole-class session to make sure they’re on the right track, and ask: “What’s this table’s answer to question one? How about question two?”

For those who might still be unhappy about the return of exams, do you have any final thoughts?
Ultimately, we need to accept that we have a system that is completely exam-dominated.

Up until Year 11, teachers need to teach well, and develop thorough understanding, but in Year 11, students need to be well prepared for the exams they’re going to take because it’s unlikely that’s going to change any time soon. 

But we also need to remember why we have exams in the first place: they really do make the system fairer. 

People forget that the most famous test in the world, the American SAT, was introduced to stop Harvard being filled with the children of alumni. It was designed to be a genuinely meritocratic process. 

So, I think high-stakes assessments are important, but the challenge is to find ways to mitigate the most negative effects. 
