Bradley Wiggins would be pretty unimpressed if his time over a sprint stage was measured by someone watching him from the stands and counting the minutes and seconds: such inaccuracy would not be tolerated. Similarly, Mary Berry would feel a baker’s affront if I judged the quality of her Victoria sponge by sight alone while watching the Bake Off from my couch: the subjectivity of my conclusion would be met with a pointedly raised eyebrow.
When we measure something, ensuring that we have valid and reliable ways of measuring is crucial: our measurements should be fit for purpose.
If we accept that biking and baking can’t be measured in valid and reliable ways using the methods I just mentioned, what about teaching? Can we measure the quality of someone’s teaching by watching them teach and asking others (students, colleagues) to comment? Ultimately, I think the answer is no, and I think a teacher would have good cause for concern if a high-stakes judgment about the quality of their practice was drawn from observations and the comments of students and colleagues. But let’s not throw the baby out with the bathwater just yet. Measures of teaching quality have, I think, an important part to play in helping teachers improve their practice and improve outcomes for students.
At the University of Twente in the Netherlands recently, researchers from Durham, Oslo, Rutgers and Harvard universities joined colleagues from ETS (USA) and DIPF (Germany) to discuss approaches to the thorny matter of measuring teaching quality. While getting researchers to agree on anything is generally a futile exercise, two key messages bubbled up from our discussions: 1) measuring the quality of teaching is very, very difficult and we currently don’t have ways to do it accurately; 2) the evaluation of teaching quality for high-stakes accountability is not an approach supported by good research evidence.
Teachers, school leaders, policymakers and researchers agree that high-quality, effective teaching is important to learning. It seems logical, therefore, that if we could measure the effectiveness of teaching accurately and then act to improve it where needed, the impact on student learning would be significant. That sounds pretty good to me.
Teachers want to do their jobs to the best of their ability, and many want better feedback on how they’re doing. School leaders want teachers to be as effective as possible at increasing valued student outcomes, they want to give better feedback to their teachers, and they want to eradicate ineffective teaching. Systems want school leaders to identify effective and less-effective teachers and to take action to drive improvement. As these actions move from the provision of feedback that supports teacher development toward high-stakes accountability purposes, the requirement for robust, valid and reliable measurement methods increases, but here’s the rub: even our best methods are not fit for many of the high-stakes purposes to which we wish to put them.
The problem with measuring teaching quality starts with the simple fact that there is no agreement over a single, unified definition of it. You can’t measure something if you can’t define the thing to be measured; it would be illogical to think otherwise. Teaching is nuanced and complex, has myriad moving parts and is probably affected significantly by context.
This problem is compounded by the fact that the measures of teaching effectiveness currently used (classroom observations and student perception surveys are among the more common) are often not very reliable and capture only a fraction of a teacher’s actions and behaviour. Such unreliability in the measurement spells real trouble for high-stakes decision-making.
The problem is further exacerbated when we believe that the methods for measuring teaching effectiveness in one country can be transplanted to another with the expectation that they will work in the same way. While we should learn from the successes and failures of other countries’ approaches, we urgently need to do our own research to develop and trial methods for our own schools.
But if years of research from multiple countries suggest that the best we can do in measuring the effectiveness of teaching is pretty unreliable, should researchers continue the endeavour or should we focus our energies elsewhere? Would a focus on developing tools to help teachers make ‘marginal gains’ in the areas that improve outcomes for students (cognitive activation, for instance) be a better use of our time and energy, not to mention of (mostly) public money?
Ultimately, if what we do in education research is designed to improve valued outcomes for students, then what we do has a defensible rationale. Good teaching is a complex, multifaceted, nuanced activity, the quality of which is very hard to measure accurately. But we can measure it to some degree, and the data we gather might be put to good use in supporting teaching and learning.
While teaching quality evaluation data may never be robust enough to form the basis of high-stakes decisions, they could offer a promising basis for teachers to have developmental, diagnostic discussions about their practice.
Stuart Kime is a research student at the School of Education, Durham University