The wisdom of the crowd?

In his book On Trails, journalist Robert Moor writes that crowds “can collectively make judgements that rival those of the most highly regarded experts”. But there is one important caveat to trusting the wisdom of crowds: participants must act independently, because in those “experiments where people were given access to one another’s answers, the collective intelligence worsened”.
The more influence the members of a group exert on one another, the less accurate the outcome. The only exception is when participants are shown the previous best guesses, but that only works if those running the experiment already know the answer and can therefore reveal which guesses were best.
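To see why independence matters, here is a toy simulation (my own sketch in Python, not one of the experiments Moor describes). Each of 30 participants estimates some quantity; in the “influenced” condition each guess is pulled towards the running consensus, so the errors of early guessers propagate through the crowd. All the numbers are made up; only the pattern matters.

```python
import random
import statistics

TRUTH = 100.0    # the quantity being estimated (hypothetical)
NOISE = 20.0     # spread of individual error (hypothetical)
CROWD = 30       # echoing the 30-member panels described below
TRIALS = 10_000

def crowd_error(influence: float) -> float:
    """Average error of the crowd's mean guess over many trials.

    influence=0.0 gives fully independent guesses; influence>0 pulls
    each guess towards the running consensus, so early errors herd
    the rest of the crowd.
    """
    total = 0.0
    for _ in range(TRIALS):
        guesses: list[float] = []
        for _ in range(CROWD):
            own = random.gauss(TRUTH, NOISE)
            if guesses:
                consensus = statistics.fmean(guesses)
                own = (1 - influence) * own + influence * consensus
            guesses.append(own)
        total += abs(statistics.fmean(guesses) - TRUTH)
    return total / TRIALS

print(f"independent: {crowd_error(0.0):.2f}")   # typically around 3
print(f"influenced:  {crowd_error(0.7):.2f}")   # several times larger
```

The independent crowd’s error shrinks with the square root of its size; the influenced crowd’s does not, because everyone is partly repeating the first few voices.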
When the key stage 2 Sats data was released in July, and all results had risen, I asked whether the setting of the expected standard was really about maintaining the comparability of results, or more about ensuring the right amount of improvement each year (bit.ly/SatsAcc). I was being provocative, but the question prompted me to investigate how the standard is actually set, and that led me to an excellent blog by Ann Heavey (bit.ly/HeaveySats).
First, two groups of 30 members are selected for each of the KS2 test subjects. Each member must complete an activity in advance to ensure familiarity with performance descriptors. On the day of the standard-setting exercise, participants are issued with a booklet that places test items in order of difficulty based on the previous year’s answers.
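Ordering the booklet is the simple part. As a sketch (with hypothetical figures, and assuming “difficulty” here just means the percentage of last year’s pupils who answered each item correctly):

```python
# Percentage of last year's pupils answering each item correctly
# (hypothetical facility values).
facility = {"Q1": 92.4, "Q2": 61.0, "Q3": 78.5, "Q4": 34.2}

# Booklet order: easiest item first, hardest last.
booklet = sorted(facility, key=facility.get, reverse=True)
print(booklet)  # ['Q1', 'Q3', 'Q2', 'Q4']
```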
In round one, members work independently, bookmarking the point that they feel represents the expected standard. In round two, members work in smaller teams to discuss the placement of the bookmark. Between rounds two and three, impact data is shared, which shows the percentage of children nationally who would meet the expected standard based on the various suggested bookmarks. And in round three, the whole group decides upon their final bookmark placement. The final bookmarks decided by both groups then go forward to a standard confirmation meeting.
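The impact data is the pivotal step, so it is worth being concrete about what it is. This is my own reconstruction, not official code: assume each bookmark position maps to a raw-score threshold via the booklet ordering, and that we have the sorted national raw scores to hand.

```python
from bisect import bisect_left

def impact_data(bookmarks: list[int], raw_score_cut: list[int],
                pupil_scores: list[int]) -> dict[int, float]:
    """Percentage of pupils who would meet the expected standard
    under each suggested bookmark.

    raw_score_cut[i] is the raw-score threshold implied by placing
    the bookmark after item i+1 (an assumed lookup derived from the
    booklet ordering); pupil_scores is every pupil's raw score,
    sorted ascending (assumed national data).
    """
    n = len(pupil_scores)
    result = {}
    for b in sorted(set(bookmarks)):
        cut = raw_score_cut[b - 1]
        meeting = n - bisect_left(pupil_scores, cut)  # scores at or above the cut
        result[b] = round(100.0 * meeting / n, 1)
    return result

# Made-up demo: ten pupil raw scores and a linear item-to-cut mapping.
scores = sorted([23, 41, 56, 58, 63, 70, 77, 81, 88, 95])
cuts = [30 + 4 * i for i in range(25)]            # bookmark k -> cuts[k - 1]
print(impact_data([10, 12], raw_score_cut=cuts, pupil_scores=scores))
# {10: 50.0, 12: 40.0}
```

A higher bookmark means a harder standard and a lower percentage meeting it, and that is exactly the lever the panel sees between rounds two and three.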
This does seem a rigorous process, but let’s return to the wisdom of the crowd. What is the impact of group discussion on independently made decisions? How much influence do some individuals exert? How much does the impact data influence behaviour?
Perhaps taking an average of the independently judged bookmarks from round one would be a more accurate way of setting the standard. Then again, perhaps that wouldn't produce the right amount of improvement.
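For what it’s worth, that independent alternative is a one-liner (hypothetical bookmarks; whether the mean or the more outlier-resistant median is the fairer aggregate is itself a judgement):

```python
import statistics

round_one = [18, 22, 19, 25, 20, 21, 17, 23]   # hypothetical round-one bookmarks

print(statistics.median(round_one))   # 20.5
print(statistics.fmean(round_one))    # 20.625
```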
James Pembroke founded Sig+, a school data consultancy, after 10 years working with the Learning and Skills Council and local authorities. www.sigplus.co.uk