What’s the best way to assess a four-year-old?

The way in which schools assess the early years foundation stage is set to change – and it’s sparked off a big debate in the sector about how to get the measure of young children’s abilities, finds John Morgan

22nd January 2021, 12:00am

John Morgan

Persuading a four-year-old to stick to a task is tough for any adult, be they a teacher, parent, army general or Andy from Andy’s Dinosaur Adventures. This makes assessing children in Reception a challenge - and that is a problem for those who work in EYFS, because assessment “is a critical aspect of all education, as this informs pedagogy, curriculum progression and being accountable for impact”, says Jan Dubiel, an international early years specialist who oversaw the implementation and moderation of the Early Years Foundation Stage Profile for what was then the Qualifications and Curriculum Authority.

It’s a problem, too, because of several recent major policy changes in the early years sector. As of this coming September (unless there are further delays owing to the Covid pandemic), Reception will be the only year of the English school system with two separate statutory assessments. That plan has set off a major clash: some practitioners argue that the traditional - and correct - emphasis on observational assessment is being undermined; that the “tests” are age-inappropriate; and that they are simply a data-collection exercise, not an effort to improve outcomes. Others point to flaws and biases in observational assessment, and believe there is a need for standardised assessment in EYFS.

It has all set in motion a debate that ripples beyond England and feeds into a larger question for everyone in education: how should you measure a four-year-old’s abilities?

‘Black box’ data

It is useful to frame the discussion by looking at the current picture in early years assessment in England and how it is changing.

Firstly, consider baseline assessments on entry to Reception. Schools do their own, usually observational, assessments to find out how children are doing, looking at their starting points so they can plan their teaching. But the Department for Education’s new Reception Baseline Assessment (RBA) will become statutory in September, having been piloted in 2020. The “activity-based” test, conducted on a laptop or tablet, will assess children on language, communication and literacy, plus maths.

The results will be “black boxed”: “no numerical score will be shared [with schools] and the data will only be used at the end of Year 6 to form the school-level progress measure,” says the DfE. So, schools will have to carry on doing their baseline assessments for internal purposes, on top of the statutory test.

Secondly, there is ongoing assessment over the course of Reception for schools’ internal purposes. All teachers and teaching assistants conduct observations to see how children are doing and how they measure up against age-related expectations, supporting them towards the early learning goals (more on those shortly). But school leaders, governors, multi-academy trusts and local authorities often ask for/require/demand with menace more formal measures of progress.

The DfE’s Development Matters curriculum guidance - which is not statutory - often comes into play here. It sets out “the pathways of children’s development in broad ages and stages”, but there are concerns that it is often used inappropriately to create formal measures of children’s progress. New Development Matters guidance was published in September 2020, giving providers a year to prepare before changes to the EYFS statutory framework come into force in September this year.

Thirdly, there’s the EYFS profile for each child at the end of Reception - the statutory assessment that is part of the overall framework. In the DfE’s words, the profile provides “a well-rounded picture of a child’s knowledge, understanding and abilities, their attainment against expected levels”, measured against early learning goals, revised versions of which will come into force in September.

In response to the question of how best to assess a four-year-old, the answer from government appears to be: do it a lot.

This frequency of high-stakes assessment is not the only issue that some in the early years sector have with the proposed changes, though: many feel that the changes shift assessment further away from what they believe works best. For a start, there is a feeling among many frontline EYFS teachers that the changes continue a purposeful erosion of the role of teacher judgement, and that this undermines the reason for assessment in the first place; they feel that assessment has become simply a way to measure outcomes, rather than to improve them.

Assessment “should be about supporting children’s learning. We assess children so we know what we need to do next and how we need to build our curriculum as teachers,” says Ruth Swailes, an early years specialist and former primary headteacher.

Beatrice Merrick, chief executive of early years charity Early Education, adds that “early years pedagogy is based on a cycle of observing children, assessing where they are, planning what to teach based on what you’ve seen”.

There are also fierce criticisms of the new early learning goals, mainly around what has been perceived to have been left out, and the new RBA has come under heavy fire, too. On the latter, “black boxing” the results means that the data cannot be used by EYFS teams, but many experts argue that it is unlikely to give you any useful information anyway.

“Young children do not perform accurately and reliably in test situations, partly because they do not understand the ethnography of testing - getting the right answer - and approach test situations in their own idiosyncratic way, with their own perceptions, understanding and ideas; therefore, the resulting data is not reliable,” says Dubiel.

Watch and learn

Samuel Meisels, the Richard D Holland presidential chair in early childhood development at the University of Nebraska and founding executive director of its Buffett Early Childhood Institute, says that “testing young children is always a difficult proposition because development in the early years is not linear”.

“Children may make gains in cognitive tasks shortly after starting school, or even a year later, but not be able to demonstrate these gains when they are being tested at the outset of school,” he says. “This is one of the reasons that observational assessments, which take place on multiple occasions in a child’s school experience, can be so meaningful.”

And David Whitebread, a former lecturer in early childhood education and specialist in playful learning at the University of Cambridge, says research evidence indicates that “the most valid assessments of young children’s levels of abilities” most commonly occur when they are engaged in activities “that they perceive as playful, which may involve an element of pretence”; this, he adds, makes it “much less likely that a standardised test will produce a fair assessment of all young children”.

All in all, a shift away from observational assessment and towards standardised tests is, according to many in the EYFS sector, the wrong move. If you want to assess a four-year-old, they argue, then watch them, don’t test them.

Dubiel says that “observing children in everyday contexts and interacting with them sensitively provides authentic information that can then be ascribed to criteria that is then moderated for consistency…This ‘observational’ approach is generally used in EY settings across the world”.

Observational assessment approaches are favoured by many of those on the front line of teaching in the early years, too. Helen Pinnington, early years foundation lead at St Thomas More’s Catholic Primary School in Bedhampton, Hampshire, explains that they fit closely with the pedagogy of the sector.

“I typically work alongside the children to teach them new skills, and regularly step back to see if they are able to apply the skills independently through child-initiated play,” she says. That, she adds, “fits perfectly with early years pedagogy and reflects the way that very young children learn and show learning”.

She has concerns about a baseline test conducted in the first few weeks of Reception, at a time when staff should be prioritising the building of good relationships with children.

“It is a simple fact that when children feel secure in their environment and most comfortable, they will really show you what they can do,” Pinnington says. “They are also more likely to do this and reveal a full picture of their abilities if it is done through play.”

And schools’ previous experience of a baseline test raised a lot of concerns. Alice Bradbury, an associate professor of sociology of education at UCL Institute of Education, carried out research looking at the classroom impact of the last guise of a statutory baseline assessment, which was in force for one year in 2015 before the DfE dropped it, citing concerns about “comparability”. The issues raised by teachers, says Bradbury, were that “children did better when they weren’t tired or hungry”; that “young children do not always understand the concept of getting a question ‘right’”; and that “levels of familiarity with technology can also be an issue for some forms of tablet-based assessment”.

Could there also be a wider impact on the early years from the introduction of the RBA, specifically around the use of observational assessment? Merrick thinks that statutory assessments are “driving the idea that assessment has to be formal and standardised rather than valuing teacher assessment”.

Bradbury, meanwhile, talks about the “datafication” of the early years, in which the teacher becomes a “collector of data” and each child “becomes a source of data, a data point”.

However, there are those - both in academia and the classroom - who believe that standardised assessments are not, in fact, a one-way ticket to turning four-year-olds into a series of digits, or at least not if those assessments are conducted in the right way.

Speaking in a personal capacity, Rob Coe, former director of the Centre for Evaluation and Monitoring (CEM) at Durham University and now senior associate at the Education Endowment Foundation, says that the question of whether there’s a role for standardised tests in the early years depends on what is meant by “standardised tests”.

“If you mean an assessment process that generates validated, reliable scores for a specific purpose, that are comparable across settings and over time, then [the answer is] yes,” he continues. “But if you think a ‘test’ means a high-stakes, high-anxiety, formalised process, then no. There is a strong case for high-quality assessment of language and number at the start of school, if only for its diagnostic value, especially if early years teachers are properly trained and supported to understand and respond to the results.”

Citing the RBA and CEM’s own baseline assessment, Coe says there are more structured assessments that work well, both in their technical quality and in the experience of the children taking them. The challenge in designing early years assessment is to create a process “that does not depend too much on the individual relationship or the state of mind/body of the child at that moment”. He says that “the best available assessments have demonstrated empirically that these concerns are not too much of a problem”, adding: “That may conflict with the perceptions of some teachers, but I would go with the evidence on that one.”

Julian Grenier, headteacher of Sheringham Nursery School and Children’s Centre, and the writer of the new Development Matters guidance, also believes there may be a place for standardised assessment. While he says observation is important as “we need to look closely and record what children are doing and saying with precision so that we can get to know each child”, he also says it’s important to acknowledge “that we do not observe children in a neutral way”. Biases, including racial biases, could potentially come into play, he notes, plus “we will look for what we want to find - the phenomenon psychologists call ‘confirmation bias’.”

That’s one reason Sheringham Nursery School started using an app-based system of standardised assessment called Early Years Toolbox, which showed that “many children had a better understanding of number than we realised” and that “some children who were ready conversationalists had quite limited vocabularies”, says Grenier.

Another problem with observational assessment, says Coe, “is that if you just wait for the child to demonstrate a particular behaviour or piece of knowledge, they may never show it, even if they know [or] can do it. That is why the best assessments include structured prompts or questions to give the best chance to elicit the desired behaviour.”

Accountability concerns

So, is the best way to assess a four-year-old a bit of observation and a bit of standardised testing? It’s not that clear-cut. Creating a lot of noise, and blurring the lines between the two points of view, is the fact that a great deal of non-statutory continual assessment goes on in the early years, demanding more evidence and more work from teachers than is necessary. Observational assessments, many argue, are being formalised, and thus warped, owing to accountability concerns.

Misuse of the DfE’s Development Matters guidance is widely seen as a key problem here. Its 2020 guise makes clear it is “not a tick list for generating lots of data”. Yet James Pembroke, founder of Sig+, an independent school-data consultancy, sees much “inappropriate use of Development Matters bands for tracking, which is not what they are intended for, and the subdivision of those [age] bands to create points scores” in a way that ignores the overlap of the bands.

Pinnington agrees: “Development Matters was not designed to be used to measure assessment for data purposes. We all know this, yet we are pushed to use it inappropriately.”

Reception teacher Elaine Bennett recently wrote for Tes on this issue. She explained that “age bands are not being broken down as a result of their being too broad. Instead, it’s a result of tracking software” (see box, below).

“Why does tracking software require this?” she continued. “Because schools and settings are under huge pressure to prove children are making progress, so we have to create artificial hoops for them to jump through.”

Pembroke believes the same. “The mistake schools make with tracking is this idea that by inventing more bands, you’ve shown a child has made more progress,” he says. But in reality, schools are “just conjuring [these bands] out of thin air”.

Pinnington states that splitting the Development Matters age bands into sub-levels is also done by schools “in an effort for EYFS to align their assessment with the rest of the school”.

There are already signs, notes Pembroke, that schools are doing the same with the new Development Matters guidance as they did with the old version: splitting it into sub-levels for tracking.

To be clear, the DfE’s early years profile handbook states that evidence for the EYFS profile “should come from day-to-day activity in the classroom”, with “no requirement that evidence should be formally recorded or documented”. But there’s concern that misconceptions about the EYFS profile are driving a lot of unnecessary data collection.

Swailes points to teaching staff spending significant portions of their time on iPads taking hundreds of photos and documenting what can sometimes be unremarkable events in a child’s everyday school life. This generates “heavy workload…when actually you could be having interactions with children - and that’s the thing that makes a difference to children’s learning”, she adds.

This raises an important question: if this is happening already, how much worse could it get with the introduction of the RBA?

There is a degree of agreement that the RBA is an unusual move within the global context and one that is likely to have unintended consequences. “Children aren’t even in school at this age in most countries…This kind of formal statutory testing of children this young is completely unlike most of the rest of the world,” Bradbury observes.

Commenting on the fact that Reception will have two statutory assessments from September, Swailes says: “These children are 48-60 months old. That is a problem.”

Coe also sees potential problems. “I think the case for a Reception baseline is mainly for the diagnostic value, but also as a baseline to track progress from,” he says. “However, in the context of the kind of accountability pressure we have in England, it is hard to see how a statutory assessment can achieve both without significant, harmful side effects.”

There is a concern, for example, that what is left out of the RBA influences schools’ perceptions about what is and is not important in the early years. Meisels says that while the piloting and early validity studies of the RBA are “impressive”, the areas not being evaluated in the assessment include social-emotional development, scientific thinking, the arts and motoric development. “Young children often will rely on one or more of these domains in responding to ‘math’ or ‘language and literacy’ questions,” he adds.

There have been similar concerns about the new early learning goals, specifically around the prominence of shape, space and measure.

And the RBA is likely to increase school leaders’ demands for EYFS teams to evidence progress, as leaders know that their schools will ultimately be judged on the progress children make from the baseline test by the time they reach Year 6.

iPads down

Can you blame standardised tests for the decisions schools then make about content or formal observations? Can you blame the problems with observational assessment on accountability? These are difficult questions that need further exploration.

However, with the RBA and other changes being non-negotiable, what general approaches would help to ensure that assessment still does what it is supposed to do and helps children’s learning to improve?

Swailes says there should be a focus on “new and innovative ways to evidence learning so that teachers feel confident that if an adviser or an inspector or a senior leader comes into my room, they will be able to see what we’ve been learning and what we’ve been doing, but not in an onerous way”, thereby freeing teachers “to spend time with children - which is what the job is about”.

How would that work? “By putting down the iPad,” answers Swailes.

She talks about whole-class “floorbooks” - big scrapbooks that record key developments, such as important vocabulary or new concepts learned, rather than individual journals that record too many unremarkable moments. A floorbook “builds on children’s learning and is actually a useful teaching document, rather than being this book that gets filed away that nobody, apart from parents, really looks at”, she says.

Ending the situation in which staff pay rises can be determined by essentially meaningless data might be another solution. Pembroke says of the splitting of Development Matters bands into sub-levels for tracking: “Believe it or not, there are still schools where teachers are set performance-management targets for children to make certain amounts of progress based on the data that the teacher themselves are responsible for creating.”

Merrick argues that part of the way forward is also “about school leaders being brave and saying ‘we don’t need data, don’t use that as a security blanket’”, and instead being prepared to “go in and see what’s going on” in EYFS.

Early Education will publish “Birth to 5 Matters”, an alternative to the new Development Matters, at the end of March. Merrick says this will detail “a wider range of tools and approaches” - ones that are “not about tick lists, but about understanding children’s development”.

However, for Pinnington, the answer is simple: better assessment will come down to trust and understanding. She says early years teachers “can very easily demonstrate progress, but we do need leaders and inspectors to accept the differences for early years assessments and to allow some flexibility for this phase”.

“We are often drawing on ‘softer data’, especially when a huge part of our learning is centred around personal, social and emotional development,” Pinnington adds. “Overall, it is key to remember that it’s best to notice learning in early years when children are doing it.”

John Morgan is a freelance journalist

This article originally appeared in the 22 January 2021 issue under the headline “How do you assess a four-year-old?”

Is the early years sector ‘not engaged’ with assessment reform?

Reception teacher Elaine Bennett writes:

Most schools and settings use the Development Matters (2012) document for EYFS, with its wide, overlapping age bands, to help assess and support the children in their setting.

This is going to change in the next 12 months and the reasons given for that are a little confusing to those of us working in the sector. Let me explain.

First, we have been told that we are using levels of assessment from which the rest of education has moved on. In a recent article for Tes, Julian Grenier explained that “because the age bands are so wide, many schools and settings break them down further. So the 22- to 36-month band might be broken down into two or three sub-levels like ‘emerging’, ‘developing’ and ‘secure’. The rest of the education system abandoned these types of levels in assessments back in 2015.”

The trouble with this statement is that age bands are not being broken down as a result of their being too broad. Instead, it’s a result of tracking software.

Why does tracking software require this? Because schools and early years settings are under huge pressure to prove that children are making progress, so we have to create artificial hoops for them to jump through.

Where does this pressure come from? Well, for this we can largely thank Ofsted and the Department for Education - ironically, the very people now telling us we need to reform assessment, as data shouldn’t be our focus.

Another issue, apparently, is that the current framework means teachers are matching photos to 510 age-band statements, with fingers blistered from glueing and eyes sore from data entry.

This, in my experience, simply does not happen. Moreover, to say it does happen is disrespectful to a sector that knows how to assess, that knows how to observe, that knows and understands how young children learn and develop.

It is also disrespectful to those of us who, like me, spent years as local authority EYFS moderators busting these myths around evidence.

Attempts to paint these reforms as a “workload reduction” are misleading.

The comeback on both these points may be that EYFS professionals aren’t giving their views on assessment, that they don’t even see the problem. Grenier writes that EYFS professionals were not engaged in the “lively debate” about the “need to reform assessment”, and are instead “sitting this one out”. In my experience, this is simply not true either.

We are talking about assessment: it’s all over early years social media pages, especially those where teachers are trying to make sense of the new statutory framework and accompanying guidance as “early adopters”.

However, the truth is that teachers have a lot going on right now. There’s a global pandemic on; it might not have been the best time to launch new documents. We are spinning so many plates that fully engaging in the intricacies of the assessment debate is almost impossible.

And if we truly want a change, it isn’t about one document - it’s about systemic change, a change in culture where performance management isn’t linked to hypothetical “progress” and where tracking systems don’t try to track what cannot simply be tracked on a grid: child development. It’s about those leading schools and academy chains - whose staff are often not EYFS trained - being trained in child development and effective, appropriate assessment.

It’s about a much bigger discussion than the government seems to want us to have.

While I respect and admire Grenier’s commitment to reducing workload, and to helping practitioners refocus their efforts on teaching and away from excessive documentation and ticking-off, I find the argument for change troubling.

We are on the ground, we are focused on our children every day, and the pressure we are under does not come from a document that needs change - it comes from above.

A rewrite of non-statutory guidance and the removal of bands won’t change anything. We need more than rewrites, more than blogs, articles, webinars and social media posts. We need change. Real systemic change in our education system from the foundation upwards.
