CAT4 tests: what teachers need to know

New to middle leadership in an international context? Understanding CAT4 test data could be critical for your new role

6th December 2023, 11:08am

In schools around the world, GL Assessment data is frequently gathered and analysed to provide insight into a child’s ability, their academic potential and any possible barriers to learning.

This is usually done with CAT4 tests (cognitive ability tests) to provide a personalised report for each student giving a holistic view of their learning profile.

However, at times the amount of data produced can feel overwhelming for teachers, particularly for those new to middle management in international schools, where CAT4 tests are often used to provide an insight into a child’s capabilities in the absence of primary school references or Sats data.

So, if you’re not too familiar with CAT4 test data, here is a breakdown of the key terms and how this data can be used to inform your teaching, interventions and curriculum planning.

What is a CAT4 test?

CAT4 tests are a series of tests that creator GL Assessment says have been “developed to support schools in understanding students’ abilities and likely academic potential”.

They can be taken either digitally or on paper, and are available for both primary and secondary pupils.

The assessments are grouped into four different batteries, made up of two tests in each of the following areas:

Verbal reasoning - “thinking with words”. This is linked to concepts framed in words: it may involve working out how the words in a group are related, or identifying the relationship between a pair of words. The result of this assessment is often used as a baseline for a child’s potential in essay-writing subjects like English and history.

Non-verbal reasoning - “thinking with shapes”. This doesn’t involve reading but measures a child’s ability to solve problems using shapes or patterns, and is most similar to a typical IQ test.

Quantitative reasoning - “thinking with numbers”. This is the numerical equivalent of verbal reasoning and involves working out the relationship between numbers in a sequence. This result will generally be compared against a school’s internal maths assessment data.

Spatial ability - “thinking with shapes and space”. This involves the manipulation of shapes to demonstrate an understanding of the spatial relationship between images and is often linked to ability in science, technology, engineering and maths (Stem) subjects.

How long do the tests last for?

Each test within the batteries above has a different time limit, although most last around 10 minutes.

Overall, GL says each of the three parts in which the tests are administered “should take no longer than 45 minutes in total”, and that includes time for administration instructions, examples and practice questions. The maximum time for the tests is therefore two hours and 15 minutes.

What information do we get from CAT4 tests?

For each battery, the following information is given after pupils have taken the tests:

Number of questions attempted. As a general rule, the data produced will be more reliable when all the questions have been attempted.

It is worth checking this column to see whether any student missed a large proportion of questions, which would skew their results.

Standard age score (SAS). This gives a score to compare the results of pupils who were born in the same calendar month. Pupils who score exactly as expected for this age group would be given a score of 100 (scores between 89 and 111 are considered to be within the “average” bracket).

Scores of 112 and above are deemed above average and scores of 88 and below would be considered below average. SAS scores will range from 69 to 141.
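
For anyone handling exported results in a script rather than by eye, the SAS bands described above can be written as a simple rule. This is a minimal sketch only: the function name `sas_band` is our own and is not part of any GL Assessment tool.

```python
def sas_band(sas: int) -> str:
    """Return the band for a standard age score (SAS).

    Bands follow the article: 112 and above is above average,
    89 to 111 is average, 88 and below is below average.
    """
    if sas >= 112:
        return "above average"
    if sas >= 89:
        return "average"
    return "below average"


# Illustrative scores only, not real pupil data.
for score in (85, 100, 118):
    print(score, sas_band(score))
```

The boundaries matter here: a score of 111 is still “average”, while 112 moves a pupil into the above-average band.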

Stanines. This mark shows how one pupil’s results compare with other pupils of the same age. Stanine scores range from 1 to 9. Stanines 1 to 3 are considered to be below average results, stanines 4 to 6 are considered average results and stanines 7 to 9 are deemed to be above average.
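
The three stanine groupings above can be sketched in the same way. Again, this is an illustration under our own naming (`stanine_band` is hypothetical), not an official GL Assessment calculation.

```python
def stanine_band(stanine: int) -> str:
    """Return the band for a stanine, which runs from 1 to 9.

    Bands follow the article: 1 to 3 is below average,
    4 to 6 is average, 7 to 9 is above average.
    """
    if not 1 <= stanine <= 9:
        raise ValueError("a stanine must be between 1 and 9")
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    return "above average"


# Illustrative values only.
for value in (2, 5, 8):
    print(value, stanine_band(value))
```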

National percentile ranking (NPR). This is measured against a representative sample in the UK (which is worth noting for international teachers). It shows where a child’s raw score falls in comparison with students of the same age.

For example, an NPR of 30 means that a student scored as well as or better than 30 per cent of students in the sample, and that the remaining 70 per cent scored higher than them. This score allows for comparison against a much bigger sample than a single class.
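
The arithmetic behind that example is simply a subtraction from 100. The short sketch below makes it explicit; the function name `share_scoring_higher` is our own, and the check that an NPR lies between 0 and 100 is an assumption about how percentiles are expressed rather than a documented CAT4 rule.

```python
def share_scoring_higher(npr: int) -> int:
    """Return the percentage of the sample that scored higher than a pupil.

    Assumes the NPR is a whole-number percentile between 0 and 100.
    """
    if not 0 <= npr <= 100:
        raise ValueError("an NPR is expected to be between 0 and 100")
    return 100 - npr


# The article's example: an NPR of 30 leaves 70 per cent scoring higher.
print(share_scoring_higher(30))  # prints 70
```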

How do you interpret the CAT4 data?

Considering the data from all four CAT4 batteries for each pupil can prove to be insightful.

It may help you to personalise a pupil’s learning experience, particularly if you spot a discrepancy between a child’s ability as identified by CAT4 results and their current achievement in lessons.

A high verbal bias is identified when a student has a higher verbal reasoning SAS result than their spatial SAS result. This could suggest a student feels more confident learning through writing or discussions and may not find Stem subjects to be their strength.

A high spatial bias suggests the opposite - this student may find learning through diagrams and charts more accessible than writing-based tasks. For teaching and learning, it may be worth comparing this against the child’s reading age in order to support the student fully.

A verbal deficit is the result of a high non-verbal reasoning SAS and a low verbal reasoning SAS. Discovering that a student has a verbal deficit could indicate that they may have difficulties in reading English - possibly as a result of dyslexia or having English as an additional language.
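
The profiles described above can be sketched as a comparison of SAS results. This is a rough illustration, not the method GL Assessment uses in its reports: the function names are our own, the `min_gap` of 10 points is a hypothetical threshold chosen so that small differences are not over-interpreted, and “high” and “low” are assumed to mean the above-average (112 and above) and below-average (88 and below) SAS bands.

```python
def verbal_spatial_profile(verbal_sas: int, spatial_sas: int, min_gap: int = 10) -> str:
    """Compare verbal and spatial SAS results.

    min_gap is a hypothetical threshold, not a GL Assessment figure.
    """
    gap = verbal_sas - spatial_sas
    if gap >= min_gap:
        return "verbal bias"
    if gap <= -min_gap:
        return "spatial bias"
    return "no clear bias"


def has_verbal_deficit(verbal_sas: int, non_verbal_sas: int) -> bool:
    """Flag a high non-verbal SAS alongside a low verbal SAS.

    Assumes "high" means 112 or above and "low" means 88 or below.
    """
    return non_verbal_sas >= 112 and verbal_sas <= 88


# Illustrative scores only, not real pupil data.
print(verbal_spatial_profile(verbal_sas=118, spatial_sas=95))
print(has_verbal_deficit(verbal_sas=84, non_verbal_sas=115))
```

Any flag produced this way is a prompt for a closer look at the pupil, not a diagnosis.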

Stanine scores may be used to determine where pupils require either extra support or extra challenge. A stanine of 1 or 2 in any of the CAT4 batteries could indicate a learning need and may call for specific intervention - your knowledge of the student and your teacher judgement will aid you in interpreting the results.

Additionally, a stanine of 8 or 9 could indicate that the student is able, gifted or talented in a certain aspect of learning. It is worth comparing this result against class work and internal data for a comprehensive judgement on this.
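
As a starting point for that comparison, the stanine thresholds above can be applied across all four batteries at once. The sketch below is illustrative only: the function name, the battery labels and the sample pupil are all hypothetical, and the output is a list of things to look into rather than a judgement.

```python
def stanine_flags(stanines: dict[str, int]) -> dict[str, str]:
    """Flag batteries where a stanine suggests support or extra challenge.

    Follows the article: stanines 1 to 2 may indicate a learning need,
    stanines 8 to 9 may indicate particular ability. Others are not flagged.
    """
    flags = {}
    for battery, stanine in stanines.items():
        if stanine <= 2:
            flags[battery] = "consider extra support"
        elif stanine >= 8:
            flags[battery] = "consider extra challenge"
    return flags


# A made-up pupil, not real data.
pupil = {"verbal": 2, "non-verbal": 6, "quantitative": 5, "spatial": 9}
print(stanine_flags(pupil))
```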

While CAT4 test data can be important, it should not be used in isolation from class assessments and your professional judgement.

Only through a personal understanding of students as individuals can CAT4 data be fully interpreted.

Emma Sanderson is head of English at Hartland International School in Dubai and has taught internationally for six years. She tweets @emmanaomi