Q) In the Edexcel higher level data-handling project (eg comparing height with weight), is it better to start with a stratified sample of, say, 10 per cent of the whole school (eg 120 out of 1,200), and use that for the whole project, or with a smaller sample (eg 60) and then increase it later to 120? With the stratified sample, should it be, say, 60 boys and 60 girls, or stratified according to percentage of gender and year (eg 150 boys in Year 7 means select 15 of them)?
A) Stuart Hill from Hillingdon School suggests: "120 is more than adequate; Key Skills and other sources recommend more than 50." There is a thread about sample size on The TES maths forum. In the real world of statistical investigation, taking a sample is rather more complex than just choosing 30 people and assuming that they will give us the information we want. It is the quality of the sample that makes statistical inference work.
Opinion polls regularly predict with great accuracy who will win a general election, even though the sample is small compared with the electorate, because it is carefully chosen to be representative.
Dr Jeremy Miles, a lecturer in biostatistics in the Department of Health Sciences at the University of York, discusses statistical power and sample size in detail (www.jeremymiles.co.uk/misc/power/index.html).
I wonder what your pupils think about including their weight in the investigation. Pupils at this age can be very sensitive about being measured. Perhaps compare arm span, or head circumference, with height?
Now to the other part of your question. "Strata" means layers. A stratified sample is one made up of different "layers" of a population. The sampling frame in your case is the pupils in the school, and the strata are year group and gender. The number drawn from each stratum should be in proportion to that stratum's size in the population, so your second approach is the right one: if 150 of the 1,200 pupils are Year 7 boys, a 10 per cent sample should include 15 of them. A sample of 60 boys and 60 girls would be proportional only if the school had equal numbers of boys and girls; the sample would then be half boys and half girls, matching the population. Real data rarely work out this neatly, and it is important from the start that pupils understand this.
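Proportional allocation across year group and gender can be sketched in Python as below. This is a minimal illustration, not an Edexcel-prescribed method: only the school size of 1,200, the 10 per cent sample and the 150 Year 7 boys come from the question, and every other stratum size is a hypothetical figure chosen so that the total is 1,200.

```python
# Hypothetical stratum sizes for a school of 1,200 pupils.
# Only "Year 7 boys = 150" comes from the question; the rest are made up.
population = {
    ("Year 7", "boys"): 150, ("Year 7", "girls"): 130,
    ("Year 8", "boys"): 120, ("Year 8", "girls"): 120,
    ("Year 9", "boys"): 120, ("Year 9", "girls"): 120,
    ("Year 10", "boys"): 110, ("Year 10", "girls"): 130,
    ("Year 11", "boys"): 100, ("Year 11", "girls"): 100,
}

def proportional_allocation(strata, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    total = sum(strata.values())
    return {key: round(count * sample_size / total) for key, count in strata.items()}

allocation = proportional_allocation(population, 120)  # a 10 per cent sample
for (year, gender), n in allocation.items():
    print(f"{year} {gender}: {n}")
print("Total:", sum(allocation.values()))
```

With these made-up figures every stratum works out to a whole number (15 Year 7 boys, 13 Year 7 girls and so on, 120 in all). With real numbers of pupils the shares usually need rounding, and the rounded figures may not add up to exactly 120.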
CensusAtSchool's Claire Turner offers an example using real data from the website (see the end of this article). Imagine the population is divided into subgroups and a representative sample is to be taken. If the population contains three times as many Year 9s as Year 7s, then the sample should also contain three times as many Year 9s. You must also make sure the overall sample is big enough. So:
* separate the population into the strata or categories;
* find out what proportion of the population is in each category;
* then randomly select from each category a number of pupils in that same proportion.
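The three steps above can be sketched in Python as follows. It is an illustrative sketch under stated assumptions: the helper name `stratified_sample` is hypothetical, and the population of 100 Year 7s and 300 Year 9s is invented to mirror the "three times as many Year 9s" example.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, sample_size, rng=random):
    """Hypothetical helper: proportional stratified random sample.

    records    -- the sampling frame (here, a list of pupil records)
    stratum_of -- function giving each record's stratum, e.g. its year group
    """
    # Step 1: separate the population into strata.
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    total = len(records)
    sample = []
    for members in strata.values():
        # Step 2: the stratum's share of the population fixes its share of the sample.
        k = round(len(members) * sample_size / total)
        # Step 3: pick k members of this stratum at random, without replacement.
        sample.extend(rng.sample(members, k))
    return sample

# Invented population: 100 Year 7s and three times as many (300) Year 9s.
pupils = [{"id": i, "year": 7} for i in range(100)]
pupils += [{"id": i, "year": 9} for i in range(100, 400)]

rng = random.Random(2024)  # fixed seed so the draw can be repeated
sample = stratified_sample(pupils, lambda p: p["year"], 40, rng)
year7 = sum(p["year"] == 7 for p in sample)
year9 = sum(p["year"] == 9 for p in sample)
print(year7, year9)  # 10 30
```

Passing a seeded `random.Random` makes the draw reproducible, which can help when checking pupils' work.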
Of 500 pupils (sampled from the current database) who completed the Phase 5 questionnaire from the CensusAtSchool project, 73 lived in a city, 242 in a town, 155 in a village and 30 said "other". Using a stratified random sample of 50, estimate the mean height of the pupils who completed the questionnaire. The number sampled from each group must be in proportion to where pupils live:
* City: (73/500) × 50 = 7.3 (7 pupils).
* Town: (242/500) × 50 = 24.2 (24).
* Village: (155/500) × 50 = 15.5 (16).
* Other: (30/500) × 50 = 3 (3).
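The article rounds each share to the nearest whole pupil. When rounded shares do not add up to the intended total, one common fix (swapped in here; it is not the method used above) is the largest-remainder method: take the whole-number part of each share, then give the remaining places to the categories with the largest fractional parts. A minimal Python sketch using the article's counts, with a hypothetical helper name; for these figures it reproduces 7, 24, 16 and 3.

```python
import math
from fractions import Fraction

# Counts from the CensusAtSchool example: where the 500 pupils live.
counts = {"City": 73, "Town": 242, "Village": 155, "Other": 30}

def largest_remainder_allocation(counts, sample_size):
    """Hypothetical helper: allocate sample_size places in proportion to counts."""
    total = sum(counts.values())
    # Exact shares as fractions, e.g. 73/500 x 50 = 7.3 for City.
    quotas = {k: Fraction(c * sample_size, total) for k, c in counts.items()}
    # Start with the whole-number part of each share.
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    # Give any places still unfilled to the largest fractional parts.
    places_left = sample_size - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:places_left]:
        alloc[k] += 1
    return quotas, alloc

quotas, alloc = largest_remainder_allocation(counts, 50)
for k in counts:
    print(f"{k}: {float(quotas[k]):.1f} -> {alloc[k]} pupils")
print("Total:", sum(alloc.values()))
```

Here the whole-number parts add up to 49, and the one spare place goes to Village, whose share (15.5) has the largest fractional part.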
You can use Excel to pick the random sample: give each pupil a random number with the RAND() function, sort by that column within each category, and take the first 7, 24, 16 or 3 pupils. Then use the AVERAGE function to find the mean height for each category. My sample gave:
* Mean height for City = 168cm.
* Mean height for Town = 160cm.
* Mean height for Village = 163cm.
* Mean height for Other = 169cm.
Use these mean heights to find the estimated mean height for the population of 500 (see box).
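One standard way to combine the category means is to weight each by its share of the population (73/500 for City, and so on). A minimal Python sketch using the article's counts and sample means; the helper name `stratified_mean` is hypothetical.

```python
from fractions import Fraction

# Counts and sample mean heights (cm) from the CensusAtSchool example.
population = {"City": 73, "Town": 242, "Village": 155, "Other": 30}
stratum_mean_cm = {"City": 168, "Town": 160, "Village": 163, "Other": 169}

def stratified_mean(sizes, means):
    """Hypothetical helper: weight each stratum mean by its population share."""
    total = sum(sizes.values())
    return sum(Fraction(sizes[k], total) * means[k] for k in sizes)

estimate = stratified_mean(population, stratum_mean_cm)
print(f"Stratified estimate: {float(estimate):.2f} cm")  # 162.64 cm
```

Weighting by the sample sizes 7, 24, 16 and 3 instead gives 8131/50 = 162.62cm, so either way the estimate is about 162.6cm.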
The mean height for all 500 pupils, without stratification, is easily calculated using Excel: 160.46cm. This is about 2cm lower than the estimate from the stratified sample. It appears that the pupils sampled from "City" and "Other" (one-fifth of the total sample) are taller than the overall average by about 8cm. This has probably accounted for much of the difference: with only 7 City pupils and 3 Other pupils in the sample, a few unusually tall individuals can easily pull these group means upwards.
This highlights the importance of having a large enough sample size. These calculations were done using real data, provided by pupils. Real data often contain outliers, and small samples can produce skewed results. However, the benefit is that meaningful conclusions can be drawn, and sensible reasons for any anomalies can usually be found. Working with real data helps pupils understand why they need to refine their techniques in order to test their hypotheses accurately.
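Why small groups are a problem can be illustrated with the standard error of a sample mean, σ/√n for a simple random sample. The sketch below assumes, purely for illustration, a standard deviation of 10cm for pupils' heights; that figure is not taken from the CensusAtSchool data, and the finite population correction is ignored.

```python
import math

# ASSUMPTION: a standard deviation of 10 cm for pupil heights, chosen only
# for illustration; it is not taken from the CensusAtSchool data.
ASSUMED_SD_CM = 10.0

def standard_error(sd, n):
    """Standard error of a sample mean, sd / sqrt(n), for a simple random
    sample (ignores the finite population correction)."""
    return sd / math.sqrt(n)

# Group sizes from the worked example (3, 7, 16, 24), the whole sample (50)
# and the 120 suggested in the question.
for n in (3, 7, 16, 24, 50, 120):
    print(f"n = {n:3d}: standard error ~ {standard_error(ASSUMED_SD_CM, n):.1f} cm")
```

Under this assumption the mean of the 3 "Other" pupils has a standard error of roughly 6cm, while the whole sample of 50 has about 1.4cm. Because of the square root, quadrupling a sample only halves its standard error.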
There are data from Canada, South Africa, Australia and New Zealand, as well as the UK, on the CensusAtSchool website. It's well worth a visit.