Q) Please can you help me with cumulative frequency? First, when given a small set of data, how do you work out what the values of the lower and upper quartiles are without drawing a curve? Second, how do you work out where to draw the line for the median when drawing a cumulative-frequency graph? Some texts add one to the total frequency, when the frequency is an odd number, before dividing by two to find half the frequency. Others say you don't add one to the frequency first. Which is correct?
A) Doreen Connor, Census at School co-ordinator and former head of maths at Chilwell School, provided a really nice way to explain what the processes are, using real data from the Census at School project (see below).
Before discussing how we work out the median and quartiles, it is important to explain why we might choose a median average as opposed to a mean or mode average. The median is not distorted by any very high or very low values in the data: unlike the mean, it does not include them in the calculation.
Cumulative frequency is used when you wish to look at how the data is built up or accumulating. The data must be placed in order, and to help understanding, divided into halves by the median or quarters by the quartiles. The middle of the data is called the median and is located so that half of the data is below it and half above it.
So, with 30 items of ordered data, the median will be between the 15th and 16th positions - ie at the 15.5th position ($\frac{30+1}{2}$). To get the correct position we add one to the total number of data items and divide this sum in half.
To understand this, consider the following example, where we have just three data points represented by three symbols: C, s, and u. The second one, s, is in the middle. To locate this median we add one to three (the number of symbols or items of data), then divide by two ($\frac{3+1}{2}$ = 2nd position). If we didn't add one, we would get 1.5, which is obviously not in the centre. Likewise the quartiles, or quarter points, are at positions $\frac{n+1}{4}$ and $\frac{3(n+1)}{4}$.
With 20 items of data, the calculations become quite complicated. The median would be located at the 10.5th position, between the 10th and 11th items, the lower quartile at position 5.25 ($\frac{21}{4}$) and the upper quartile at position 15.75 ($\frac{3 \times 21}{4}$).
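As a rough check on the arithmetic, the short Python sketch below works out the three positions for 3, 20 and 30 items. The function name `quartile_positions` is made up for illustration, and the sketch assumes the upper quartile sits at three times the lower-quartile position, 3(n + 1)/4, so that it is as far from the end of the data as the lower quartile is from the start.

```python
def quartile_positions(n):
    """Return the (lower quartile, median, upper quartile) positions
    for n ordered data items, counting the first item as position 1."""
    lower = (n + 1) / 4        # a quarter of the way through
    median = (n + 1) / 2       # halfway through
    upper = 3 * (n + 1) / 4    # three quarters of the way through (assumed rule)
    return lower, median, upper


for n in (3, 20, 30):
    print(n, quartile_positions(n))
# 3 (1.0, 2.0, 3.0)
# 20 (5.25, 10.5, 15.75)
# 30 (7.75, 15.5, 23.25)
```

A position such as 5.25 means a quarter of the way from the 5th item to the 6th.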
However, because these statistics are estimates used to help us to understand the data, it is quite acceptable to consider the data values involved and then decide on the quartiles: eg, if the fifth value is 100 and the sixth is 200, you may choose the lower quartile to be 125. But if the fifth value is 100 and the sixth is 102, you may choose 101 as the quartile. Hopefully this won't occur very often!
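One common way to turn a fractional position into a value is to move the matching fraction of the way between the two neighbouring items, which is how 125 comes out when the fifth value is 100 and the sixth is 200. The sketch below is only an illustration of that idea: the helper `value_at_position` and the 20 data values are invented for the example and are not taken from the Census at School data.

```python
import math


def value_at_position(ordered, position):
    """Estimate the value at a 1-based, possibly fractional, position
    in a list that is already sorted into ascending order."""
    if position < 1 or position > len(ordered):
        raise ValueError("position must lie between 1 and the number of items")
    whole = math.floor(position)          # the item just below the position
    fraction = position - whole           # how far to move towards the next item
    lower_value = ordered[whole - 1]      # lists are 0-based, positions 1-based
    if fraction == 0:
        return lower_value
    upper_value = ordered[whole]
    return lower_value + fraction * (upper_value - lower_value)


# Invented data: 20 ordered values with the 5th equal to 100 and the 6th equal to 200.
data = [60, 70, 80, 90, 100, 200, 210, 220, 230, 240,
        250, 260, 270, 280, 290, 300, 310, 320, 330, 340]

n = len(data)
print(value_at_position(data, (n + 1) / 4))      # lower quartile, position 5.25 -> 125.0
print(value_at_position(data, (n + 1) / 2))      # median, position 10.5 -> 245.0
print(value_at_position(data, 3 * (n + 1) / 4))  # upper quartile, position 15.75 -> 297.5
```

Python's standard `statistics.quantiles` function, with its default "exclusive" method, appears to follow the same add-one convention, so it can serve as a cross-check on small data sets.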
For larger data sets, say 200 values or more, the difference between locating the median at the 100.5th position and the 100th position is usually so small that it is negligible, especially when finding it from a graph where the data has already been put into groups.
Because the data is usually grouped to draw a cumulative frequency graph, the median and quartile values will be estimates and therefore be slightly different from the ones worked out from the raw data (see the example below from Census at School www.censusatschool.ntu.ac.uk).
For higher-ability pupils these slight differences can be useful to explore, as they are a good illustration of one of the effects that grouping has on data. We can see from the raw data statistics in the table, for a sample size of n = 30, that the lower quartile = 150cm, the median = 162.5cm and the upper quartile = 171cm.
When we look at the graph which has been drawn by grouping the data up to 120cm, 140cm and so on, the total frequency is n=30, the lower quartile = 151cm, the median = 164cm and the upper quartile = 174cm. As you can see, the values are slightly different.
But these are estimates, and so to answer your last question, it doesn't matter whether or not you add one to the total frequency on the graph when the frequency is odd, as this is only an estimate.
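To see how little the choice matters on a graph, the sketch below reads the median off a cumulative-frequency graph at both half the total and half of the total plus one. It is a minimal illustration under stated assumptions: the grouped heights are invented (they are not the Census at School figures), the plotted points are joined with straight lines rather than a smooth curve, and the function name `read_off` is hypothetical.

```python
def read_off(boundaries, cumulative, target):
    """Estimate the data value at which the cumulative frequency reaches
    `target`, joining the plotted points with straight lines."""
    if not cumulative[0] < target <= cumulative[-1]:
        raise ValueError("target lies outside the range of the graph")
    for i in range(1, len(boundaries)):
        if cumulative[i] >= target:
            # Fraction of the way through this group needed to reach the target.
            fraction = (target - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])


# Invented grouped heights (cm): upper boundary of each group and the
# cumulative frequency up to that boundary. Total frequency is 31 (odd).
boundaries = [100, 120, 140, 160, 180, 200]
cumulative = [0, 1, 5, 14, 27, 31]

total = cumulative[-1]
without_adding_one = read_off(boundaries, cumulative, total / 2)     # reads at 15.5
with_adding_one = read_off(boundaries, cumulative, (total + 1) / 2)  # reads at 16
print(round(without_adding_one, 1), round(with_adding_one, 1))
```

With these made-up figures the two readings differ by less than 1cm, which is smaller than the accuracy with which most hand-drawn graphs can be read.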