Consider the following frequency distribution. Find an estimate for the mean.
Let’s have a closer look at this frequency distribution. The data has been grouped into classes, which are labeled 10 dash, 20 dash, 30 dash, and so on. We’re given the frequency, or number of items, in each class. Because the data has been grouped, we don’t know the exact values of the data points, but we do know, for example, that there are nine pieces of data in the first class. We’re asked to find an estimate for the mean of this distribution. Because we don’t know the exact data values, we can only estimate the mean rather than calculate it exactly.
We recall that, in general, the mean of a data set is found by dividing the sum of all the data values by the number of values. When we’re estimating the mean, however, we need to find an estimate for the sum of all the data values. We do know the number of data values: it is the total frequency in the table, which is 50. So, let’s think about how we can estimate the sum of all the data values.
We first need to find a single value that best represents each class. We choose the central value, or midpoint, of each class, which is the mean of its lower and upper boundaries.
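As a formula, the approach can be summarised as follows. The symbols f_i (the frequency of class i) and m_i (its midpoint) are notation introduced here for illustration; they do not appear in the original walkthrough.

```latex
m_i = \frac{\text{lower}_i + \text{upper}_i}{2},
\qquad
\bar{x} \approx \frac{\sum_i f_i m_i}{\sum_i f_i}
```

The denominator is the total frequency, which is 50 for this table.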
From the table, it may appear that we don’t know the upper boundaries, because of the way the classes have been presented. To work these out, we assume that there are no gaps between the classes, so the lower boundary of each class is the upper boundary of the previous one. The first class, then, contains all the data values that are greater than or equal to 10 but strictly less than 20. The next class contains all the data values that are greater than or equal to 20 but strictly less than 30.
By writing the inequalities in this way, with a strict inequality at the upper boundary of each class and a weak inequality at the lower boundary, we ensure there are no gaps and no overlaps between the classes.
When we come to the final class, we have to make an assumption about its upper boundary, as there is no class that follows it. We assume this class has the same width as the class immediately before it. In this distribution, every other class has a width of 10, so we assume the final class also has a width of 10. Hence, its upper boundary is 60.
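The boundary rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the lower boundaries are 10, 20, 30, 40 and 50 (read from the class labels) and applying the two stated assumptions: no gaps between classes, and a final class as wide as the one before it. The function name `class_boundaries` is hypothetical.

```python
def class_boundaries(lower_bounds):
    """Pair each class's lower boundary with its inferred upper boundary.

    Assumes no gaps between classes (each upper boundary is the next
    class's lower boundary) and that the final class has the same width
    as the class immediately before it.
    """
    if len(lower_bounds) < 2:
        raise ValueError("need at least two classes to infer the final width")
    uppers = list(lower_bounds[1:])
    last_width = lower_bounds[-1] - lower_bounds[-2]
    uppers.append(lower_bounds[-1] + last_width)
    # Each pair (a, b) stands for the class a <= x < b.
    return list(zip(lower_bounds, uppers))


print(class_boundaries([10, 20, 30, 40, 50]))
# [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
```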
Having found each of the upper class boundaries, we’re now ready to calculate the midpoint of each class. Each midpoint is the mean of the lower and upper boundaries for that class. The first midpoint is 10 plus 20, divided by two, which is 15. The remaining midpoints are 25, 35, 45, and 55.
We’ve now found a single value that we can use to represent each class, and hence to estimate the sum of the values in each class. In the first class, there are nine data values. We don’t know them exactly, but each lies between 10 and 20, so we approximate each one by the midpoint, 15. Hence, the estimated sum of the values in the first class is nine multiplied by 15, which is 135. We can estimate the sum of the values in each of the remaining classes in the same way, each time multiplying the frequency for that class by its midpoint.
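The midpoint and per-class steps can be sketched as below. This is an illustrative sketch, not the original method verbatim: the function names are hypothetical, and since only the first class's frequency (9) is stated in the text, the usage line computes that class's total alone.

```python
def class_midpoints(lower_bounds):
    """Mean of each class's lower and upper boundaries.

    Assumes no gaps between classes and a final class as wide as the one
    before it, so upper boundaries can be inferred from the lower ones.
    """
    last_width = lower_bounds[-1] - lower_bounds[-2]
    uppers = list(lower_bounds[1:]) + [lower_bounds[-1] + last_width]
    return [(lo + hi) / 2 for lo, hi in zip(lower_bounds, uppers)]


def class_totals(frequencies, mids):
    """Estimated sum of each class: its frequency multiplied by its midpoint."""
    if len(frequencies) != len(mids):
        raise ValueError("one frequency is needed per class")
    return [f * m for f, m in zip(frequencies, mids)]


mids = class_midpoints([10, 20, 30, 40, 50])
print(mids)                         # [15.0, 25.0, 35.0, 45.0, 55.0]
print(class_totals([9], mids[:1]))  # first class only: [135.0]
```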
To find the estimated sum of all the data values, we add together the estimated totals for each class, which gives 1,800. To estimate the mean, we divide this estimated total by the total frequency of 50: 1,800 divided by 50 is 36.
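Putting the last two steps together, a minimal sketch of the estimate is the sum of frequency times midpoint, divided by the total frequency. The function name `estimated_mean` is hypothetical. The full frequency column is not reproduced in this text, so the usage line applies only the final division with the totals quoted above.

```python
def estimated_mean(frequencies, mids):
    """Estimate the mean of grouped data as sum(f * m) / sum(f)."""
    if len(frequencies) != len(mids):
        raise ValueError("one frequency is needed per class")
    total_frequency = sum(frequencies)
    if total_frequency == 0:
        raise ValueError("total frequency must be positive")
    estimated_sum = sum(f * m for f, m in zip(frequencies, mids))
    return estimated_sum / total_frequency


# Final step with the figures quoted in the text:
# estimated sum 1,800 and total frequency 50.
print(1800 / 50)  # 36.0
```

This is a frequency-weighted mean of the midpoints; in Python 3.11 and later, `statistics.fmean(mids, weights=frequencies)` should give the same value.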
So, by recalling the process for estimating the mean of a frequency distribution, which requires us first to find the midpoint of each class, we’ve estimated the mean of this frequency distribution to be 36.