Video Transcript
In this video, we will learn how to
estimate the median for data presented in a grouped frequency table by using
cumulative frequency graphs.
To begin, let’s recall what we mean
by the median of a set of data. Let’s imagine that we have this
ordered data set of the numbers three, three, five, seven, 10, 14, 14, and 19. The median of a data set represents
the middle value when the values are in ascending or descending order. So that means that half of the data
is above the median and half of the data is below the median.
This data set has eight values. So the middle value will lie
between the fourth and the fifth values. This middle value is easier to
work out for some data sets than for others, but remember that the middle value between two numbers is the
midpoint of these values. To find this, we take the fourth
value, which is seven, and the fifth value, which is 10, add them together and
divide that answer by two. And that gives 8.5. So the median of the set of values
is 8.5.
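As a quick check of the steps just described, here is a minimal Python sketch. The variable names are chosen for illustration only; the data values are the ones read out above.

```python
from statistics import median

values = [3, 3, 5, 7, 10, 14, 14, 19]  # data set from the transcript

ordered = sorted(values)
n = len(ordered)

# With an even number of values, the median is the midpoint
# of the two middle values (here the fourth and the fifth).
lower_middle = ordered[n // 2 - 1]  # fourth value: 7
upper_middle = ordered[n // 2]      # fifth value: 10
manual_median = (lower_middle + upper_middle) / 2

print(manual_median)   # 8.5
print(median(values))  # 8.5, the standard library agrees
```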
This data set had an even
number of values, but if a data set has an odd number of values, then we simply use the
middle value. However, when dealing with grouped
data sets, we can’t find the median in the same way. For example, let’s say that we have
this grouped frequency table. The first class, zero dash,
represents values which are zero or greater but less than five, because five is the
lower boundary of the next group. So we can observe, for example,
that there are three values in this set that are 10 or greater but less than 15.
However, when it comes to grouped
tables, if we just have the table, we can’t tell what the original values are. This table could represent the
values above, or it could represent an entirely
different set of data values. Both sets of data would produce
identical grouped frequency tables. We don’t know the actual values
from a grouped data set alone. And so when we’re using a grouped
frequency table, we talk instead about finding an estimate for the median rather
than the actual median. And that estimate gives us a really
good indication of what the median might be. For example, in this second
possible data set, the median would be 9.5. And the values of 8.5 and 9.5 are
quite similar results.
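The point that two different data sets can share one grouped frequency table can be sketched in Python. The second list below is a hypothetical data set made up for illustration (it is not necessarily the one shown in the video), chosen so that its median is 9.5. The helper name `group_counts` and the class width of five are also assumptions.

```python
from collections import Counter
from statistics import median


def group_counts(values, width=5):
    """Count the values in each class, keyed by the class's lower boundary."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


original = [3, 3, 5, 7, 10, 14, 14, 19]      # data set from the transcript
hypothetical = [1, 4, 6, 9, 10, 11, 12, 17]  # assumed, for illustration only

table_original = group_counts(original)
table_hypothetical = group_counts(hypothetical)

print(table_original)                         # {0: 2, 5: 2, 10: 3, 15: 1}
print(table_original == table_hypothetical)   # True: identical grouped tables
print(median(original), median(hypothetical)) # 8.5 9.5
```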
Now let’s recap how we can use a
grouped frequency table to find a cumulative frequency, because using a cumulative
frequency graph is an excellent way to estimate the median of a data set. There are two types of cumulative
frequency: ascending and descending. The ascending cumulative frequency,
or often simply the cumulative frequency, of a value 𝑎 indicates the frequency of
values that are less than 𝑎. And the descending cumulative
frequency of a value 𝑏 indicates the frequency of values that are greater than or
equal to 𝑏.
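A small Python sketch of both definitions follows. The frequencies come from grouping the eight values above into classes of width five, and the final boundary of 20 is an assumption, since the last class has no stated upper boundary.

```python
from itertools import accumulate

boundaries = [0, 5, 10, 15, 20]  # 20 is an assumed upper boundary
frequencies = [2, 2, 3, 1]       # classes 0-, 5-, 10-, 15-

total = sum(frequencies)

# Ascending: number of values less than each boundary.
ascending = [0] + list(accumulate(frequencies))

# Descending: number of values greater than or equal to each boundary.
descending = [total - count for count in ascending]

for boundary, asc, desc in zip(boundaries, ascending, descending):
    print(f"less than {boundary}: {asc}, at least {boundary}: {desc}")
```

At every boundary the two counts add up to the total frequency of eight, which is why either one can be used to locate the middle of the data.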
The graphs for each type of
cumulative frequency typically follow these shapes. And we can use both types of graph
to estimate the median of a grouped frequency distribution. To do this on either type of graph,
we draw a horizontal line from the median position on the 𝑦-axis until the line
meets the curve. Then we draw a vertical line from
this point downwards to the 𝑥-axis. And it’s this value on the 𝑥-axis
that is the estimate for the median. Because half of the data is above
the median and half is below, it doesn’t matter which type of cumulative frequency
graph we use to find an estimate for the median.
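The graphical method can be imitated in Python, as a rough sketch. The function name `estimate_median` is hypothetical, and the code assumes the plotted points are joined by straight lines, so its result can differ slightly from a reading taken off a smooth hand-drawn curve. The sample points come from the small grouped table discussed earlier, with an assumed final boundary of 20.

```python
def estimate_median(x_values, cumulative):
    """Estimate the median from cumulative frequency points.

    Assumes the points are joined by straight lines. Works for
    ascending or descending cumulative frequencies.
    """
    median_position = max(cumulative) / 2  # half of the total frequency
    points = list(zip(x_values, cumulative))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses = min(y0, y1) <= median_position <= max(y0, y1)
        if crosses and y0 != y1:
            # The horizontal line meets this segment; drop down to the x-axis.
            return x0 + (median_position - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("the curve never reaches the median position")


boundaries = [0, 5, 10, 15, 20]  # 20 is an assumed upper boundary
ascending = [0, 2, 4, 7, 8]      # values less than each boundary
descending = [8, 6, 4, 1, 0]     # values greater than or equal to each boundary

from_ascending = estimate_median(boundaries, ascending)
from_descending = estimate_median(boundaries, descending)
print(from_ascending, from_descending)  # 10.0 10.0
```

Both curves give the same estimate here, which matches the observation that the choice of graph does not matter.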
And if we drew both the ascending
and descending cumulative frequency curves for a data set on the same graph, the
point at which the curves intersect would be at the median position. We’ll now see an example where we
estimate the median using a cumulative frequency graph.
From the following cumulative
frequency graph that represents the masses of some balls that have different
colors, find an estimate for the median. Option (A) 1.7 kilograms,
option (B) 2.1 kilograms, option (C) 2.7 kilograms, option (D) 2.9 kilograms, or
option (E) 3.1 kilograms.
Let’s begin by noting that
cumulative frequency is a running total of the frequencies. And the median of a set of
values is the middle value. Now in this problem, we are
considering a number of balls that have different masses. If we had all the masses of the
balls, we could order them from lightest to heaviest or heaviest to
lightest. Then the median value would be
the mass of the ball at the middle position. But even if we did have the
original mass of each individual ball, we would not have to list all the
masses in order, because we can use the cumulative frequency graph to help
us.
The first reading on the graph,
with an 𝑥-value of one kilogram for the mass and a 𝑦-value of two for the cumulative
frequency, means that there are two balls with a mass less than one kilogram. And the next value of seven
balls at the two-kilogram mark doesn’t mean that seven balls have a mass of two
kilograms. It means that seven balls have
a mass that is less than two kilograms, and this includes the two balls that
have a mass less than one kilogram.
So how many balls are there in
total in the problem? Well, by looking at the highest
point on the cumulative frequency graph, we can see that 15 balls have a mass
less than five kilograms. So there are 15 balls in
total. If we imagine the 15 balls laid
out in order from lightest to heaviest, we need to work out the median
position. And the position of the median
is at half of the total frequency. Half of 15 is 7.5.
Then to find the median using
the graph, we draw a horizontal line from the 𝑦-axis at the median position of
7.5 like this until this line meets the curve. Next, we draw a vertical line
downwards from this point to the 𝑥-axis. It is this point on the 𝑥-axis
that gives us the estimate for the median.
Reading the axis carefully, we
can give the answer that an estimate for the median mass of the balls is 2.1
kilograms, which was the answer given in option (B).
We should be careful not to
make a very common mistake when using cumulative frequency graphs to find the
median. This comes from incorrectly
taking half of the value on the 𝑥-axis. Half of the total possible
masses of five kilograms is 2.5 kilograms, but this is not an estimate for the
median. Neither should we draw a line
upwards from this point to the curve and then read the value from the
𝑦-axis. If we did do this, it would
only tell us that approximately 9.7 balls had a mass less than 2.5 kilograms,
but it wouldn’t tell us anything about the median. And therefore, we should take
care to calculate half of the final cumulative frequency to find the median
position and then draw a horizontal line from this position on the 𝑦-axis to correctly estimate the
median.
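The correct reading can be roughly checked in Python using only two readings quoted in this example: seven balls lighter than two kilograms, and approximately 9.7 balls lighter than 2.5 kilograms. The sketch assumes a straight line between those two points, which the real curve only approximates.

```python
# Two readings quoted in the transcript for the ball-mass graph.
x0, y0 = 2.0, 7.0  # 7 balls have a mass less than 2 kg
x1, y1 = 2.5, 9.7  # about 9.7 balls have a mass less than 2.5 kg (approximate)

total_frequency = 15
median_position = total_frequency / 2  # 7.5, located on the y-axis

# Straight-line interpolation between the two readings (an assumption).
median_estimate = x0 + (median_position - y0) * (x1 - x0) / (y1 - y0)

print(median_position)            # 7.5
print(round(median_estimate, 1))  # 2.1
```

The result agrees with the 2.1 kilograms read from the graph, and it is clearly different from the mistaken value of 2.5 kilograms.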
We’ll now see an example where we
estimate the median from a descending cumulative frequency graph.
An employer surveyed 30
employees to determine the distance in kilometers of their commute to work. The data is given in the
descending cumulative frequency graph. Determine an estimate for the
median commuting distance.
In this question, we are given
a descending cumulative frequency graph. The descending cumulative
frequency is different to ascending cumulative frequency because we say that the
descending cumulative frequency of a value 𝑏 is the frequency of values that
are greater than or equal to 𝑏. In an ascending cumulative
frequency graph, by contrast, each point counts the values that are less than its 𝑥-value. So on this descending graph, for example, the coordinate
(10, 24) indicates that 24 employees had a commuting distance that was greater
than or equal to 10 kilometers.
Now given that we have this
descending cumulative frequency, we are asked to find an estimate for the
median. And we can recall that the
median is the middle value when the data is in ascending or descending
order. And we are asked for an
estimate for the median because we have a grouped data set. So that means we can’t find the
exact median. It doesn’t mean that we should
try to guess what it is.
To find the position of the
median, we calculate the total frequency divided by two. The total frequency in this
problem is the total number of employees. We were told that there were 30
employees surveyed, but even if we weren’t given this information, we could
determine it from the graph. The highest value in the
descending cumulative frequency is the first value, since 30 employees had a
commuting distance greater than or equal to zero kilometers. The median position is
therefore 30 over two, which is 15. If we lined up all the
employees from the smallest distance commuted to the largest distance commuted,
the median distance commuted would belong to the 15th person.
So how do we know what distance
they traveled? Well we can use the graph. We draw a horizontal line at
the median position, that’s 15, on the descending cumulative frequency until it
meets the curve. Then we draw a vertical line
downwards to the 𝑥-axis, which gives us 19. Therefore, we can give the
answer that an estimate for the median distance commuted is 19 kilometers.
In the next example, we need to
estimate the median of a given data set. And we do this by first drawing a
cumulative frequency graph.
The cost, in dollars, of cans
of soda in different places is recorded in the table below. Determine an estimate for the
median cost of soda approximated to the nearest hundredth.
In this problem, we are given
the cost of soda as a grouped frequency table. If we consider the first cost
column, this has zero dollars and a hyphen. So, the values in this group
will be the costs that are greater than or equal to zero dollars but less than
50 cents, because that’s the lower boundary of the next class. And the frequency of one means
that one can of soda was in this cost class.
Because we don’t know the cost
of every individual can of soda, we can only calculate an estimate for the
median, which is the middle value when the costs are ordered from least to
greatest or greatest to least.
One way in which we can
estimate the median is by drawing a cumulative frequency diagram. We can recall that the
cumulative frequency, or ascending cumulative frequency, of a value 𝑎 indicates
the frequency of values that are less than 𝑎. Now the best way to record the
cumulative frequencies is by adding to the table or perhaps even drawing a new
table.
So let’s draw a new table. And this time, instead of
frequency on the second row, we have cumulative frequency. The first group in the original
table is values that are zero dollars or more but less than 50 cents. However, it is common to
include a starting cumulative frequency of zero. In this context, we are saying
that there were no cans of soda sold for less than zero dollars. The second group in the new
table would be values less than 50 cents. And then we can continue to
create new group headings for the cost in the same way up until the amount of
two dollars.
However, let’s think about what
the final group in the original table represents. The values in this final group
are two dollars or more. This group doesn’t have an
upper boundary. So we don’t know what values
these are less than. However, in grouped frequency
distributions like this, we can assume that the class widths are all the same,
so in this case 50 cents. We can say that the values in
this group would be two dollars or more and less than two dollars and 50
cents. So in the table we are
creating, the cumulative frequency of the final group would be values less than
two dollars and 50 cents.
Now let’s work out the values
for the cumulative frequencies of each class. The first nonzero cumulative
frequency comes from the first value in the original table. There was one can of soda that
had a cost less than 50 cents. The next cumulative frequency
is for costs less than one dollar. We know that there were six
cans costing 50 cents or more but less than one dollar. But the one can costing less
than 50 cents also costs less than one dollar. So the cumulative frequency is
the sum of these, which is seven.
For the next cost of less than
one dollar 50, we can do the same. 15 cans cost one dollar or more
but less than one dollar 50. So by adding 15 to the previous
cumulative frequency of seven, we know that 22 cans cost less than one dollar
50. And we can find the two
remaining cumulative frequencies by adding 21 and then seven to give us values
of 43 and 50.
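The running totals just worked out can be reproduced with a short Python sketch. The frequencies are the ones read from the table in this example, and the final boundary of 2.50 dollars rests on the equal-class-width assumption made above.

```python
from itertools import accumulate

# Upper boundaries in dollars; 2.50 assumes equal class widths of 50 cents.
upper_bounds = [0.00, 0.50, 1.00, 1.50, 2.00, 2.50]
frequencies = [1, 6, 15, 21, 7]

# Start at zero: no cans cost less than zero dollars.
cumulative = [0] + list(accumulate(frequencies))

for bound, running_total in zip(upper_bounds, cumulative):
    print(f"less than ${bound:.2f}: {running_total}")

print(cumulative)  # [0, 1, 7, 22, 43, 50]
```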
Now remember that we’ve done
this so that we can plot a cumulative frequency diagram. The coordinates that we plot on
the graph will have 𝑥-coordinates of the less than values and the
𝑦-coordinates of their respective cumulative frequencies. We then just need to take care
when drawing the grid that we have enough space to include all the values.
So here, we have the points
plotted and joined with a smooth curve. And then we can find an
estimate for the median. Since the highest value in the
cumulative frequency is 50, we know that there were 50 cans of soda. And we can find the median
position by dividing the total frequency, that’s the total number of cans, by
two. Half of 50 is 25, so that means
that the median cost of a can of soda would be the cost of the 25th can. And we can use the graph to
determine an estimate for the cost of the 25th can.
We draw a horizontal line from
25 on the 𝑦-axis until it meets the curve and then draw a vertical line
downwards from this point to the 𝑥-axis. Reading from the graph, the
value for the cost is 1.60. So an estimate for the median
cost of a can of soda is one dollar and 60 cents.
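As a rough cross-check, the same reading can be approximated in Python by joining the plotted points with straight lines instead of a smooth curve. That is an assumption, and it gives about 1.57 dollars rather than the 1.60 read from the curve. A small difference like this is expected, because the estimate depends on how the curve is drawn between the points.

```python
upper_bounds = [0.00, 0.50, 1.00, 1.50, 2.00, 2.50]  # 2.50 is an assumed boundary
cumulative = [0, 1, 7, 22, 43, 50]

median_position = cumulative[-1] / 2  # 25.0, located on the y-axis

# Find the first plotted point at or above the median position.
for i in range(1, len(cumulative)):
    if cumulative[i] >= median_position:
        x0, x1 = upper_bounds[i - 1], upper_bounds[i]
        y0, y1 = cumulative[i - 1], cumulative[i]
        break

# Straight-line interpolation inside that class (an assumption).
estimate = x0 + (median_position - y0) * (x1 - x0) / (y1 - y0)

print((x0, x1))            # (1.5, 2.0): the class containing the median
print(round(estimate, 2))  # 1.57
```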
Let’s now summarize the key points
of this video.
We recapped that there are two
different types of cumulative frequency. The ascending cumulative frequency,
often just called cumulative frequency, of a value indicates the frequency of values
that are less than it, whereas the descending cumulative frequency of a value
indicates the frequency of values that are greater than or equal to it. The median of a data set represents
the middle value when the values are written in order.
As noted in this video, we can’t
calculate an exact median from a grouped frequency distribution. Instead, we determine an estimate
for the median. With either an ascending or a descending
cumulative frequency diagram, the median position is found in the same way:
median position equals total frequency over two.
Finally, using either type of
cumulative frequency diagram, we draw a horizontal line from the value of the median
position on the 𝑦-axis until it meets the curve. And then we draw a vertical line
downwards from this point to the 𝑥-axis. This value on the 𝑥-axis is the
estimate for the median.