Lesson Video: Grouped Frequency Tables: Estimating the Median Mathematics


In this video, we will learn how to estimate the median for data presented in a grouped frequency table using cumulative frequency graphs.

18:59

Video Transcript

In this video, we will learn how to estimate the median for data presented in a grouped frequency table by using cumulative frequency graphs.

To begin, let’s recall what we mean by the median of a set of data. Let’s imagine that we have this ordered data set of the numbers three, three, five, seven, 10, 14, 14, and 19. The median of a data set represents the middle value when the values are in ascending or descending order. So that means that half of the data is above the median and half of the data is below the median.

This data set has eight values, so the middle value lies between the fourth and the fifth values. The middle value between two numbers is their midpoint. To find it, we add the fourth value, seven, to the fifth value, 10, and divide the result by two, which gives 8.5. So the median of this set of values is 8.5.
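The midpoint rule above can be checked with a short Python sketch. The manual branch mirrors the steps in the transcript; `statistics.median` from the standard library applies the same rule, averaging the two middle values when the count is even.

```python
import statistics

# The ordered data set from the example above
values = [3, 3, 5, 7, 10, 14, 14, 19]

# Manual approach: with an even count, average the two middle values
n = len(values)
ordered = sorted(values)
if n % 2 == 0:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    manual_median = ordered[n // 2]

print(manual_median)              # 8.5
print(statistics.median(values))  # 8.5
```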

This set had an even number of values; if there is an odd number of values, the median is simply the middle value. With grouped data, however, we can't find the median in the same way. For example, let's say that we have a grouped frequency table for this data. The first class, labelled "0–", represents values that are zero or greater but less than five, because five is the lower boundary of the next class. Reading the table, we can see, for example, that three values in this set are 10 or greater but less than 15.
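As a sketch of how such a table is built, the Python below groups the data set from above into classes of width 5, where a class labelled `0–` holds values that are at least 0 but less than 5. The class width of 5 and a top class of `15–` are assumptions consistent with the description here, since the table itself is not reproduced in this transcript; `group` is a hypothetical helper name.

```python
# The ordered data set from the earlier example
values = [3, 3, 5, 7, 10, 14, 14, 19]

# Assumed classes of width 5: "0–" means 0 <= x < 5, "5–" means 5 <= x < 10, ...
class_width = 5
lower_bounds = [0, 5, 10, 15]

def group(data, lower_bounds, width):
    """Count how many values fall in each class [lower, lower + width)."""
    return {
        lower: sum(1 for x in data if lower <= x < lower + width)
        for lower in lower_bounds
    }

frequencies = group(values, lower_bounds, class_width)
for lower, count in frequencies.items():
    print(f"{lower}–: {count}")
```

The class `10–` receives 10, 14 and 14, matching the three values the transcript reads from the table.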

However, from a grouped table alone, we can't tell what the original values were. The table could have come from the data set above, or from an entirely different set of data values; both would produce identical grouped frequency tables. Because we don't know the actual values, when we use a grouped frequency table we talk about finding an estimate for the median rather than the actual median. That estimate gives us a good indication of what the median might be. For example, in the second possible data set, the median would be 9.5, which is close to the 8.5 we found for the first.
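To illustrate the point that different data can share one grouped table, the sketch below compares the first data set with a second one. The second data set here is hypothetical, chosen only so that it falls into the same classes of width 5 and has a median of 9.5; the video's own second data set is not listed in this transcript.

```python
import statistics
from collections import Counter

def grouped_table(data, width=5):
    """Map each class's lower bound to its frequency (classes of equal width)."""
    return Counter((x // width) * width for x in data)

first = [3, 3, 5, 7, 10, 14, 14, 19]   # data set from the example above
second = [1, 4, 6, 9, 10, 12, 13, 17]  # hypothetical data set, for illustration only

print(grouped_table(first) == grouped_table(second))        # True
print(statistics.median(first), statistics.median(second))  # 8.5 9.5
```

Both lists produce the frequencies 2, 2, 3 and 1, yet their medians differ, which is why only an estimate can be recovered from the table.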

Now let’s recap how we can use a grouped frequency table to find a cumulative frequency, because using a cumulative frequency graph is an excellent way to estimate the median of a data set. There are two types of cumulative frequency: ascending and descending. The ascending cumulative frequency, or often simply the cumulative frequency, of a value 𝑎 indicates the frequency of values that are less than 𝑎. And the descending cumulative frequency of a value 𝑏 indicates the frequency of values that are greater than or equal to 𝑏.
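Both definitions can be sketched in Python using the grouped table for the earlier data set (classes of width 5 with frequencies 2, 2, 3 and 1, an assumption carried over from that example). The ascending values are a running total; each descending value is the total minus the ascending value at the same boundary, because every value is either less than 𝑏 or greater than or equal to 𝑏.

```python
from itertools import accumulate

# Grouped table from the earlier example: classes of width 5 starting at 0
lower_bounds = [0, 5, 10, 15]
frequencies = [2, 2, 3, 1]
width = 5
total = sum(frequencies)

# Class boundaries at which cumulative frequencies are recorded: 0, 5, 10, 15, 20
boundaries = lower_bounds + [lower_bounds[-1] + width]

# Ascending: number of values less than each boundary (running total from 0)
ascending = [0] + list(accumulate(frequencies))

# Descending: number of values greater than or equal to each boundary
descending = [total - cf for cf in ascending]

for b, asc, desc in zip(boundaries, ascending, descending):
    print(f"x = {b:>2}: less than -> {asc}, at least -> {desc}")
```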

An ascending cumulative frequency graph typically rises from left to right, while a descending cumulative frequency graph typically falls from left to right. We can use either type of graph to estimate the median of grouped data. To do this, we draw a horizontal line from the median position on the 𝑦-axis until the line meets the curve. Then we draw a vertical line from this point down to the 𝑥-axis. This value on the 𝑥-axis is the estimate for the median. Because half of the data is above the median and half is below, it doesn't matter which type of cumulative frequency graph we use.
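The horizontal-then-vertical reading can be sketched in code. The video joins the points with a smooth hand-drawn curve; this sketch swaps in straight lines between the plotted points (a cumulative frequency polygon, i.e. linear interpolation), so its answers can differ slightly from a curve reading. `estimate_median` is a hypothetical name, and the example data are the ascending cumulative frequencies of the earlier grouped table.

```python
def estimate_median(xs, cfs):
    """Estimate the median from an ascending cumulative frequency polygon.

    xs  -- class boundaries (increasing)
    cfs -- cumulative frequencies "less than" each boundary (non-decreasing)
    Straight lines between points stand in for a hand-drawn smooth curve.
    """
    position = cfs[-1] / 2  # median position = total frequency / 2
    for (x0, y0), (x1, y1) in zip(zip(xs, cfs), zip(xs[1:], cfs[1:])):
        if y0 <= position <= y1 and y1 > y0:
            # The horizontal line meets this segment; drop down to the x-axis
            return x0 + (position - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("median position not reached by the data")

# Ascending cumulative frequencies for the earlier grouped example
print(estimate_median([0, 5, 10, 15, 20], [0, 2, 4, 7, 8]))  # 10.0
```

The estimate of 10 differs from the exact median of 8.5 for that data set, which is expected: the grouped table does not record where values sit inside each class.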

If we drew both the ascending and descending cumulative frequency curves for a data set on the same graph, the curves would intersect at a height equal to the median position, directly above the estimate for the median. We'll now see an example where we estimate the median using a cumulative frequency graph.
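The intersection property can be checked numerically with the same earlier example, again assuming straight lines between points rather than smooth curves. Since ascending plus descending equals the total at every boundary, the two polygons cross where each equals half the total, so the crossing lands on the same 𝑥-value as reading the median position off either graph.

```python
boundaries = [0, 5, 10, 15, 20]
ascending = [0, 2, 4, 7, 8]   # values less than each boundary
descending = [8, 6, 4, 1, 0]  # values greater than or equal to each boundary
total = ascending[-1]

# On each straight segment, find where ascending - descending crosses zero
x_cross = None
for i in range(len(boundaries) - 1):
    d0 = ascending[i] - descending[i]
    d1 = ascending[i + 1] - descending[i + 1]
    if d0 <= 0 <= d1 and d1 != d0:
        x0, x1 = boundaries[i], boundaries[i + 1]
        x_cross = x0 + (0 - d0) / (d1 - d0) * (x1 - x0)
        break

print(x_cross)    # 10.0, same as reading the ascending graph at the median position
print(total / 2)  # 4.0, the height at which the two graphs meet
```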

From the following cumulative frequency graph that represents the masses of some balls that have different colors, find an estimate for the median. Option (A) 1.7 kilograms, option (B) 2.1 kilograms, option (C) 2.7 kilograms, option (D) 2.9 kilograms, or option (E) 3.1 kilograms.

Let's begin by noting that cumulative frequency is a running total of the frequencies, and the median of a set of values is the middle value. In this problem, we are considering a number of balls that have different masses. If we had all the masses of the balls, we could order them from lightest to heaviest or heaviest to lightest, and the median would be the mass of the ball in the middle position. Rather than listing all the masses in order, we can use the cumulative frequency graph to help us.

The first point on the graph, with an 𝑥-value of one kilogram and a cumulative frequency of two, means that two balls have a mass less than one kilogram. The next point, seven balls at the two-kilogram mark, doesn't mean that seven balls have a mass of two kilograms. It means that seven balls have a mass less than two kilograms, and this includes the two balls with a mass less than one kilogram.
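Because cumulative frequencies are running totals, the number of balls in a single class is the difference between consecutive readings. A minimal sketch using only the two readings quoted above:

```python
# Graph readings quoted in the text: mass (kg) -> balls lighter than that mass
cumulative = {1: 2, 2: 7}

below_1kg = cumulative[1]                      # balls lighter than 1 kg
from_1_to_2kg = cumulative[2] - cumulative[1]  # balls with 1 kg <= mass < 2 kg

print(below_1kg, from_1_to_2kg)  # 2 5
```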

So how many balls are there in total? The highest point on the cumulative frequency graph shows that 15 balls have a mass less than five kilograms, so there are 15 balls in total. If the 15 balls were laid out in order from lightest to heaviest, we would need to work out the median position, which is half of the total frequency. Half of 15 is 7.5.

To find the median using the graph, we draw a horizontal line from the median position of 7.5 on the 𝑦-axis until it meets the curve. Next, we draw a vertical line down from this point to the 𝑥-axis. This point on the 𝑥-axis gives us the estimate for the median.

Reading the axis carefully, we can give the answer that an estimate for the median mass of the balls is 2.1 kilograms, which was the answer given in option (B).

We should be careful to avoid a very common mistake when using cumulative frequency graphs to find the median: taking half of the largest value on the 𝑥-axis. Half of the maximum mass on the axis, five kilograms, is 2.5 kilograms, but this is not an estimate for the median. Nor should we draw a line upwards from 2.5 on the 𝑥-axis to the curve and then read the value from the 𝑦-axis. That would only tell us that approximately 9.7 balls had a mass less than 2.5 kilograms; it tells us nothing about the median. Instead, we should take half of the final cumulative frequency to find the median position and then draw the horizontal line from that position on the 𝑦-axis to estimate the median correctly.
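The difference between the correct step and the mistake is which axis gets halved. A small sketch with the two numbers from this example:

```python
total_balls = 15   # final cumulative frequency, read from the top of the graph
max_mass_kg = 5    # right-hand end of the x-axis

median_position = total_balls / 2  # correct: a position on the y-axis
mistaken_value = max_mass_kg / 2   # common mistake: half of the x-axis range

print(median_position)  # 7.5 -> draw the horizontal line from 7.5 on the y-axis
print(mistaken_value)   # 2.5 -> a mass, not a position; not the median estimate
```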

We’ll now see an example where we estimate the median from a descending cumulative frequency graph.

An employer surveyed 30 employees to determine the distance in kilometers of their commute to work. The data is given in the descending cumulative frequency graph. Determine an estimate for the median commuting distance.

In this question, we are given a descending cumulative frequency graph. Descending cumulative frequency is different from ascending cumulative frequency: the descending cumulative frequency of a value 𝑏 is the frequency of values that are greater than or equal to 𝑏, whereas the ascending cumulative frequency of a value counts the values less than it. For example, the point (10, 24) on this graph indicates that 24 employees had a commuting distance greater than or equal to 10 kilometers.

Now given that we have this descending cumulative frequency, we are asked to find an estimate for the median. And we can recall that the median is the middle value when the data is in ascending or descending order. And we are asked for an estimate for the median because we have a grouped data set. So that means we can’t find the exact median. It doesn’t mean that we should try to guess what it is.

To find the position of the median, we divide the total frequency by two. The total frequency in this problem is the total number of employees. We were told that 30 employees were surveyed, but even without this information we could read it from the graph: the largest descending cumulative frequency is the first one, since all 30 employees had a commuting distance greater than or equal to zero kilometers. The median position is therefore 30 over two, which is 15. If we lined up all the employees from the shortest commute to the longest, the median distance commuted would belong to the 15th person.
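A short sketch of the reasoning for a descending graph, using the two readings quoted in the text. The total is the first descending value, and a descending reading converts to an ascending one by subtracting it from the total.

```python
# Readings from the descending graph: distance (km) -> employees with distance >= that value
descending = {0: 30, 10: 24}

total = descending[0]        # every employee commutes at least 0 km
median_position = total / 2  # half of the total frequency

# Converting a descending reading to an ascending one: employees commuting under 10 km
under_10_km = total - descending[10]

print(total, median_position, under_10_km)  # 30 15.0 6
```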

So how do we know what distance they traveled? We can use the graph. We draw a horizontal line from the median position, 15, on the descending cumulative frequency axis until it meets the curve. Then we draw a vertical line down to the 𝑥-axis, which gives us 19. Therefore, an estimate for the median distance commuted is 19 kilometers.

In the next example, we need to estimate the median of a given data set. And we do this by first drawing a cumulative frequency graph.

The cost, in dollars, of cans of soda in different places is recorded in the table below. Determine an estimate for the median cost of soda, rounded to the nearest hundredth.

Cost ($) | 0– | 0.50– | 1.00– | 1.50– | 2.00–
Frequency | 1 | 6 | 15 | 21 | 7

In this problem, we are given the cost of soda as a grouped frequency table. The first cost column is labelled "0–", so the values in this class are costs greater than or equal to zero dollars but less than 50 cents, because 50 cents is the lower boundary of the next class. The frequency of one means that one can of soda fell in this cost range.

Because we don’t know the cost of every individual can of soda, we can only calculate an estimate for the median, which is the middle value when the costs are ordered from least to greatest or greatest to least.

One way to estimate the median is by drawing a cumulative frequency diagram. Recall that the cumulative frequency, or ascending cumulative frequency, of a value 𝑎 indicates the frequency of values that are less than 𝑎. A convenient way to record the cumulative frequencies is to add a row to the table or to draw a new table.

So let's draw a new table. This time, instead of frequency on the second row, we have cumulative frequency. It is common to begin with a cumulative frequency of zero: in this context, it says that no cans of soda cost less than zero dollars. The first group in the original table is values that are zero dollars or more but less than 50 cents, so the next heading in the new table is values less than 50 cents. We then continue creating "less than" headings for the cost in the same way up to two dollars.

However, let's think about what the final group represents. The values in this group are two dollars or more, and the group has no upper boundary, so we don't know what values they are less than. In grouped frequency distributions like this, we usually assume that the open-ended class has the same width as the others, in this case 50 cents. So we treat the values in this group as two dollars or more but less than two dollars and 50 cents, and in the table we are creating, the final cumulative frequency is for values less than two dollars and 50 cents.
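The "less than" headings, including the closed-off final class, can be generated from the lower class boundaries. This sketch assumes, as the text does, that the open-ended class `2.00–` has the same 50-cent width as the other classes.

```python
# Lower boundaries of the classes in the soda table (dollars); "2.00–" is open-ended
lower_bounds = [0.00, 0.50, 1.00, 1.50, 2.00]

# Assumption from the text: every class, including the last, has the same width
width = lower_bounds[1] - lower_bounds[0]

# "Less than" headings: a starting point of 0, then each class's upper boundary,
# with the open-ended class closed off at 2.00 + 0.50
less_than = [lower_bounds[0]] + [lower + width for lower in lower_bounds]

print(less_than)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```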

Now let's work out the cumulative frequency for each class. The first nonzero cumulative frequency comes from the first frequency in the original table: one can of soda cost less than 50 cents. The next cumulative frequency is for costs less than one dollar. Six cans cost 50 cents or more but less than one dollar, and the one can costing less than 50 cents also costs less than one dollar. So the cumulative frequency is the sum of these, which is seven.

For costs less than one dollar and 50 cents, we do the same. 15 cans cost one dollar or more but less than one dollar and 50 cents, so adding 15 to the previous cumulative frequency of seven tells us that 22 cans cost less than one dollar and 50 cents. We find the two remaining cumulative frequencies by adding 21 and then seven, giving 43 and 50.
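The running totals can be reproduced with `itertools.accumulate`. The class frequencies 1, 6, 15, 21 and 7 are those described in the text (the last two are the values added in the final step), and the pairs produced are the points to plot on the cumulative frequency diagram.

```python
from itertools import accumulate

# Class frequencies for the soda table, as described in the text
frequencies = [1, 6, 15, 21, 7]
less_than = [0.00, 0.50, 1.00, 1.50, 2.00, 2.50]

# Start from 0 (no can costs less than $0), then keep a running total
cumulative = [0] + list(accumulate(frequencies))

# Points to plot: (cost less than, cumulative frequency)
points = list(zip(less_than, cumulative))

print(cumulative)  # [0, 1, 7, 22, 43, 50]
print(points)
```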

Now remember that we’ve done this so that we can plot a cumulative frequency diagram. The coordinates that we plot on the graph will have 𝑥-coordinates of the less than values and the 𝑦-coordinates of their respective cumulative frequencies. We then just need to take care when drawing the grid that we have enough space to include all the values.

So here, we have the points plotted and joined with a smooth curve. And then we can find an estimate for the median. Since the highest value in the cumulative frequency is 50, we know that there were 50 cans of soda. And we can find the median position by dividing the total frequency, that’s the total number of cans, by two. Half of 50 is 25, so that means that the median cost of a can of soda would be the cost of the 25th can. And we can use the graph to determine an estimate for the cost of the 25th can.

We draw a horizontal line from 25 on the 𝑦-axis until it meets the curve and then draw a vertical line down from this point to the 𝑥-axis. Reading from the graph, the cost is 1.60, so an estimate for the median cost of a can of soda is one dollar and 60 cents.
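As a cross-check, the graph reading can be approximated numerically. This sketch swaps the video's smooth curve for straight lines between the plotted points (linear interpolation), so it is only an approximation of reading the drawn curve.

```python
less_than = [0.00, 0.50, 1.00, 1.50, 2.00, 2.50]
cumulative = [0, 1, 7, 22, 43, 50]

position = cumulative[-1] / 2  # median position: the 25th can

# Straight-line interpolation inside the segment that contains the median position
estimate = None
for x0, x1, y0, y1 in zip(less_than, less_than[1:], cumulative, cumulative[1:]):
    if y0 <= position <= y1 and y1 > y0:
        estimate = x0 + (position - y0) / (y1 - y0) * (x1 - x0)
        break

print(round(estimate, 2))  # 1.57
```

Linear interpolation gives about $1.57, a little below the $1.60 read from the smooth curve. Both are estimates, and the small gap comes only from how the points are joined between $1.50 and $2.00.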

Let’s now summarize the key points of this video.

We recapped that there are two different types of cumulative frequency. The ascending cumulative frequency, often just called cumulative frequency, of a value indicates the frequency of values that are less than it, whereas the descending cumulative frequency of a value indicates the frequency of values that are greater than or equal to it. The median of a data set represents the middle value when the values are written in order.

As noted in this video, we can't calculate an exact median from a grouped frequency distribution; instead, we determine an estimate for the median. With either an ascending or a descending cumulative frequency diagram, the median position is the total frequency divided by two.

Finally, using either type of cumulative frequency diagram, we draw a horizontal line from the value of the median position on the 𝑦-axis until it meets the curve. And then we draw a vertical line downwards from this point to the 𝑥-axis. This value on the 𝑥-axis is the estimate for the median.
