In this explainer, we will learn how to draw a cumulative frequency diagram and how to use it to make estimations about the data.
Let’s begin by understanding what cumulative frequency is.
Definition: Cumulative Frequency
Cumulative frequency is the sum of all the previous frequencies up to the current point. It is often referred to as the running total of the frequencies.
The ascending cumulative frequency of a value $x$ can be found by adding the frequencies of all values less than $x$.
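The running total can be computed with a simple loop, as in the minimal Python sketch below. The function name `ascending_cumulative_frequencies` is our own illustrative choice. The sample frequencies 0, 10, and 19 are the first three frequencies from Example 1 below.

```python
def ascending_cumulative_frequencies(frequencies):
    """Return the running total of a list of class frequencies.

    Entry i is the number of values less than the upper boundary
    of class i.
    """
    totals = []
    running = 0
    for f in frequencies:
        running += f           # add this class's frequency to the previous total
        totals.append(running)
    return totals


# First three frequencies from Example 1 (classes 0-, 2-, 4-).
print(ascending_cumulative_frequencies([0, 10, 19]))  # [0, 10, 29]
```

The standard library function `itertools.accumulate` produces the same running total, for example `list(accumulate([0, 10, 19]))`.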
The use of cumulative frequencies is a statistical method that is typically applied to grouped frequency tables, where data is organized into smaller groups or classes.
Let’s look at an example of how we find the cumulative frequency of a set of data that is given in a grouped frequency table.
Example 1: Completing a Cumulative Frequency Table from a Grouped Frequency Table
The table shows the number of hours that 100 students spent revising for an exam.
| Number of Hours | Frequency | Cumulative Frequency |
Determine the missing cumulative frequency results.
The grouped frequency table presents data on the number of hours that students spent revising. The groups, or classes, are written in open-ended notation, so the first group, 0–, represents values of 0 hours or more but less than 2 hours. This is because the next group begins with values greater than or equal to 2. The groups in a grouped frequency table never overlap.
We are asked to complete the cumulative frequency table based on the frequencies. The cumulative frequency gives the running total of the frequencies. An ascending cumulative frequency always gives the total frequency of values that are less than a particular value.
The first group in the frequency table has a cumulative frequency of 0. This is because we can conclude from the frequency table that there were 0 students revising for less than 2 hours.
To find the second cumulative frequency value, we add the frequency of the second group, 2–, to the previous cumulative frequency. This group has a frequency of 10, so the second cumulative frequency is $0 + 10 = 10$; that is, 10 students revised for less than 4 hours.
We now need to determine the cumulative frequency of students who revised for less than 6 hours. The class 4– in the grouped frequency table indicates that 19 students revised for 4 hours or more and less than 6 hours. However, the 10 students in the previous groups also revised for less than 6 hours. Hence, the cumulative frequency for less than 6 hours is $10 + 19 = 29$. This third cumulative frequency was found by adding the frequency of the third class to the previous cumulative frequency.
We can then continue this process to find each of the cumulative frequency values.
It is worth noting that the cumulative frequency of all values will be the same as the total frequency, which is useful for checking whether our values are correct. The total frequency is the sum of all the frequencies in the table, which is 100, the number of students surveyed.
Since the final cumulative frequency is also 100, we have confirmed that the missing cumulative frequency values we found are consistent with the table.
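The check described above can be automated. In this minimal sketch, the helper name `final_matches_total` is hypothetical. The sample data are the cumulative frequencies 7, 32, 82, 115, and 120 stated in Example 5 later in this explainer, together with the frequencies 7, 25, 50, 33, and 5 obtained by taking the differences between them.

```python
def final_matches_total(frequencies, cumulative):
    """Check that the last cumulative frequency equals the total frequency."""
    return len(cumulative) > 0 and cumulative[-1] == sum(frequencies)


freqs = [7, 25, 50, 33, 5]
cum = [7, 32, 82, 115, 120]
print(sum(freqs))                                          # 120
print(final_matches_total(freqs, cum))                     # True
print(final_matches_total(freqs, [7, 32, 82, 115, 119]))   # False
```

This only confirms the final value. A slip in a middle value would not be caught, so recomputing the whole running total is a stronger check.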
We will now see the most common way in which cumulative frequency is presented: as a cumulative frequency graph, sometimes called an ogive.
Definition: Cumulative Frequency Graph
A cumulative frequency graph displays the cumulative frequency of a data set. This can be a cumulative frequency polygon, where straight lines join the points, or a cumulative frequency curve, where a smooth curve joins them.
The cumulative frequency for a value $x$ is the total number of data values that are less than $x$.
Since cumulative frequency is a running total, the graph of the cumulative frequency never decreases. It may have horizontal portions, where the cumulative frequency stays the same because the frequency of a group is 0.
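Frequencies are never negative, so a running total can never decrease. The sketch below checks this property. The frequencies 3, 0, and 5 are hypothetical, chosen so that the middle class is empty and produces a horizontal portion.

```python
from itertools import accumulate


def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical frequencies with an empty (zero-frequency) middle class.
freqs = [3, 0, 5]
cum = list(accumulate(freqs))
print(cum)                     # [3, 3, 8]
print(is_non_decreasing(cum))  # True; the repeated 3 is a horizontal portion
```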
We will now see some examples of cumulative frequency graphs. In the next example, we will identify the correct representation of a data set as a cumulative frequency graph.
Example 2: Identifying a Cumulative Frequency Graph for a Data Set
A manufacturer samples the mass, in grams, of 30 pencils from the production line. Their masses are recorded in the table. No pencil has a mass greater than 60 g.
Which cumulative frequency graph correctly shows this information?
In order to identify the correct graph, we need to calculate the cumulative frequencies for the values in the table. This will give us a running total for values that are less than a given point. The “less than” value that we use will be the upper boundary of each class.
We begin by recognizing that the first group in the table, 10–, represents masses that are 10 g or greater but less than 20 g. Creating a new table, we can add a class with a frequency of 0 to represent that there were 0 pencils with a mass less than 10 g. We then continue through each group until the final group, 50–. Although this group is open-ended, we assume that it has the same class width as the other classes, so it contains masses of 50 g or more but less than 60 g. This is consistent with the information that no pencil has a mass greater than 60 g.
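Listing the "less than" values can be sketched as follows. The function name is hypothetical. As in the text, it assumes that every class, including the open-ended final one, has the same width, which is 10 g here.

```python
def less_than_boundaries(lower_bounds, width):
    """Return the 'less than' x-values for a cumulative frequency table.

    The first value is the lower boundary of the first class, where the
    cumulative frequency is 0. The rest are the upper boundaries of each
    class, assuming every class (including the last) has the same width.
    """
    return [lower_bounds[0]] + [lb + width for lb in lower_bounds]


# Example 2: classes 10-, 20-, 30-, 40-, 50-, each assumed 10 g wide.
print(less_than_boundaries([10, 20, 30, 40, 50], 10))
# [10, 20, 30, 40, 50, 60]
```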
We can start filling in the cumulative frequency totals with the first cumulative frequency of 0, since there are 0 pencils recorded with a mass less than 10 g. Next, we know that there are 3 pencils with a mass less than 20 g and this will be the second cumulative frequency.
We are given that 6 pencils have a mass of 20 g or more and less than 30 g. However, the 3 pencils in the previous class also have a mass less than 30 g, so they are included in the cumulative frequency. The cumulative frequency for masses less than 30 g is $3 + 6 = 9$.
We can then continue adding the frequency of each group to the cumulative frequency.
The final cumulative frequency will be the same as the total frequency. In this case, this will be the value 30, since 30 pencils were sampled.
When drawing or identifying the graph of a cumulative frequency in this context, we will have mass on the $x$-axis and cumulative frequency on the $y$-axis. The $x$-coordinates will be the “less than” mass values. This allows us to use a cumulative frequency curve to estimate the number of values that are less than any particular value. The coordinates to be plotted begin with $(10, 0)$, $(20, 3)$, and $(30, 9)$, continue with the points for 40 g and 50 g, and end at $(60, 30)$.
The graph that matches these coordinates is graph B, so this is the cumulative frequency graph for the given information.
Graph A shows cumulative frequencies, but they are plotted at the lower boundary of each class interval rather than the upper boundary, so we could not use it to find a correct estimate of the cumulative frequency at any given point. Graph C is a frequency polygon, not a cumulative frequency graph.
In the previous example, we chose the correctly drawn cumulative frequency graph from several options. In the next example, we will draw our own cumulative frequency graph. As with any graph-drawing problem, it is important to allow space on our axes to cover the full range of values. A common mistake when creating cumulative frequency diagrams is to draw the $y$-axis only up to the highest frequency in the frequency table. Remember that the $y$-axis must extend to the highest cumulative frequency value, which is the total frequency.
Example 3: Drawing a Cumulative Frequency Graph for a Data Set
The following table shows the heights of students in a high school.
Represent this data in a cumulative frequency graph.
The cumulative frequency of a data set is a running total of the frequencies. It gives us the number of values that are less than any particular value.
Let’s begin by finding the cumulative frequency for each group, or class, in the grouped frequency table. To begin the cumulative frequency, we create a class with a frequency of 0, representing the number of students with heights less than 150 cm (the lower boundary of the first group in the table). The class 150– indicates heights that are 150 cm or more but less than 155 cm, since the next group begins with values of 155 cm or greater. We can continue to create new group descriptions in the following table. We assume that the last group has the same class width as the other classes, so the final group in the cumulative frequency table will be heights less than 175 cm.
We now calculate the cumulative frequency values, beginning with a cumulative frequency of 0 in the first group. The first frequency in the given frequency table is 4, so there are 4 students with a height of less than 155 cm.
Next, we add the 22 students with a height of 155 cm or more but less than 160 cm. The cumulative frequency of the group “less than 160” is found using $4 + 22 = 26$. We continue adding the frequencies as shown.
To plot this information as a graph, we plot height on the $x$-axis and cumulative frequency on the $y$-axis. Each ordered pair has the upper boundary of a group (the “less than” value) as its $x$-coordinate and the corresponding cumulative frequency as its $y$-coordinate.
We join the points with a smooth curve and create the following cumulative frequency graph.
When creating a cumulative frequency diagram, it is preferable to join the points with a smooth curve rather than with straight lines. This gives a better approximation of the data and allows more accurate estimates of cumulative frequencies for values that do not lie on class boundaries (that is, values between the plotted points).
We will now see an example of this, where we are given a cumulative frequency graph and use it to estimate how many values are less than a particular value, or greater than or equal to it.
Example 4: Interpreting a Cumulative Frequency Graph
Maged took a sample of 100 balls from box A. He weighed each ball and recorded its weight in the table.
He used the data to draw the cumulative frequency graph shown on the grid.
- Estimate how many balls had a weight of less than 80 grams.
- Estimate how many balls had a weight of 130 grams or more.
Cumulative frequency is the sum of all the previous frequencies up to the current point. It is often referred to as the running total of the frequencies. The given graph shows the cumulative frequency of the weights of 100 balls, and we can see from the graph that the highest cumulative frequency is 100. Any point on the cumulative frequency graph gives the number of balls with a weight less than the weight at that point.
In order to find an estimate for the number of balls that are less than 80 grams, we draw a vertical line from 80 on the $x$-axis until it meets the curve. We then draw a horizontal line from this point to the $y$-axis, which allows us to read the corresponding $y$-value, the cumulative frequency.
Observing that each minor grid line on the $y$-axis represents a frequency of 2, we can answer the first part of this question: the number of balls less than 80 grams can be estimated as 26.
Although each point on the cumulative frequency curve represents the frequency of values less than a particular value, we can still use the curve to estimate how many values are greater than or equal to a particular value. To estimate the number of balls that are 130 grams or more, we use the same process: we draw a vertical line from 130 on the $x$-axis to the curve, and then draw a horizontal line from this point to the $y$-axis.
We read a cumulative frequency of 78 balls from the $y$-axis, which means that 78 balls had a weight less than 130 grams. To find the number of balls with a weight of 130 grams or more, we subtract this from the total frequency. The total frequency is the total number of balls that have been weighed, which is 100. Thus, we have $100 - 78 = 22$.
The answer for the second part of the question is that we estimate that there are 22 balls with a weight of 130 grams or more.
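The subtraction step can be written as a one-line helper. The name `count_at_least` is hypothetical. The inputs are the readings from this example: 26 balls under 80 g and 78 balls under 130 g, out of 100.

```python
def count_at_least(total_frequency, cumulative_below):
    """Number of values >= x, given the cumulative frequency (count < x) read at x."""
    return total_frequency - cumulative_below


print(count_at_least(100, 78))  # 22 balls weigh 130 g or more
print(count_at_least(100, 26))  # 74 balls weigh 80 g or more
```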
In a grouped frequency table, the groups or classes may be described using different notation. We have seen how a class of 10– represents values that are 10 or greater and less than the lower boundary of the subsequent class.
We can also use inequalities to represent the boundaries in continuous data sets. For example, data representing lengths, $l$, may be allocated to intervals written as $10 \le l < 20$. In this notation, we can clearly see that the lengths, $l$, in this group are greater than or equal to 10 (since $10 \le l$ is equivalent to $l \ge 10$) and less than 20.
In the final example, the grouped frequency table uses this inequality notation. We create a cumulative frequency graph from the table and use it to answer a question.
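The inequality $10 \le l < 20$ translates directly into a membership test. In this sketch, the helper name `in_class` is hypothetical. Python's chained comparison mirrors the notation exactly.

```python
def in_class(value, lower, upper):
    """True if lower <= value < upper, i.e. value belongs to this class."""
    return lower <= value < upper


# The class 10 <= l < 20 from the text.
print(in_class(10, 10, 20))    # True: 10 is included
print(in_class(19.9, 10, 20))  # True
print(in_class(20, 10, 20))    # False: 20 belongs to the next class
```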
Example 5: Drawing and then Interpreting a Cumulative Frequency Graph
A botanist records the height, $h$, in centimetres, of 120 tomato plants in an experiment testing different growing conditions. By drawing a cumulative frequency curve to represent the data, estimate the number of plants with a height less than 115 cm.
We recall that cumulative frequency is the sum of all the previous frequencies up to the current point, often described as the running total of the frequencies.
We can use the upper boundaries of each group in the table and identify the cumulative frequency of heights that are less than each of these upper boundaries. For example, the first group represents heights that are less than 110 cm, the second group has heights of less than 120 cm, and so forth.
We can then create the running total of frequencies, where the first cumulative frequency value is the same as the frequency of the first class. Each subsequent cumulative frequency value is found by adding the frequency of a class to the previous cumulative frequency total. We determine that the cumulative frequencies are 7, 32, 82, 115, and 120.
When plotting a cumulative frequency curve, we put cumulative frequency on the $y$-axis and the other variable on the $x$-axis. The $y$-axis must extend to the highest cumulative frequency value.
We use the upper boundary of each class as the $x$-coordinate. Therefore, here, we will be plotting the coordinates $(110, 7)$, $(120, 32)$, $(130, 82)$, $(140, 115)$, and $(150, 120)$. It is also common to include a group with a frequency of 0, to allow us to plot a cumulative frequency of 0. Since there are no plants recorded with a height less than 100 cm, we can also plot the coordinate $(100, 0)$. Joining the points with a smooth curve, we produce the following graph.
We use the graph to help us estimate the number of plants with a height less than 115 cm. Any point on the cumulative frequency curve gives the cumulative frequency of heights that are less than its $x$-coordinate. We draw a vertical line from 115 cm on the $x$-axis to the curve, and then draw a horizontal line from this point to the $y$-axis.
This gives a cumulative frequency of 16. As this reading may vary slightly depending on how the curve is drawn, values close to it would also be acceptable estimates; a cumulative frequency curve cannot give an exact value.
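The points for this graph can be generated programmatically, as in this minimal sketch. The function name is hypothetical. The class boundaries 100, 110, ..., 150 are inferred from the text (no plants under 100 cm, first class under 110 cm, second under 120 cm, and so on). The frequencies 7, 25, 50, 33, and 5 are the differences between the stated cumulative frequencies.

```python
from itertools import accumulate


def ogive_points(lower_bounds, width, frequencies):
    """(x, y) points for a cumulative frequency graph.

    x is the 'less than' boundary and y is the cumulative frequency;
    the first point has y = 0 at the lower boundary of the first class.
    Assumes all classes have the same width.
    """
    xs = [lower_bounds[0]] + [lb + width for lb in lower_bounds]
    ys = [0] + list(accumulate(frequencies))
    return list(zip(xs, ys))


# Example 5: classes 100 <= h < 110, ..., 140 <= h < 150 (inferred boundaries).
points = ogive_points([100, 110, 120, 130, 140], 10, [7, 25, 50, 33, 5])
print(points)
# [(100, 0), (110, 7), (120, 32), (130, 82), (140, 115), (150, 120)]
```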
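A computer cannot read a hand-drawn curve, but a rough numerical estimate can be sketched. Note that this is not the method used in this example: it replaces the smooth curve with straight line segments between the plotted points (a cumulative frequency polygon) and interpolates linearly. The helper name `polygon_estimate` is hypothetical.

```python
from bisect import bisect_right


def polygon_estimate(points, x):
    """Estimate the cumulative frequency at x by linear interpolation
    between plotted (boundary, cumulative frequency) points."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of the graph")
    i = bisect_right(xs, x)
    if i == len(xs):  # x is exactly the last boundary
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


points = [(100, 0), (110, 7), (120, 32), (130, 82), (140, 115), (150, 120)]
print(polygon_estimate(points, 115))  # 19.5
```

The straight-line estimate of 19.5 differs from the curve reading of 16. Between 110 cm and 120 cm, the smooth curve bends below the straight segment joining the two points. Both values are estimates, and the text prefers the smooth curve.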
We can give the answer that an estimate for the number of tomato plants with a height less than 115 cm is 16 plants.
We will now summarize the key points.
- Cumulative frequency is the sum of all the previous frequencies up to the current point. It is often referred to as the running total of the frequencies.
- To draw a cumulative frequency graph, we first determine all the cumulative frequency totals for values that are less than the upper boundary of each class.
- To plot the coordinates for each cumulative frequency value, we take the upper boundary of a class (the “less than” value) as the $x$-coordinate and the corresponding cumulative frequency as the $y$-coordinate.
- Any point on a cumulative frequency curve represents the cumulative frequency of values that are less than the corresponding $x$-coordinate.
- To find the frequency of values that are greater than or equal to any $x$-coordinate, we subtract the corresponding $y$-coordinate (the cumulative frequency) from the total frequency (the highest cumulative frequency).