In this explainer, we will learn how to use histograms to analyze data, communicate information, and get insights from data.
One of the ways we gain information from a data set is by displaying the data in a graph or plot. A histogram (sometimes called a frequency graph) is a type of graph used to display data.
A histogram looks quite similar to a bar chart, but while a bar chart is used for data that has been counted for a set of categories, for example, colors, a histogram is used to display numerical data. Remember that a numerical data set is one in which the values are measurements, like height, weight, or age. We say that the variable (the thing that varies, e.g., height) is numerical. In a histogram, the variable, which is usually on the horizontal axis, is split into ranges of measurements.
The bar chart at the top shows the number of students who prefer each different fruit. The variable is “fruit” and there are six categories, or types, of fruit. The height of the bar is the number of students preferring that fruit. In the histogram at the bottom, the height of each bar is the frequency for that height range.
Features of a Histogram
- Histograms are graphs used to represent numerical data.
- Normally, the horizontal axis represents the possible values of the variable.
- The values on the horizontal axis are then split into groups or ranges. The width of a bar is called the class interval.
- The vertical axis represents the frequency, and the area of each bar is proportional to the frequency for that range. (Sometimes, we use “number of” instead of “frequency” but only when the bars are of equal width.)
- When the bars are of equal width, as in the examples below, the total frequency (or number of values in the data set) is found by summing the heights of all the bars.
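To make these features concrete, here is a minimal Python sketch of how a data set could be grouped into equal-width class intervals before drawing a histogram. The data values, the function name `frequency_table`, and the choice to include the lower bound of each interval (but not the upper bound) are all assumptions made for illustration.

```python
def frequency_table(values, start, width, n_bins):
    """Count how many values fall in each equal-width class interval.

    Assumption: each interval includes its lower bound and excludes its
    upper bound, so no value can land in two intervals.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the plotted range
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]


# Hypothetical heights in centimeters (made up for this sketch).
heights_cm = [142, 147, 151, 153, 155, 158, 160, 161, 164, 168, 171, 175]

table = frequency_table(heights_cm, start=140, width=10, n_bins=4)
for lower, upper, freq in table:
    print(f"{lower} to {upper}: {freq}")

# With equal-width bars, the total frequency is the sum of the bar heights.
total_frequency = sum(freq for _, _, freq in table)
print("Total frequency:", total_frequency)
```

Each tuple in `table` corresponds to one bar: its class interval on the horizontal axis and its height on the vertical axis.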
The example below shows how to read frequency from a histogram.
Example 1: Reading Frequencies from a Histogram
Adam recorded how long he takes to get dressed in the morning over a number of days. The results are shown on the diagram. Which time interval has the highest frequency?
Answer
In a histogram, the interval with the highest frequency is the interval with the highest bar above it.
In this case, the highest bar corresponds to the interval of time between 5 and 10 minutes. The frequency corresponds to the number of days, so we can see that it took Adam between 5 and 10 minutes to get dressed on 40 days. So, answer A is correct.
You will notice that in answer “A” the interval does not include “exactly 5 minutes” but does include “exactly 10 minutes.” If we look at the remaining answers, we see that the same thing applies to them too. For all the intervals, the lowest possible value in the interval definition is not included in that interval but the highest possible value is included. Defining the intervals in this way means that all possible values are included in the whole set of intervals but that there is no overlap between them.
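The convention described above (lowest value excluded, highest value included) can be sketched in Python as follows. The interval edges below are hypothetical, since the histogram itself is not reproduced here, and `interval_index` is an illustrative helper rather than a standard function.

```python
from bisect import bisect_left


def interval_index(value, edges):
    """Return i such that edges[i] < value <= edges[i + 1], or None.

    Each interval excludes its lowest value and includes its highest value.
    """
    if value <= edges[0] or value > edges[-1]:
        return None
    return bisect_left(edges, value) - 1


# Hypothetical edges in minutes: (0, 5], (5, 10], (10, 15], (15, 20].
edges = [0, 5, 10, 15, 20]

print(interval_index(5, edges))   # 0: exactly 5 minutes belongs to the first interval
print(interval_index(10, edges))  # 1: exactly 10 minutes belongs to the second interval
print(interval_index(0, edges))   # None: 0 is not included in any interval
```

Because every boundary value belongs to exactly one interval, each recorded time is counted once and only once.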
Note also that the interval specified in answer B covers two intervals on the histogram.
In our next two examples, we will read off and calculate frequencies from a histogram.
Example 2: Frequencies from a Histogram
The histogram shows the age at which several people started their first job. Using this information, determine how many people started their first job between the ages of 15 and 25.
Answer
We can see from the histogram that there are two bars above the horizontal axis, between the ages of 15 and 25. One of the bars covers the ages 15 to 20 and the second covers those people who started their first job between the ages of 20 and 25. The number of people between 15 and 25 is then found by adding together the heights of these two bars.
Reading the frequencies from the vertical axis, we can see that 9 people started their first job between 15 and 20 and that 10 people started their first job between 20 and 25. Adding these together, we find that 9 + 10 = 19 people started their first job between the ages of 15 and 25.
Example 3: Counting Frequencies from Histograms
This histogram shows the time that students spend doing homework every night. How many students spend seventy-five minutes or less doing homework?
Answer
To work out how many students spend seventy-five minutes or less doing homework, we add the frequencies for each of the intervals where the time spent is seventy-five minutes or less.
The frequencies are the heights of the bars, so from the histogram let us read off the heights of the bars for the intervals with times of seventy-five minutes or less. We start with the lowest time interval, 0 to 15 minutes.
There is only one student who spends between 0 and 15 minutes doing homework, as the height of the first bar is 1. Now, we move to the next interval, 15 to 30 minutes, reading from the top of the bar to the vertical axis again.
There are 2 students who spend between 15 and 30 minutes on their homework. We can continue in this way for all the bars with intervals up to 75 minutes.
Let us list the number of students in each of the time intervals of seventy-five minutes or less and add them together.
Time Spent (Minutes) | Number of Students |
---|---|
0 to 15 | 1 |
15 to 30 | 2 |
30 to 45 | 5 |
45 to 60 | 10 |
60 to 75 | 4 |
Total | 22 |
Since there is 1 student who spends between 0 and 15 minutes, 2 who spend between 15 and 30 minutes, 5 who spend between 30 and 45 minutes, 10 who spend between 45 and 60 minutes, and 4 who spend between 60 and 75 minutes, there are a total of 22 students who spend 75 minutes or less on their homework every night.
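The same calculation can be written as a short Python sketch. The bar values are the ones read off in this example; the list name `bars` and the helper `frequency_up_to` are illustrative names only.

```python
# (lower bound, upper bound, frequency) for the bars read off in Example 3.
bars = [(0, 15, 1), (15, 30, 2), (30, 45, 5), (45, 60, 10), (60, 75, 4)]


def frequency_up_to(bars, cutoff):
    """Sum the frequencies of all bars whose upper bound is at most cutoff."""
    return sum(freq for _, upper, freq in bars if upper <= cutoff)


print(frequency_up_to(bars, 75))  # 22
```

This only gives an exact answer when the cutoff lies on the edge of a bar, as 75 minutes does here. A histogram does not tell us how the values are spread inside a single bar.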
In this next example, we will look further at how to gain information from a histogram.
Example 4: Gaining Information from Histograms
The times taken by students to complete their end-of-year math test are shown in the histogram below.
- In which time interval did the highest number of students complete their math test?
- How many students took between 60 and 70 minutes to complete their test?
- How many students took more than 80 minutes to complete their test?
- Find the difference in number of students between the fastest and slowest time intervals.
- How many students took the test?
Answer
Part 1
The time interval in which the highest number of students completed their test is the interval with the tallest bar above it.
This is the interval 80 to 90 minutes, with a frequency of 25. Therefore, the time interval in which the highest number of students completed their test was 80 to 90 minutes.
Part 2
To find how many students took between 60 and 70 minutes to complete their test, we read the height of the bar above the interval 60 to 70 minutes from our histogram.
Reading across from the top of the bar to the vertical axis, we can see that the height of this bar is 15. This means that 15 students took between 60 and 70 minutes to complete their test.
Part 3
To find how many students took more than 80 minutes to complete their test, we add the frequencies for each of the bars above the intervals with times of 80 minutes or more. There are two intervals to consider: 80 to 90 and 90 to 100.
The bar above the interval 80 to 90 minutes has a height of 25, so 25 students took between 80 and 90 minutes (as we found in part 1). The bar above the interval 90 to 100 has a height of 7, so 7 students took between 90 and 100 minutes. Adding these frequencies together, we find that 25 + 7 = 32 students took more than 80 minutes to complete their test.
Part 4
To find the difference in number of students between the fastest and slowest time intervals, we need first to find the frequencies for the fastest and slowest intervals, that is, the heights of the bars above the intervals with the lowest and the highest number of minutes.
In fact, we have already found the frequency, that is, the number of students who completed the test in the slowest time interval. The slowest time interval is the interval with the highest number of minutes, which is the interval 90 to 100 minutes. And we know that 7 students completed the test in this time interval.
Next, we need the number of students who took the test in the fastest time interval, that is, the height of the bar above the interval with the lowest completion times. This is the interval 50 to 60 minutes.
There are 8 students who completed their test in this time interval. If we now subtract from this the number of students in the slowest time interval, we get the difference between the two: 8 - 7 = 1. Hence, the difference in number of students between the fastest and slowest time intervals is 1 student.
Part 5
To find how many students took the test, we add up the heights of all the bars because this gives us the total frequency, which is the total number of students whose times were recorded.
Starting at the left on our histogram, 8 students took between 50 and 60 minutes, 15 students took between 60 and 70 minutes, 17 students took between 70 and 80 minutes, 25 students took between 80 and 90 minutes, and 7 students took between 90 and 100 minutes to complete the test. This gives us 8 + 15 + 17 + 25 + 7 = 72.
So, in total, 72 students took the test.
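As a check on all five parts, the frequencies read off in this example can be stored in a small Python dictionary. This is only a sketch: the variable names are illustrative, and the keys are the interval bounds in minutes.

```python
# Frequencies read off the histogram in Example 4, keyed by (lower, upper).
frequencies = {
    (50, 60): 8,
    (60, 70): 15,
    (70, 80): 17,
    (80, 90): 25,
    (90, 100): 7,
}

# Part 1: the interval with the tallest bar.
modal_interval = max(frequencies, key=frequencies.get)

# Part 2: the height of the bar above 60 to 70 minutes.
between_60_and_70 = frequencies[(60, 70)]

# Part 3: add the bars whose interval starts at 80 minutes or later.
from_80_minutes = sum(f for (lower, _), f in frequencies.items() if lower >= 80)

# Part 4: compare the fastest (lowest times) and slowest (highest times) intervals.
fastest, slowest = min(frequencies), max(frequencies)
difference = abs(frequencies[fastest] - frequencies[slowest])

# Part 5: the total frequency is the sum of all the bar heights.
total_students = sum(frequencies.values())

print(modal_interval, between_60_and_70, from_80_minutes, difference, total_students)
```

The printed values match the answers found above: the interval 80 to 90, then 15, 32, 1, and 72.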
Note
In specifying our intervals, for example, 50 to 60 minutes, we mean the interval that includes 50 minutes and every time up to, but not including, 60 minutes. Our next interval is then 60 minutes to less than 70 minutes, and so on. In this way, we include every possible time from 50 minutes up to, but not including, 100 minutes, and there is no overlap between the intervals.
In our final example, we will use the information from a histogram to work out a percentage.
Example 5: Determining Percentages from Histograms
The histogram shows the time it took for several runners to finish a 400-meter dash. Determine the percentage of runners who finished in less than 70 seconds.
Answer
To find the percentage of runners who took less than 70 seconds, we need to know two things:
- The total number of runners,
- The number of runners who took less than 70 seconds to finish.
We can find both quantities by looking at our histogram. The height of each bar represents the number of runners with times in the interval on the horizontal axis below the bar.
As we can see, 3 runners took between 60 and 65 seconds, 7 runners took between 65 and 70 seconds, and so on. Adding up all the bar heights gives us the total number of runners, which was 40. For a runner to take less than 70 seconds, they must be in one of the first two intervals (60 to 65 seconds or 65 to 70 seconds). There are 3 + 7 = 10 of these. Since our total number of runners is 40, we can now work out what percentage of 40 our 10 (fastest) runners represent: 10/40 × 100% = 25%.
Hence, we conclude that 25% of the runners took less than 70 seconds to run 400 meters.
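The percentage calculation can be sketched in Python as below. The function name `percentage` is an illustrative choice, and the numbers are the ones read off in this example.

```python
def percentage(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


runners_under_70 = 3 + 7  # the first two bars of the histogram
total_runners = 40        # the sum of all the bar heights

print(percentage(runners_under_70, total_runners))  # 25.0
```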
Let us conclude with a summary of the main features of a histogram.
Key Points
- Histograms are graphs used to represent numerical data.
- The horizontal axis represents the possible values of the variable.
- The values on the horizontal axis are split into groups or ranges. The width of a bar is called the class interval.
- The vertical axis represents the frequency, and the area of each bar is proportional to the frequency for that range.
- When the bars are of equal width, as in the examples above, the total frequency (or number of values in the data set) is found by summing the heights of all the bars.
We can use histograms to gain information about a data set in a few different ways.
- As the height of a bar represents the frequency for the range of values associated with it, we can read off the frequencies for each range of values. We can then find, for example, the frequency for a group of ranges of values by summing the relevant frequencies for all the appropriate bars.
- To find percentages or proportions, we first work out the total frequency by summing the heights of all the bars. Then, we divide the height of a particular bar by the total to find the proportion of the total in that specific range. Multiplying by 100% gives us the percentage of data points in that range.
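The second method can be sketched in Python for every bar at once. The function name `relative_frequencies` is an assumption for illustration, and the frequencies used are the bar heights from Example 4.

```python
def relative_frequencies(frequencies):
    """Return each frequency as a percentage of the total frequency."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


# Bar heights from Example 4, from left to right.
example_4 = [8, 15, 17, 25, 7]
percentages = relative_frequencies(example_4)

for freq, pct in zip(example_4, percentages):
    print(f"{freq} students: {pct:.1f}% of the total")
```

Since every data point falls in exactly one bar, the percentages for all the bars add up to 100%.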
Note
There is no overlap between the intervals on the horizontal axis. This means that each data item falls within a unique interval. So, for example, if the bars show time in minutes, we might specify one interval as ending just below 5 minutes and the next interval as starting at exactly 5 minutes. A time of exactly 5 minutes then falls into the second interval, not the first.