In this explainer, we will learn how to construct and analyze data from box-and-whisker plots.
When we have a numerical data set, a good way of showing how the data is spread out from the center is with a box-and-whisker plot. Remember that a numerical data set is one in which the values are measurements, like height, weight, or age. We say that the variable (the thing that we are measuring, e.g., height) is numerical.
It will be helpful to remind ourselves of some useful terms before looking at examples of box-and-whisker plots and how to use them.
Some Useful Terms:
- The minimum of a set of data is the smallest value in the data set.
- The maximum of a set of data is the largest value in the data set.
- The range of a data set is the maximum value minus the minimum value.
- The first quartile, or Q1, is the value in a data set below which 25% of the data lie.
- The median, or second quartile (Q2), of a set of data is the middle value of the data set. So, 50% of the data lie below the median.
- The third quartile, or Q3, is the value in a data set below which 75% of the data lie.
- The interquartile range (IQR) of a data set is given by IQR = Q3 − Q1 and represents the spread of the middle 50% of the data. That is, 50% of the data lie between Q1 and Q3.
- An outlier is a value that is much smaller or much larger than most of the other values in a data set.
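To make these terms concrete, here is a minimal Python sketch that computes them for a small made-up data set. The function name five_number_summary and the sample values are illustrative assumptions, not part of this explainer. The sketch takes Q1 and Q3 to be the medians of the lower and upper halves of the sorted data (leaving out the middle value when the number of values is odd); other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum (needs at least two values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]            # values below the middle position
    upper_half = data[half + n % 2:]    # values above it (skips the middle value if n is odd)
    return {
        "min": data[0],
        "Q1": median(lower_half),
        "median": median(data),
        "Q3": median(upper_half),
        "max": data[-1],
    }

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 5, 7, 8, 10, 13]
summary = five_number_summary(sample)

data_range = summary["max"] - summary["min"]   # range = maximum - minimum
iqr = summary["Q3"] - summary["Q1"]            # IQR = Q3 - Q1

print(summary, data_range, iqr)
```

For this sample, the range is 13 − 2 = 11 and the interquartile range is 10 − 4 = 6.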
Now, let us look at the different components and features of a box-and-whisker plot.
A box-and-whisker plot (or boxplot) is a graph that illustrates the spread of a set of numerical data, using five numbers from the data set: the maximum, the minimum, the first quartile (Q1), the median, and the third quartile (Q3).
Let us list the features of the boxplot:
- The horizontal axis covers all possible data values.
- The box part of a box-and-whisker plot covers the middle 50% of the values in the data set.
- The whiskers each cover 25% of the data values.
- The lower whisker covers all the data values from the minimum value up to Q1, that is, the lowest 25% of data values.
- The upper whisker covers all the data values between Q3 and the maximum value, that is, the highest 25% of data values.
- The median sits within the box and represents the center of the data. 50% of the data values lie above the median and 50% lie below the median.
- Outliers, or extreme values, in a data set are usually indicated on a box-and-whisker plot by the “star” symbol. If there are one or more outliers in a data set, then, for the purpose of drawing a box-and-whisker plot, we take the minimum and maximum to be the smallest and largest values of the data set excluding the outliers.
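The sketch below illustrates how the whisker ends change once outliers are set aside. This explainer does not give a rule for deciding which values are outliers, so the code assumes one common convention: a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3. The function name whisker_ends, the multiplier k, and the data are all assumptions made for illustration, and Q1 and Q3 are taken as already calculated.

```python
def whisker_ends(values, q1, q3, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Assumed convention: values more than k * IQR beyond Q1 or Q3 are outliers.
    """
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    # The whiskers stop at the smallest and largest values that are not outliers.
    return min(inside), max(inside), outliers

# Hypothetical data with one unusually large value.
scores = [3, 4, 5, 7, 8, 10, 12, 30]
low, high, outliers = whisker_ends(scores, q1=4.5, q3=11)
print(low, high, outliers)
```

Here the value 30 would be drawn as a star, and the upper whisker would end at 12 rather than 30.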
In our first example, we will draw a box-and-whisker plot for a specific data set and interpret some of the features of this plot.
Example 1: The Components of a Box-and-Whisker Plot
Noah has calculated the following information from a data set about the ages of people present on a Saturday morning in a swimming pool:
- lowest value: 7
- lower quartile: 10
- median: 15
- upper quartile: 22
- highest value: 31
- Draw a box-and-whisker plot using the information Noah has calculated from the data set.
- What is the overall age range of the Saturday morning swimmers?
- What percentage of Saturday morning swimmers were between 7 and 22 years old?
- Calculate and interpret the percentage of Saturday morning swimmers covered by the box.
To draw the box-and-whisker plot, our first step is to draw and label an appropriate horizontal axis.
Since our least, or minimum, value is 7 and the highest, or maximum, value is 31, we can start our axis at 5 and finish at 35. These are round numbers that cover the whole range of our data. We can now mark the values Noah has calculated on our axis.
Now, we can begin to plot our box and whiskers. Let us start by drawing the box. For the left side of the box, we draw a vertical line above Q1. And for the right, we draw a vertical line above Q3. We also include a line above the median.
Using the lines above Q1 and Q3 as the short sides, we can form a rectangle, which is our box.
Note that we are not too concerned with how far above the axis we draw our box, although it should be close enough to the axis that we can read which values the features of the box sit above.
The final step is to draw the whiskers. We make a mark above the minimum value and another above the maximum value on the axis, level with the center of the short sides of the box, and then join each mark to the box with a horizontal line.
This completes our box-and-whisker plot for the ages of Saturday morning swimmers.
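The drawing steps above can be mimicked with a rough text-only sketch in Python. It assumes whole-number values and uses one character per unit on the axis; the function name ascii_boxplot and the choice of symbols are illustrative assumptions, not a standard plotting method.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, axis_start, axis_end):
    """Draw a one-line text boxplot with one character per axis unit."""
    width = axis_end - axis_start + 1
    row = [" "] * width
    # Whiskers: a horizontal line from the minimum to the maximum.
    for x in range(minimum, maximum + 1):
        row[x - axis_start] = "-"
    # Box: overwrite the stretch from Q1 to Q3.
    for x in range(q1, q3 + 1):
        row[x - axis_start] = "="
    # Vertical marks at the five summary values.
    for x in (minimum, q1, median, q3, maximum):
        row[x - axis_start] = "|"
    return "".join(row)

# Noah's values, on an axis running from 5 to 35.
plot = ascii_boxplot(7, 10, 15, 22, 31, axis_start=5, axis_end=35)
print(plot)
```

The five vertical marks in the printed line sit at 7, 10, 15, 22, and 31, in the same positions as the features of the plot drawn by hand.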
To find the range of the ages of the Saturday morning swimmers, we subtract the minimum value (the lowest age) from the maximum value (the highest age). The highest age was 31 years and the lowest was 7 years, so the range is 31 − 7 = 24.
That is, the range of the ages of Saturday morning swimmers was 24 years.
To find the percentage of Saturday morning swimmers between 7 and 22 years old, we can use the information provided by Noah and shown in the box-and-whisker plot. We know that the youngest swimmer was 7 and that Q3 = 22. We also know that, by definition, 75% of the data lie below Q3, that is, between the minimum data value and Q3.
So, we can say that 75% of the Saturday morning swimmers were between 7 and 22 years old.
To calculate the percentage of Saturday morning swimmers covered by the box, we can again use the information we have from Noah and displayed in the box-and-whisker plot.
We know that Q1, which corresponds to the left-hand side of the box, is 10 and that Q3, corresponding to the right-hand side of the box, is 22. We also know that, by definition, 50% of the data lie between Q1 and Q3. So, since our box stretches from Q1 to Q3, the box must cover 50% of the data.
We can interpret this as follows: 50% of the Saturday morning swimmers were between 10 and 22 years old.
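The reasoning used in the last two parts can be summarized in a short Python sketch. It relies only on the definitions of the quartiles: each of the five summary values marks a fixed percentage of the data lying below it. The names CUMULATIVE_PERCENT and percent_between are hypothetical, chosen for this illustration.

```python
# Percentage of the data lying below each of the five summary values.
CUMULATIVE_PERCENT = {"min": 0, "Q1": 25, "median": 50, "Q3": 75, "max": 100}

def percent_between(lower_marker, upper_marker):
    """Percentage of the data between two of the five summary values."""
    return CUMULATIVE_PERCENT[upper_marker] - CUMULATIVE_PERCENT[lower_marker]

# Ages 7 to 22 run from the minimum to Q3.
ages_7_to_22 = percent_between("min", "Q3")
# The box runs from Q1 (10) to Q3 (22).
covered_by_box = percent_between("Q1", "Q3")

print(ages_7_to_22, covered_by_box)
```

This only works when both endpoints coincide with one of the five summary values; a box-and-whisker plot alone does not tell us the percentage of data between two arbitrary values.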
In our next example, we use a box-and-whisker plot to determine percentages of a data set.
Example 2: Percentages from Box-and-Whisker Plots
The test scores for a physics test are displayed in the following box-and-whisker plot. Determine the percentage of students who had scores between 85 and 120.
We will use the knowledge we have about box-and-whisker plots to determine the percentage of students who had scores between 85 and 120 in their physics test.
We know that the left-hand edge of the box sits above the first quartile, Q1, of a data set and that the maximum data value sits below the point at the right end of the right-hand whisker.
From our box-and-whisker plot, we can see that Q1 corresponds to a score of 85 and that the maximum score was 120. We also know that 25% of the values in a data set lie below Q1.
If 25% of the data in a data set lie below Q1, then the remaining 75% must lie above Q1, that is, between Q1 and the maximum data value.