In this explainer, we will learn how to construct box-and-whisker plots and analyze the data they display.

When we have a numerical data set, a good way of showing how the data is spread out from the center is with a box-and-whisker plot. Remember that a numerical data set is one in which the values are measurements, like height, weight, or age. We say that the variable (the thing that we are measuring, e.g., height) is numerical.

It will be helpful to remind ourselves of some useful terms before looking at examples of box-and-whisker plots and how to use them.

### Some Useful Terms:

- The **minimum** of a set of data is the smallest value in the data set.
- The **maximum** of a set of data is the largest value in the data set.
- The **range** of a data set is the maximum value minus the minimum value.
- The **first quartile**, or **Q1**, is the value in a data set below which 25% of the data lie.
- The **median**, or second quartile (Q2), of a set of data is the middle value of the data set. So, 50% of the data lie below the median.
- The **third quartile**, or **Q3**, is the value in a data set below which 75% of the data lie.
- The **interquartile range (IQR)** of a data set is given by $\text{IQR} = \text{Q3} - \text{Q1}$ and represents 50% of the data. That is, 50% of the data lie between Q1 and Q3.
- An **outlier** is a value that is much smaller or much larger than most of the other values in a data set.
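As a rough illustration of these terms, the sketch below computes each quantity for a small data set. The data values are hypothetical and are not taken from this explainer. It uses Python's standard `statistics.quantiles` function with the `"inclusive"` method; quartile conventions differ between textbooks and software, so other methods may give slightly different values for Q1 and Q3.

```python
import statistics

# Hypothetical data set (an assumption for illustration), already sorted.
ages = [7, 9, 10, 12, 15, 18, 22, 25, 31]

minimum = min(ages)
maximum = max(ages)
data_range = maximum - minimum  # range = maximum - minimum

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
# The "inclusive" method is one convention among several.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1  # IQR = Q3 - Q1

print(minimum, q1, median, q3, maximum)  # 7 10.0 15.0 22.0 31
print(data_range, iqr)                   # 24 12.0
```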

Now, let us look at the different components and features of a box-and-whisker plot.

### Definition

A box-and-whisker plot (or boxplot) is a graph that illustrates the spread of a set of numerical data, using five numbers from the data set: the maximum, the minimum, the first quartile (Q1), the median, and the third quartile (Q3).

Let us list the features of the boxplot:

- The horizontal axis covers all possible data values.
- The box part of a box-and-whisker plot covers the middle 50% of the values in the data set.
- The whiskers each cover 25% of the data values.
- The lower whisker covers all the data values from the minimum value up to Q1, that is, the lowest 25% of data values.
- The upper whisker covers all the data values between Q3 and the maximum value, that is, the highest 25% of data values.

- The median sits within the box and represents the center of the data. 50% of the data values lie above the median and 50% lie below the median.
- Outliers, or extreme values, in a data set are usually indicated on a box-and-whisker plot by the “star” symbol. If there are one or more outliers in a data set, then for the purpose of drawing a box-and-whisker plot, we take the minimum and maximum to be the smallest and largest values of the data set excluding the outliers.
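This explainer does not give a numerical rule for deciding which values count as outliers. One widely used convention, assumed here purely for illustration, flags any value lying more than 1.5 times the IQR beyond the box. The sketch below applies that assumed rule to a hypothetical data set and then finds where the whiskers would end once the outliers are excluded.

```python
import statistics

# Hypothetical data set with one unusually large value.
data = [10, 12, 13, 14, 15, 16, 17, 18, 40]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention (not stated in the explainer): values more than
# 1.5 * IQR below Q1 or above Q3 are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
kept = [x for x in data if lower_fence <= x <= upper_fence]

# The whiskers end at the smallest and largest values that are not outliers.
whisker_min, whisker_max = min(kept), max(kept)

print(outliers)                   # [40]
print(whisker_min, whisker_max)   # 10 18
```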

In our first example, we will draw a box-and-whisker plot for a specific data set and interpret some of the features of this plot.

### Example 1: The Components of a Box-and-Whisker Plot

Adam has calculated the following information from a data set about the ages of people present on a Saturday morning in a swimming pool:

- lowest value: 7
- lower quartile: 10
- median: 15
- upper quartile: 22
- highest value: 31

- Draw a box-and-whisker plot using the information Adam has calculated from the data set.
- What is the overall age range of the Saturday morning swimmers?
- What percentage of Saturday morning swimmers were between 7 and 22 years old?
- Calculate and interpret the percentage of Saturday morning swimmers covered by the box.

### Answer

**Part 1**

To draw the box-and-whisker plot, our first step is to draw and label an appropriate horizontal axis.

Since our least, or minimum, value is 7 and the highest, or maximum, value is 31, we can start our axis at 5 and finish at 35. These are round numbers that cover the whole range of our data. We can now mark the values Adam has calculated on our axis.

Now, we can begin to plot our box and whiskers. Let us start by drawing the box. For the left side of the box, we draw a vertical line above Q1. And for the right, we draw a vertical line above Q3. We also include a line above the median.

Using the lines above Q1 and Q3 as the short sides, we can form a rectangle, which is our box.

Note that we are not too concerned with how far above the axis we draw our box, although it should be close enough to the axis that we can read which values the features of the box sit above.

The final step is to draw the whiskers. Marking above where each of the minimum and the maximum values sits on the axis, in line with the center of the short sides of the box, we then join these marks to the box with horizontal lines.

This completes our box-and-whisker plot for the ages of Saturday morning swimmers.
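The drawing steps above can be mimicked with a small text sketch. The function below is a hypothetical helper (its name and the characters it uses are assumptions), and it only handles whole-number values, using one character per unit on the axis. It draws the whiskers first, then the box, then the vertical marks, on an axis running from 5 to 35.

```python
def sketch_boxplot(minimum, q1, median, q3, maximum, axis_start, axis_end):
    """Return a one-line text sketch with one character per whole axis unit.

    Assumes all values are integers within [axis_start, axis_end].
    """
    row = [" "] * (axis_end - axis_start + 1)

    def column(value):
        return value - axis_start

    # Whiskers: a horizontal line from the minimum to the maximum.
    for value in range(minimum, maximum + 1):
        row[column(value)] = "-"
    # Box: overwrite the stretch from Q1 to Q3.
    for value in range(q1, q3 + 1):
        row[column(value)] = "="
    # Vertical marks: whisker ends, box edges, and the median line.
    for value, mark in [(minimum, "|"), (maximum, "|"),
                        (q1, "["), (q3, "]"), (median, "|")]:
        row[column(value)] = mark
    return "".join(row)


# Adam's five values, on an axis from 5 to 35.
plot = sketch_boxplot(7, 10, 15, 22, 31, axis_start=5, axis_end=35)
print(plot)
```

The right-hand part of the box and the right-hand whisker come out longer than the left-hand ones, which matches the larger gaps between the median, Q3, and the maximum in Adam's values.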

**Part 2**

To find the range of the ages of the Saturday morning swimmers, we subtract the minimum value (the lowest age) from the maximum value (the highest age). The highest age was 31 years and the lowest was 7 years, so the range is

$$31 - 7 = 24.$$

That is, the range of the ages of Saturday morning swimmers was 24 years.

**Part 3**

To find the percentage of Saturday morning swimmers between 7 and 22 years old, we can use the information provided by Adam and shown in the box-and-whisker plot. We know that the youngest swimmer was 7 and that Q3 is 22. We know also that, by definition, 75% of the data lies below Q3, that is, between the minimum data value and Q3.

So, we can say that 75% of the Saturday morning swimmers were between 7 and 22 years old.

**Part 4**

To calculate the percentage of Saturday morning swimmers covered by the box, we can again use the information we have from Adam and displayed in the box-and-whisker plot.

We know that Q1, which corresponds to the left-hand side of the box, is 10 and that Q3, corresponding to the right-hand side of the box, is 22. We also know that, by definition, 50% of the data lie between Q1 and Q3. So, since our box stretches from Q1 to Q3, the box must cover 50% of the data.

We can interpret this as follows: 50% of the Saturday morning swimmers were between 10 and 22 years old.
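The reasoning in Parts 2 to 4 can be sketched in code. The five values from Adam's summary split the data into four stretches of 25% each, so the percentage between two of those values is 25% times the number of stretches between them. The names `landmarks` and `percent_between` are hypothetical, and the helper only works when both endpoints are among the five summary values, since a boxplot says nothing about how the data is spread inside each stretch.

```python
# Five-number summary from Example 1, in increasing order.
landmarks = {"minimum": 7, "Q1": 10, "median": 15, "Q3": 22, "maximum": 31}


def percent_between(low, high, summary):
    """Percentage of data between two landmark values of the summary.

    Both low and high must be landmark values; each stretch between
    neighbouring landmarks holds 25% of the data.
    """
    values = list(summary.values())
    return 25 * (values.index(high) - values.index(low))


age_range = landmarks["maximum"] - landmarks["minimum"]

print(age_range)                           # 24
print(percent_between(7, 22, landmarks))   # 75
print(percent_between(10, 22, landmarks))  # 50
```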

In our next example, we use a box-and-whisker plot to determine percentages of a data set.

### Example 2: Percentages from Box-and-Whisker Plots

The test scores for a physics test are displayed in the following box-and-whisker plot. Determine the percent of students who had scores between 85 and 120.

### Answer

We will use the knowledge we have about box-and-whisker plots to determine the percentage of students who had scores between 85 and 120 in their physics test.

We know that the left-hand edge of the box sits above the first quartile, Q1, of a data set and that the maximum data value sits below the point at the right end of the right-hand whisker.

From our box-and-whisker plot, we can see that Q1 corresponds to a score of 85 and that the maximum score was 120. We also know that 25% of the values in a data set lie below Q1.

If 25% of the data in a data set lie below Q1, then the remaining 75% must lie above Q1, that is, between Q1 and the maximum data value.

In our case, since Q1 corresponds to a score of 85, and the maximum score was 120, we can say that 75 percent of the students had scores between 85 and 120 in their physics test.
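Written out as a single line of arithmetic, the subtraction behind this answer is:

$$
\underbrace{100\%}_{\text{all scores}} - \underbrace{25\%}_{\text{below Q1}} = \underbrace{75\%}_{\text{between Q1 and the maximum}}
$$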

### Example 3: Percentages from a Box-and-Whisker Plot

Is half of the data in the interval 36 to 56?

### Answer

Since the data value 36 sits below the end of the lower whisker in the box-and-whisker plot, we can say that the minimum data value is 36. Similarly, since the data value 56 sits below the vertical bar within the box, we can say that the median of the data is 56. We know that the median is the middle value of the data set, splitting the data set in two. This means that 50% of the data lie below the median.

We can therefore answer as follows: yes, half of the data is in the interval 36 to 56.

It is worth noting that in this example we have an outlier in the data set (a value of 80), which is much higher than the majority of the data values.

Technically, this is the maximum value in the data set, but if we had been asked if half of the data is in the interval 56 to 64, we would answer as follows: yes, excluding the outlier at 80, half of the data is in the interval 56 to 64.

### Example 4: Interpreting a Box-and-Whisker Plot

Look at the box-and-whisker plot. Give a reason why the line inside the box is further to the right.

- The median is closer to Q3 than Q1.
- The mean is about 49.
- The person who made the graph made a mistake.
- The mode is 49.

### Answer

To determine which of the reasons given is correct, for why the line inside the box is further to the right, let us mark the quantities that we know on our box-and-whisker plot.

We can now address each possibility in turn.

- We can see from our plot that the line above the median within the box is closer to the right-hand edge than the left-hand edge of the box. And we know that the right-hand edge of the box sits above Q3, whereas the left-hand edge sits above Q1. So “**A**” must be correct: the median is closer to Q3 than to Q1.
- It is not possible to tell what the mean of a data set is from a box-and-whisker plot. We can only tell what the median is, and in this case it looks as though the median is approximately 49.
- We cannot tell whether the person who made the graph made a mistake or not, but it is perfectly possible for the line above the median in a box-and-whisker plot to be off-center within the box.
- A box-and-whisker plot can only tell us the value of the median, not the mode of a data set. So we cannot say that the mode is 49.

We can conclude that option “**A**” is correct.
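To see that an off-center median line is perfectly normal, consider the sketch below. The data set is hypothetical; it is not read from the plot and was chosen only so that its median is 49. The median sits much closer to Q3 than to Q1, and the mean turns out to be different from the median, which is why the mean cannot be read off the plot.

```python
import statistics

# Hypothetical data (an assumption for illustration), chosen so the median is 49.
data = [30, 35, 40, 44, 49, 50, 51, 53, 55]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

gap_to_q1 = median - q1   # width of the left part of the box
gap_to_q3 = q3 - median   # width of the right part of the box
mean = statistics.mean(data)

print(q1, median, q3)         # 40.0 49.0 51.0
print(gap_to_q1, gap_to_q3)   # 9.0 2.0
print(round(mean, 1))         # 45.2
```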

In our final example, we will see how to gain and interpret information from a box-and-whisker plot.

### Example 5: Gaining and Interpreting Information from a Box-and-Whisker Plot

The boxplot shows the daily temperatures at a seaside resort during the month of August.

- What was the median temperature?
- What was the maximum temperature recorded?
- What was the minimum temperature recorded?
- What was the lower quartile of the temperatures?
- What was the upper quartile of the temperatures?
- What was the interquartile range of the temperatures?
- What was the range of the temperatures?
- On roughly what percentage of days was the temperature between $22^\circ$ and $24^\circ$?
- On roughly what percentage of days was the temperature greater than $24^\circ$?

### Answer

In order to answer the questions **(A)–(I)**, let us mark the quantities we know on our box-and-whisker plot.

- In a box-and-whisker plot, the median of the data set is the value that sits below the vertical line inside the box. In our case, this value is 24. So, the median temperature was $24^\circ$.
- In a box-and-whisker plot, the maximum value in the data set sits below the right-hand end of the right-hand (or upper) whisker. As we can see from our plot, the maximum value here is 31. So the maximum temperature recorded was $31^\circ$.
- In a box-and-whisker plot, the minimum value in the data set sits below the left-hand end of the left-hand (or lower) whisker. As we can see from our plot, the minimum value here is 18. So the minimum temperature recorded was $18^\circ$.
- In a box-and-whisker plot, the lower quartile (Q1) of a data set is the value that sits below the left-hand edge of the box. In our case, this value is 22. So, the lower quartile of the temperatures was $22^\circ$.
- In a box-and-whisker plot, the upper quartile (Q3) of a data set is the value that sits below the right-hand edge of the box. In our case, this value is 28. So the upper quartile of the temperatures was $28^\circ$.
- The interquartile range (IQR) of a data set is the distance between the lower and upper quartiles, which is given by $\text{Q3} - \text{Q1}$. In this case, $\text{IQR} = 28 - 22 = 6$. So, the interquartile range of the temperatures was $6^\circ$.
- The range of a data set is given by the maximum value minus the minimum value. In our case, we can see from our box-and-whisker plot that the maximum value in the data set is 31 and the minimum value is 18. So the range is $31 - 18 = 13$. The range of temperatures was therefore $13^\circ$.
- To find on what percentage of days the temperature was between $22^\circ$ and $24^\circ$, we note again that the lower quartile, Q1, is $22^\circ$ and that the median is $24^\circ$. We know also that 50% of the data lie below the median and that 25% of the data lie below Q1. So, between Q1 and the median, there must be $50\% - 25\% = 25\%$ of the data. Therefore, on 25% of the days the temperature was between $22^\circ$ (which is Q1) and $24^\circ$ (which is the median).
- To find on what percentage of days the temperature was greater than $24^\circ$, we note again that the median was $24^\circ$ and that 50% of the data lie above the median. So, the temperature was greater than $24^\circ$ on 50% of the days.
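The readings from this plot can be collected in one place and the derived quantities computed from them. The sketch below stores the five values quoted in the answer in a small record type; the class name `FiveNumberSummary` and its property names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    """The five values that can be read off a box-and-whisker plot."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        # Interquartile range: Q3 minus Q1.
        return self.q3 - self.q1

    @property
    def data_range(self):
        # Range: maximum minus minimum.
        return self.maximum - self.minimum


# Values read from the August temperature boxplot in Example 5.
august = FiveNumberSummary(minimum=18, q1=22, median=24, q3=28, maximum=31)

# 50% lie below the median and 25% below Q1, so 25% lie in between.
percent_q1_to_median = 50 - 25
# 50% of the data lie above the median.
percent_above_median = 100 - 50

print(august.iqr)         # 6
print(august.data_range)  # 13
```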

Let us conclude by reminding ourselves of the main features of a box-and-whisker plot.

### Key Points

- The box part of a box-and-whisker plot covers the middle 50% of the values in the data set.
- The whiskers each cover 25% of the data values.
- The lower whisker covers all the data values from the minimum value up to Q1, that is, the lowest 25% of data values.
- The upper whisker covers all the data values between Q3 and the maximum value, that is, the highest 25% of data values.

- The median sits within the box and represents the center of the data. 50% of the data values lie above the median and 50% lie below the median.
- Outliers, or extreme values, in a data set are usually indicated on a box-and-whisker plot by the “star” symbol.