In this explainer, we will learn how to construct and analyze data from box-and-whisker plots.
When we have a numerical data set, a good way of showing how the data is spread out from the
center is with a box-and-whisker plot. Remember that a numerical data set is one in which the
values are measurements, like height, weight, or age. We say that the variable (the thing that
we are measuring, e.g., height) is numerical.
It will be helpful to remind ourselves of some useful terms before looking at examples of
box-and-whisker plots and how to use them.
Some Useful Terms:
The minimum of a set of data is the smallest value in the data set.
The maximum of a set of data is the largest value in the data set.
The range of a data set is the maximum value minus the minimum value.
The first quartile, or Q1, is the value in a data set below which 25% of
the data lie.
The median, or second quartile (Q2), of a set of data is the middle value of the
data set. So, 50% of the data lie below the median.
The third quartile, or Q3, is the value in a data set below which 75% of
the data lie.
The interquartile range (IQR) of a data set is given by IQR = Q3 − Q1 and represents the middle 50% of the data. That is, 50% of the data lie
between Q1 and Q3.
An outlier is a value that is much smaller or much larger than most of the other
values in a data set.
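As a rough illustration of these terms, the following Python sketch computes the five summary values, the range, and the IQR for a small data set. The data and the helper name `five_number_summary` are hypothetical, and the choice of the "inclusive" quartile method is an assumption: different textbooks and software packages use slightly different quartile conventions, which can give slightly different values for small data sets.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3.
    # method="inclusive" is an assumed convention; other methods may differ.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

ages = [7, 9, 10, 12, 15, 18, 22, 25, 31]  # hypothetical data
low, q1, med, q3, high = five_number_summary(ages)

data_range = high - low   # maximum minus minimum
iqr = q3 - q1             # IQR = Q3 - Q1
print(low, q1, med, q3, high, data_range, iqr)
```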
Now, let us look at the different components and features of a box-and-whisker plot.
Definition
A box-and-whisker plot (or boxplot) is a graph that illustrates the spread of a set of
numerical data, using five numbers from the data set: the maximum, the minimum, the first
quartile (Q1), the median, and the third quartile (Q3).
Let us list the features of the boxplot:
The horizontal axis covers all possible data values.
The box part of a box-and-whisker plot covers the middle 50% of the values in the data
set.
The whiskers each cover 25% of the data values.
The lower whisker covers all the data values from the minimum value up to Q1, that
is, the lowest 25% of data values.
The upper whisker covers all the data values between Q3 and the maximum value, that
is, the highest 25% of data values.
The median sits within the box and represents the center of the data. 50% of the data
values lie above the median and 50% lie below the median.
Outliers, or extreme values, in a data set are usually indicated on a box-and-whisker
plot by the “star” symbol. If a data set has one or more outliers, then,
for the purpose of drawing a box-and-whisker plot, we take the minimum and maximum to be
the minimum and maximum values of the data set excluding the outliers.
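The list above does not say how we decide that a value is an outlier. One widely used convention, which is an assumption here and not something this explainer states, is to flag any value that lies more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that convention to hypothetical data and then finds the whisker ends from the remaining values; the function name `split_outliers` and the quartile values passed in are illustrative only.

```python
def split_outliers(values, q1, q3, k=1.5):
    """Separate values into (inliers, outliers) using fences at k * IQR.

    The factor k = 1.5 is a common convention, assumed for this sketch.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return inliers, outliers

scores = [40, 48, 52, 56, 58, 60, 64, 80]  # hypothetical data
# Quartiles are assumed to have been calculated already.
inliers, outliers = split_outliers(scores, q1=51, q3=61)

# The whiskers end at the smallest and largest values that are not outliers.
whisker_low, whisker_high = min(inliers), max(inliers)
print(outliers, whisker_low, whisker_high)
```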
In our first example, we will draw a box-and-whisker plot for a specific data set and
interpret some of the features of this plot.
Example 1: The Components of a Box-and-Whisker Plot
Adam has calculated the following information from a data set about the ages
of people present on a Saturday morning in a
swimming pool:
lowest value: 7
lower quartile: 10
median: 15
upper quartile: 22
highest value: 31
Draw a box-and-whisker plot using the information Adam has calculated
from the data set.
What is the overall age range of the Saturday morning swimmers?
What percentage of Saturday morning swimmers
were between 7 and 22 years old?
Calculate and interpret the percentage of Saturday morning swimmers covered by the box.
Answer
Part 1
To draw the box-and-whisker plot, our first step is to draw and label an appropriate
horizontal axis.
Since our least, or minimum, value is 7 and the highest, or maximum, value is 31, we can
start our axis at 5 and finish at 35. These are round numbers that cover the whole range
of our data. We can now mark the values Adam has calculated on our axis.
Now, we can begin to plot our box and whiskers. Let us start by drawing the box. For the
left side of the box, we draw a vertical line above Q1. And for the right, we draw a
vertical line above Q3. We also include a line above the median.
Using the lines above Q1 and Q3 as the short sides, we can form a rectangle, which is our
box.
Note that we are not too concerned with how far above the axis we draw our box, although
it should be close enough to the axis that we can read which values the features of the
box sit above.
The final step is to draw the whiskers. We make a mark above each of the minimum and
maximum values on the axis, level with the center of the short sides of the box, and
then join these marks to the box with horizontal lines.
This completes our box-and-whisker plot for the ages of Saturday morning swimmers.
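The drawing steps above can be imitated with a rough text sketch. The function below is a hypothetical helper, not a graphing tool: it uses one character per year on an axis from 5 to 35, drawing the whiskers with `-`, the box with `=`, the box edges with `[` and `]`, and the whisker ends and the median with `|`. It assumes all values are whole numbers.

```python
def text_boxplot(minimum, q1, median, q3, maximum, axis_start, axis_end):
    """Return a one-line text sketch of a box-and-whisker plot.

    One character per unit on the axis; all arguments must be integers.
    """
    row = [" "] * (axis_end - axis_start + 1)
    for x in range(minimum, maximum + 1):
        row[x - axis_start] = "-"        # whiskers (box is drawn over them)
    for x in range(q1, q3 + 1):
        row[x - axis_start] = "="        # interior of the box
    row[minimum - axis_start] = "|"      # end of the lower whisker
    row[maximum - axis_start] = "|"      # end of the upper whisker
    row[q1 - axis_start] = "["           # left edge of the box (Q1)
    row[q3 - axis_start] = "]"           # right edge of the box (Q3)
    row[median - axis_start] = "|"       # median line inside the box
    return "".join(row)

plot = text_boxplot(7, 10, 15, 22, 31, axis_start=5, axis_end=35)
print(plot)
```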
Part 2
To find the range of the ages of the Saturday
morning swimmers, we subtract the minimum value (the lowest age) from the maximum value
(the highest age). The highest age was 31 years and the lowest was 7 years, so the range is
31 − 7 = 24. That is, the range of the ages of Saturday morning
swimmers was 24 years.
Part 3
To find the percentage of Saturday morning swimmers
between 7 and 22 years old, we
can use the information provided by Adam and shown in the box-and-whisker plot. We know that the youngest swimmer was 7 and that Q3 is 22. We
know also that, by definition, 75% of the data lie below Q3, that is, between the minimum
data value and Q3.
So, we can say that 75% of the Saturday morning
swimmers were between 7 and 22 years old.
Part 4
To calculate the percentage of Saturday morning
swimmers covered by the box, we can again use the information we have from Adam
and displayed in the box-and-whisker plot.
We know that Q1, which corresponds to the left-hand side of the box, is 10 and that Q3,
corresponding to the right-hand side of the box, is 22. We also know that, by definition,
50% of the data lie between Q1 and Q3. So, since our box stretches from Q1 to Q3, the box
must cover 50% of the data.
We can interpret this as follows: 50% of the Saturday morning swimmers were between 10 and 22 years old.
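Parts 2 to 4 can be checked with a few lines of Python. This is only a sketch: the dictionary `CUMULATIVE_PERCENT` and the helper `percent_between` are hypothetical names, and the percentages are the approximate ones that follow from the definitions of the quartiles.

```python
# Approximate percentage of the data lying at or below each summary point,
# taken from the definitions of the quartiles.
CUMULATIVE_PERCENT = {"min": 0, "Q1": 25, "median": 50, "Q3": 75, "max": 100}

def percent_between(lower, upper):
    """Approximate percentage of data between two summary points."""
    return CUMULATIVE_PERCENT[upper] - CUMULATIVE_PERCENT[lower]

# Adam's values for the Saturday morning swimmers.
summary = {"min": 7, "Q1": 10, "median": 15, "Q3": 22, "max": 31}

age_range = summary["max"] - summary["min"]    # Part 2: 31 - 7
share_7_to_22 = percent_between("min", "Q3")   # Part 3: minimum up to Q3
share_in_box = percent_between("Q1", "Q3")     # Part 4: the box, Q1 to Q3
print(age_range, share_7_to_22, share_in_box)
```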
In our next example, we use a box-and-whisker plot to determine percentages of a data
set.
Example 2: Percentages from Box-and-Whisker Plots
The test scores for a physics test are displayed in the following box-and-whisker
plot. Determine the percentage of students who had scores between 85 and 120.
Answer
We will use the knowledge we have about box-and-whisker plots to determine the percentage
of students who had scores between 85 and 120 in their physics test.
We know that the left-hand edge of the box sits above the first quartile, Q1, of a data
set and that the maximum data value sits below the point at the right end of the
right-hand whisker.
From our box-and-whisker plot, we can see that Q1 corresponds to a score of 85 and that
the maximum score was 120. We also know that 25% of the values in a data set lie below
Q1.
If 25% of the data in a data set lie below Q1, then the remaining 75% of the data set
must lie above Q1, that is, between Q1
and the maximum data value.
In our case, since Q1 corresponds to a score of 85, and the maximum score was 120, we can
say that 75 percent of the students had scores between 85 and 120 in their physics
test.
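The subtraction behind this answer can be written as a short worked equation. It simply restates the reasoning above, using the score values read from the plot:

```latex
\[
\underbrace{100\%}_{\text{all scores}} \;-\; \underbrace{25\%}_{\text{scores below } Q_1 = 85} \;=\; 75\%
\quad \text{(scores between } 85 \text{ and } 120\text{)}
\]
```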
Example 3: Percentages from a Box-and-Whisker Plot
Is half of the data in the interval 36 to 56?
Answer
Since the data value 36 sits below the end of the lower whisker in the box-and-whisker
plot, we can say that the minimum data value is 36. Similarly, since the data value 56
sits below the vertical bar within the box, we can say that the median of the data is 56.
We know that the median is the middle value of the data set, splitting the data set in
two. This means that 50% of the data lie below the median, that is, between the minimum value of 36 and the median of 56.
We can therefore answer as follows: yes, half of the data is in the interval 36 to
56.
It is worth noting that in this example we have an outlier in the data set (a value of
80), which is much higher than the majority of the data values.
Technically, this is the maximum value in the data set. So, if we had been asked whether half
of the data is in the interval 56 to 64, we would answer as follows: yes, excluding the
outlier at 80, half of the data is in the interval 56 to 64.
Example 4: Interpreting a Box-and-Whisker Plot
Look at the box-and-whisker plot. Give a reason why the line inside the box is
further to the right.
A. The median is closer to Q3 than Q1.
B. The mean is about 49.
C. The person who made the graph made a mistake.
D. The mode is 49.
Answer
To determine which of the given reasons explains why the line inside the box is
further to the right, let us mark the quantities that we know on our box-and-whisker
plot.
We can now address each possibility in turn.
We can see from our plot that the line above the median within the box is closer to
the right-hand edge than the left-hand edge of the box. And we know that the right-hand
edge of the box sits above Q3, whereas the left-hand edge sits above Q1. So
“A” must be correct: the median is closer to Q3 than to Q1.
It is not possible to tell what the mean of a data set is from a box-and-whisker plot. We can only tell what the median is, and in this case it looks as though the median is
approximately 49.
We cannot tell whether the person who made the graph made a mistake or not, but it is
perfectly possible for the line above the median in a box-and-whisker plot to be
off-center within the box.
A box-and-whisker plot can only tell us the value of the median, not the mode of a
data set. So we cannot say that the mode is 49.
We can conclude that option “A” is correct.
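An off-center median line simply means that the two quarters of the data inside the box are spread over intervals of different widths. The sketch below measures where the median sits within the box as a fraction. The quartile values are hypothetical, chosen only so that the median is about 49 as in the plot; they are not read from the plot itself.

```python
def median_position_in_box(q1, median, q3):
    """Return the median's position within the box as a fraction:
    0.0 at the Q1 edge, 0.5 at the center, 1.0 at the Q3 edge."""
    return (median - q1) / (q3 - q1)

# Hypothetical quartiles for illustration only.
position = median_position_in_box(q1=40, median=49, q3=52)

# A value above 0.5 means the median is closer to Q3 than to Q1.
closer_to_q3 = position > 0.5
print(position, closer_to_q3)
```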
In our final example, we will see how to gain and interpret information from a
box-and-whisker plot.
Example 5: Gaining and Interpreting Information from a Box-and-Whisker Plot
The boxplot shows the daily temperatures at a seaside resort during the month of
August.
A. What was the median temperature?
B. What was the maximum temperature recorded?
C. What was the minimum temperature recorded?
D. What was the lower quartile of the temperatures?
E. What was the upper quartile of the temperatures?
F. What was the interquartile range of the temperatures?
G. What was the range of the temperatures?
H. On roughly what percentage of days was the temperature between 22 degrees and 24 degrees?
I. On roughly what percentage of days was the temperature greater than 24 degrees?
Answer
In order to answer the questions (A)–(I), let us mark the quantities we
know on our box-and-whisker plot.
In a box-and-whisker plot, the median of the data set is the value that sits below the
vertical line inside the box. In our case, this value is 24. So, the median temperature
was 24 degrees.
In a box-and-whisker plot, the maximum value in the data set sits below the right-hand
end of the right-hand (or upper) whisker. As we can see from our plot, the maximum value
here is 31. So the maximum temperature recorded was 31 degrees.
In a box-and-whisker plot, the minimum value in the data set sits below the left-hand
end of the left-hand (or lower) whisker. As we can see from our plot, the minimum value
here is 18. So the minimum temperature recorded was 18 degrees.
In a box-and-whisker plot, the lower quartile (Q1) of a data set is the value that
sits below the left-hand edge of the box. In our case, this value is 22. So, the lower
quartile of the temperatures was 22 degrees.
In a box-and-whisker plot, the upper quartile (Q3) of a data set is the value that
sits below the right-hand edge of the box. In our case, this value is 28. So the upper
quartile of the temperatures was 28 degrees.
The interquartile range (IQR) of a data set is the distance between the lower and
upper quartiles, which is given by IQR = Q3 − Q1. In this
case, IQR = 28 − 22 = 6. So, the interquartile range of the temperatures
was 6 degrees.
The range of a data set is given by the maximum value minus the minimum value. In our
case, we can see from our box-and-whisker plot that the maximum value in the data set is
31 and the minimum value is 18. So the range is 31 − 18 = 13. The range of temperatures was therefore
13 degrees.
To find on what percentage of days the temperature was between 22 degrees and
24 degrees, we note again that the lower quartile, Q1, is 22 degrees and that the median
is 24 degrees. We know also that
50% of the data lie below the median and that 25% of the data lie below Q1.
So, between Q1 and the median, there must be 50% − 25% = 25%
of the data. Therefore, on 25% of the days the temperature was between
22 degrees (which is Q1) and 24 degrees (which is the median).
To find on what percentage of days the temperature was greater than 24 degrees,
we note again that the median was 24 degrees
and that 50% of the data lie above the median.
So, the temperature was greater than 24 degrees on 50% of the
days.
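The calculations in this example can be gathered into one small sketch. The class `FiveNumberSummary` and its property names are hypothetical, introduced only for illustration; the five values are those read from the boxplot above, and the percentages follow from the quartile definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveNumberSummary:
    """The five values that a box-and-whisker plot displays."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def data_range(self):
        # Range = maximum - minimum
        return self.maximum - self.minimum

    @property
    def iqr(self):
        # IQR = Q3 - Q1
        return self.q3 - self.q1

# Values read from the August temperature boxplot.
august = FiveNumberSummary(minimum=18, q1=22, median=24, q3=28, maximum=31)

# 50% of the data lie below the median and 25% lie below Q1.
percent_q1_to_median = 50 - 25
# The remaining half of the data lies above the median.
percent_above_median = 100 - 50

print(august.iqr, august.data_range, percent_q1_to_median, percent_above_median)
```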
Let us conclude by reminding ourselves of the main features of a box-and-whisker plot.
Key Points
The box part of a box-and-whisker plot covers the middle 50% of the values in the data
set.
The whiskers each cover 25% of the data values.
The lower whisker covers all the data values from the minimum value up to Q1, that
is, the lowest 25% of data values.
The upper whisker covers all the data values between Q3 and the maximum value, that
is, the highest 25% of data values.
The median sits within the box and represents the center of the data. 50% of the data
values lie above the median and 50% lie below the median.
Outliers, or extreme values, in a data set are usually indicated on a box-and-whisker
plot by the “star” symbol.