Video: Box and Whisker Plots

In this lesson, we will learn how to construct and analyze data from box-and-whisker plots.

17:59

Video Transcript

In this lesson, we will learn how to construct box and whisker plots for any data set. When we have a numerical data set, a good way of showing how the data is spread out from the centre is with a box and whisker plot. Remember that a numerical data set is one where the values are measurements like height, weight, or age. We say that the variable, the thing that we are measuring, for example, height, is numerical. It will be helpful to remind ourselves of some useful terms before looking at examples of box and whisker plots and how to use them.

We will consider eight useful terms that will be used when looking at box and whisker plots. The minimum of a set of data is the smallest value in the data set. The maximum of a set of data is the largest value in the data set. The range of a data set is the maximum value minus the minimum value. It is the difference between the two values.
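To make these first three definitions concrete, here is a minimal Python sketch that computes the minimum, maximum, and range of a list of numbers. The function name `min_max_range` and the list of ages are hypothetical, made up for illustration rather than taken from the lesson.

```python
def min_max_range(data):
    """Return (minimum, maximum, range) for a non-empty list of numbers."""
    if not data:
        raise ValueError("data must contain at least one value")
    smallest = min(data)
    largest = max(data)
    # The range is the difference between the largest and smallest values.
    return smallest, largest, largest - smallest


# Hypothetical ages, used only to show the calculation.
ages = [12, 7, 31, 15, 22, 10]
print(min_max_range(ages))  # (7, 31, 24)
```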

The first quartile, or 𝑄 one, is the value in a data set below which 25 percent of the data lie. This is also sometimes called the lower quartile. The median, or second quartile, of a data set is the middle value. So, 50 percent of the data lie below the median. This means that 50 percent will also lie above the median. The third quartile, or 𝑄 three, is the value in a data set below which 75 percent of the data lie. This is also sometimes called the upper quartile. 25 percent of the data lie above this point.

The interquartile range, or IQR for short, of a data set is given by 𝑄 three minus 𝑄 one. It measures the spread of the middle 50 percent of the data, since 50 percent of the data lie between the first quartile and the third quartile. An outlier is a value that is much smaller or much larger than most of the other values in a data set. This is also sometimes known as an extreme value or anomaly. We will now look at how these different components enable us to draw a box and whisker plot.
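The lesson does not say how the quartiles are worked out from raw data, and textbooks and software use several slightly different conventions. The sketch below assumes one common classroom method: sort the data, find the median, then take the median of the lower half and of the upper half, leaving out the middle value when there is an odd number of values. The function names `median` and `quartiles` and the sample data are hypothetical. Python's built-in `statistics.quantiles` uses a different default method, so it can give slightly different quartiles for the same data.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method (an assumed convention)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]            # values below the median
    upper = s[half + n % 2:]    # values above the median (skip the middle value if n is odd)
    return median(lower), median(s), median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                                  # interquartile range: Q3 minus Q1
inside = [x for x in data if q1 <= x <= q3]    # values covered by the middle half
print(q1, q2, q3, iqr)           # 2.5 4.5 6.5 4.0
print(len(inside) / len(data))   # 0.5
```

Here the four values 3, 4, 5, and 6 lie between 𝑄 one and 𝑄 three, which is exactly half of the eight values. With small data sets or repeated values, the split is usually only approximately 50 percent.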

Let’s firstly look at the definition of a box and whisker plot. A box and whisker plot, or box plot, is a graph that illustrates the spread of a set of numerical data using five numbers from the data set: the minimum, the first quartile, the median, the third quartile, and the maximum. We begin by drawing a horizontal axis that covers the full range of the data values. The axis can occasionally be drawn vertically, although in the vast majority of cases it will be horizontal. The box part of a box and whisker plot covers the middle 50 percent of the values in the data set. The whiskers each cover 25 percent of the data values.

The lower whisker covers all the data values from the minimum value up to 𝑄 one. That is the lowest 25 percent of data values. The upper whisker covers all the data values between 𝑄 three and the maximum value. These are the highest 25 percent of data values. The median sits within the box and represents the centre of the data. 50 percent of the data values lie above the median, and 50 percent lie below. Overall, we can see that 25 percent of the data lies between each pair of neighbouring key values.

Outliers, or extreme values, in a data set are usually marked on a box and whisker plot with a star symbol. If there are one or more outliers in a data set, then for the purpose of drawing a box and whisker plot, we take the minimum and maximum to be the smallest and largest values of the data once the outliers are excluded. We will now look at some examples. In our first example, we will draw a box and whisker plot for a specific data set and interpret some of the features of the plot.
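The lesson leaves open how "much smaller or much larger" is decided. A widely used convention, which is an assumption here and not something this lesson specifies, flags a value as an outlier if it lies more than 1.5 times the IQR below 𝑄 one or above 𝑄 three. The sketch below takes 𝑄 one and 𝑄 three as already known, marks the outliers, and then takes the whisker ends as the minimum and maximum of the remaining values, as described above. The function name `split_outliers` and the sample data are hypothetical.

```python
def split_outliers(data, q1, q3, k=1.5):
    """Separate outliers using the k * IQR convention (an assumption, not from the lesson).

    Returns (whisker_low, whisker_high, outliers), where the whisker ends are the
    minimum and maximum of the data once the outliers are excluded.
    """
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    kept = [x for x in data if low_fence <= x <= high_fence]
    return min(kept), max(kept), outliers


# Hypothetical data whose quartiles (by the median-of-halves method) are Q1 = 10 and Q3 = 22.
ages = [7, 9, 10, 12, 14, 16, 20, 22, 31, 70]
print(split_outliers(ages, q1=10, q3=22))  # (7, 31, [70])
```

With an IQR of 12, the fences sit at 10 − 18 = −8 and 22 + 18 = 40, so only 70 is flagged, and the whiskers would be drawn from 7 to 31.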

Noah has calculated the following information from a data set about the ages of people present on a Saturday morning in a swimming pool. The lowest value was seven. The lower quartile was 10. The median was 15, the upper quartile, 22, and the highest value, 31. There are four parts to this question. Part a) draw a box and whisker plot using the information Noah has calculated from the data set. Part b) what is the overall age range of the swimmers? Part c) what percentage of swimmers were between seven and 22 years old? Part d) calculate and interpret the percentage of swimmers covered by the box.

Let’s begin by looking at part a) and drawing the box and whisker plot. Our first step is to draw an appropriate 𝑥-, or horizontal, axis. In this case, this will show the age of the swimmers. This axis must go below the lowest value and above the highest value, in this case, below seven and above 31. The five key values that we need to draw the box and whisker plot are seven, 10, 15, 22, and 31. We can mark these on the horizontal axis. Firstly, the lowest, or minimum value, is seven. The lower quartile, or 𝑄 one, is 10. The median value of the swimmers was 15. 𝑄 three, or the upper quartile, was 22. And finally, 31 was the maximum, or highest value. This was the age of the oldest swimmer.

We can now use these five values to draw the box and whiskers. The box goes from 𝑄 one to 𝑄 three, from the lower quartile to the upper quartile. We begin by drawing three vertical lines at 𝑄 one, the median, and 𝑄 three. We can then complete the box by drawing two horizontal lines. It doesn’t matter how far above the axis we draw the box. However, it should be close enough to the axis so that we can read the values clearly. Next, we need to draw the whiskers. These will go down to the minimum value and up to the maximum value. Adding these lines completes the box and whisker plot. We have a minimum value of seven, a lower quartile or 𝑄 one of 10, a median of 15, a 𝑄 three or upper quartile of 22, and a maximum value of 31. We’ll now look at parts b), c), and d).
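The drawing steps above can be imitated in plain text, which may help when checking where each line of the plot should go. This is a sketch under our own assumptions: the characters used (`|` for the whisker ends and the median, `[` and `]` for the box edges, `=` for the box, `-` for the whiskers), the function names `to_column` and `text_box_plot`, and the choice of an axis from 5 to 35 are all illustrative rather than part of the lesson. Values are rounded to the nearest character column, so the picture is only as precise as the chosen width.

```python
def to_column(value, axis_min, axis_max, width):
    """Map a value on the axis to a character column from 0 to width - 1."""
    return round((value - axis_min) / (axis_max - axis_min) * (width - 1))


def text_box_plot(five, axis_min, axis_max, width=31, tick_step=5):
    """Return a three-line text box plot.

    five is (minimum, Q1, median, Q3, maximum); axis_min and axis_max are
    integers chosen below the minimum and above the maximum.
    """
    lo, q1, med, q3, hi = (to_column(v, axis_min, axis_max, width) for v in five)

    row = [" "] * width
    for c in range(lo, hi + 1):
        row[c] = "-"              # whiskers run from the minimum to the maximum
    for c in range(q1, q3 + 1):
        row[c] = "="              # the box covers Q1 to Q3
    row[lo] = row[hi] = "|"       # ends of the whiskers
    row[q1], row[q3] = "[", "]"   # left and right edges of the box
    row[med] = "|"                # median line inside the box

    axis = ["-"] * width
    labels = [" "] * (width + len(str(axis_max)))
    for tick in range(axis_min, axis_max + 1, tick_step):
        c = to_column(tick, axis_min, axis_max, width)
        axis[c] = "+"                           # tick mark on the axis
        labels[c:c + len(str(tick))] = str(tick)  # tick label under the mark
    return "\n".join(["".join(row).rstrip(), "".join(axis), "".join(labels).rstrip()])


# Noah's summary of the swimmers' ages (7, 10, 15, 22, 31) on an axis from 5 to 35.
print(text_box_plot((7, 10, 15, 22, 31), 5, 35))
```

With a width of 31 columns and an axis from 5 to 35, each column stands for one year, so every `+` on the axis marks a multiple of five years.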

Part b) of the question said the following. What is the overall age range of the swimmers? The range of any data set can be calculated by subtracting the minimum value from the maximum value. In this question, the maximum value was 31 and the minimum value was seven. We need to subtract seven from 31. This is equal to 24. Therefore, the age range of the swimmers on the Saturday morning was 24 years.

The third part of the question, part c), said the following. What percentage of swimmers were between seven and 22 years old? We know that each section of the box and whisker plot represents 25 percent of the data. 25 percent lies between the minimum and 𝑄 one, 25 percent between 𝑄 one and the median, and 25 percent between the median and 𝑄 three. Finally, 25 percent of the data lies between 𝑄 three and the maximum value.

The minimum value, or age, was seven, and 𝑄 three was equal to 22. We need to work out what percentage of the data lies between the minimum value and 𝑄 three. 25 percent plus 25 percent plus 25 percent equals 75 percent. Therefore, 75 percent of the data lies between the minimum value and 𝑄 three. We also know this by definition, since 75 percent of any data set lies below 𝑄 three. In this question, 75 percent of the swimmers were between seven and 22 years of age.

We will now look at the final part of the question, part d). Calculate and interpret the percentage of Saturday morning swimmers covered by the box. The box contains all of the values between 𝑄 one and 𝑄 three. 25 percent of the values are between 𝑄 one and the median, and 25 percent are between the median and 𝑄 three. As 25 plus 25 is equal to 50, 50 percent of the values are between 𝑄 one and 𝑄 three. We also know this by definition: since the box stretches from 𝑄 one to 𝑄 three, it must cover 50 percent of the data. The correct answer for part d) is 50 percent. In context, this means that the middle half of the Saturday morning swimmers were aged between 10 and 22.
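Parts c) and d) both rely on the idea that each of the four sections of the plot holds 25 percent of the data, so the percentage between two key values is 25 times the number of sections between them. The sketch below encodes that rule for Noah's summary. The names `KEY_VALUES` and `percent_between` are hypothetical, and the 25 percent per section is the idealised rule used in this lesson; with real data the split is only approximate.

```python
# The five key values of a box and whisker plot, in order along the axis.
KEY_VALUES = ["minimum", "Q1", "median", "Q3", "maximum"]


def percent_between(start, end):
    """Percentage of the data between two key values, counting 25 percent per section."""
    i, j = sorted((KEY_VALUES.index(start), KEY_VALUES.index(end)))
    return 25 * (j - i)


# Noah's summary of the swimmers' ages.
noah = dict(zip(KEY_VALUES, [7, 10, 15, 22, 31]))

for start, end in [("minimum", "Q3"), ("Q1", "Q3")]:
    print(f"{percent_between(start, end)}% of swimmers were aged {noah[start]} to {noah[end]}")
```

The first line printed corresponds to part c), 75 percent aged seven to 22, and the second to part d), 50 percent aged 10 to 22.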

In our next example, we will interpret a box and whisker plot to identify the correct statement. The question says the following.

Look at the box and whisker plot. Give a reason why the line inside the box is further to the right. Is it A) the median is closer to 𝑄 three than 𝑄 one? B) The mean is about 49? C) The person who made the graph made a mistake? Or D) the mode is 49?

Let’s begin by marking on the five key points of our box and whisker plot. The point furthest to the left is the minimum value. This is between 30 and 35 and is closer to 35. The point furthest to the right is the maximum value. This is between 55 and 60. The left-hand edge of our box is the lower quartile, or 𝑄 one. The right-hand edge of the box is 𝑄 three. This is also known as the upper quartile. Finally, the line between 𝑄 one and 𝑄 three is the median. In this question, the median is equal to 49. Let’s now consider our four options.

Options B) and D) include the number 49. However, these mention the mean and the mode. And the box and whisker plot can only tell us the value of the median, not the mode, and not the mean. We can, therefore, eliminate options B) and D). We can see from the box part of our graph that the median is closer to the right edge than the left edge. As the right edge represents 𝑄 three and the left edge represents 𝑄 one, then option A) is correct. The median is closer to 𝑄 three than 𝑄 one.

We cannot tell whether the person who made the graph made a mistake or not. However, it is perfectly possible for the median line to be off-centre, that is, closer to the right or to the left of the box. An off-centre median is therefore not evidence of a mistake, so option C) is also incorrect. The reason why the line is further to the right is that the median is closer to 𝑄 three than 𝑄 one.
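The reasoning for option A) comes down to comparing two distances: from 𝑄 one up to the median, and from the median up to 𝑄 three. The hypothetical sketch below makes that comparison. Only the median of 49 comes from the question; the quartile values 40 and 52 in the usage line are made up for illustration, since the transcript does not state them.

```python
def median_position(q1, median, q3):
    """Report which quartile the median line is closer to."""
    to_q1 = median - q1   # width of the left part of the box
    to_q3 = q3 - median   # width of the right part of the box
    if to_q3 < to_q1:
        return "closer to Q3"
    if to_q1 < to_q3:
        return "closer to Q1"
    return "centred"


# Hypothetical quartiles around the stated median of 49.
print(median_position(40, 49, 52))  # closer to Q3
```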

We will now look at one more example of interpreting a box and whisker plot. This question uses the term box plot, which is an alternative name for a box and whisker plot.

The box plot shows the daily temperatures at a seaside resort during the month of August. There are seven parts to this question. We are asked to work out the median, the maximum, the minimum, the lower quartile, the upper quartile, the interquartile range, and the range of the temperatures.

In order to answer this question, it will help to mark on the key points first. The point at the end of the left-hand whisker is the minimum value. And the point at the end of the right-hand whisker is the maximum value. The left-hand edge of our box is known as 𝑄 one or the lower quartile. The right-hand edge of our box is 𝑄 three. This is also known as the upper quartile. Finally, the line that is between 𝑄 one and 𝑄 three is the median. We can now read off the key values to answer the questions.

The values from 18 to 32 along the horizontal axis are the temperatures in degrees Celsius. Part A) asked for the median, which we can read off as 24 degrees Celsius. Part B) wanted the maximum temperature. This is the point furthest to the right. Therefore, the maximum temperature is 31 degrees Celsius. Part C) wanted the minimum, or lowest, temperature. This is the point furthest to the left and is equal to 18 degrees Celsius.

Part D) asked us to work out the lower quartile. This is the 𝑄 one value and is equal to 22 degrees Celsius. Part E) asked us for the upper quartile temperature. This is the 𝑄 three value and is equal to 28 degrees Celsius. Part F) asked us to calculate the interquartile range. We can calculate the interquartile range, or IQR, by subtracting quartile one from quartile three. The upper quartile was equal to 28. And the lower quartile was equal to 22. We need to subtract 22 from 28. This is equal to six. Therefore, the interquartile range of the temperatures is six degrees Celsius.

The final part, part G), asked us to calculate the range of the temperatures. The range of any set of values is the maximum value minus the minimum value. In this question, we need to subtract the lowest temperature 18 from the highest temperature 31. 31 minus 18 is equal to 13. Therefore, the range of temperatures is 13 degrees Celsius. The answers to the seven parts of the question are 24, 31, 18, 22, 28, six, and 13 degrees Celsius, respectively.
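All seven answers follow from the five values read off the plot, with the interquartile range and the range found by subtraction. As a minimal sketch (the function name `read_box_plot` is hypothetical), the calculation can be written as:

```python
def read_box_plot(minimum, q1, median, q3, maximum):
    """Collect the seven answers from the five key values of a box plot."""
    return {
        "median": median,
        "maximum": maximum,
        "minimum": minimum,
        "lower quartile": q1,
        "upper quartile": q3,
        "interquartile range": q3 - q1,   # Q3 minus Q1
        "range": maximum - minimum,       # maximum minus minimum
    }


# Key values read from the seaside resort temperature plot, in degrees Celsius.
answers = read_box_plot(18, 22, 24, 28, 31)
for name, value in answers.items():
    print(f"{name}: {value} °C")
```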

We will now conclude this lesson by reminding ourselves of the main features of a box and whisker plot. Firstly, the box part of a box and whisker plot covers the middle 50 percent of the values in the data set. The whiskers each cover 25 percent of the data values. The lower whisker covers all the values from the minimum value up to 𝑄 one, or the lower quartile. This is the lowest 25 percent of data values. The upper whisker covers all the values between 𝑄 three, or the upper quartile, and the maximum value. This is the highest 25 percent of data values. The median sits within the box and represents the centre of the data. 50 percent of the values lie above the median and 50 percent below.
