# Lesson Video: Comparing Data

In this video, we will learn how to compare two sets of data in a variety of contexts using tables and graphs.

14:17

### Video Transcript

In this video, we will learn how to compare two data sets given in different forms. These will include two-way tables, histograms, box plots, and line graphs, although there are many other ways to represent, interpret, and analyze data. Let’s first consider why, when, and how we might compare two data sets. Comparing and contrasting data helps us make sense of the world around us: finding differences and similarities helps us organize both new and known information. We can compare objects, ideas, concepts, events, or other subjects.

In schools, analyzing data is useful for comparing results between boys and girls or between two different subjects, such as maths and science. Outside of school, data can be used to compare sports teams or companies, among many other situations. We will now look at some examples where we compare and contrast data. Our first question involves data in a two-way table.

A factory produces two types of shirts: A and B. To calculate how many of each shirt to produce, the factory gathered data on the sales of their shirts at each of five malls. The table shows the sales of a sample of 100 shirts from each mall. Which type of shirt is more popular?

Let’s first consider what the table shows. It lists five malls, numbered one to five, each of which sold 100 shirts of either type A or type B. We can check this by adding the numbers of type A and type B shirts at each mall: 87 plus 13 is equal to 100, and 51 plus 49 is also equal to 100. The same is true for malls three, four, and five. We are asked to work out which type of shirt is more popular. One way to do this is to look at each mall individually. In mall one, type A was more popular, as 87 is greater than 13.

This is also true in mall two, as 51 is greater than 49. In the other three malls, however, type B was more popular, as 74 is greater than 26, 77 is greater than 23, and 76 is greater than 24. Type B was more popular in three of the five malls, which suggests it is the more popular shirt. However, counting malls ignores how large each difference is, so a more accurate approach is to calculate the total sales of each type. To calculate the total sales of type A, we add 87, 51, 26, 23, and 24, which gives 211. Repeating this for type B, we add 13, 49, 74, 77, and 76, which gives 289.

At this point, it is worth checking that these two totals add to 500, as five malls each sold 100 shirts: 211 plus 289 is equal to 500. As 289 is greater than 211, we can conclude that type B is more popular. Based on this sample of 500 shirts, the factory should produce more shirts of type B than type A.
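Both methods above, counting the malls where each type wins and comparing total sales, can be checked with a short Python sketch. The sales figures are the ones read from the table in this question; the variable names are only illustrative.

```python
# Sales from the two-way table: (type A, type B) at malls 1 to 5
sales = [(87, 13), (51, 49), (26, 74), (23, 77), (24, 76)]

# Each mall's sample should contain exactly 100 shirts
assert all(a + b == 100 for a, b in sales)

# Method 1: count the malls where each type sold more
malls_a = sum(1 for a, b in sales if a > b)
malls_b = sum(1 for a, b in sales if b > a)

# Method 2: total sales of each type across all five malls
total_a = sum(a for a, _ in sales)
total_b = sum(b for _, b in sales)

print(malls_a, malls_b)                    # 2 3
print(total_a, total_b)                    # 211 289
print(total_a + total_b == 500)            # True
print("B" if total_b > total_a else "A")   # B
```

Here both methods agree, but the totals are the more reliable guide because they account for the size of each difference, not just which type won at each mall.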

We will now look at a second example where we compare two histograms.

These histograms compare the heights of tall buildings in two cities. Which city has more buildings that are at least 500 but less than 600 feet tall?

Before starting any question like this, it is important to look at both axes and understand what they show. The horizontal or 𝑥-axis shows the height in feet. The first bar in each histogram shows the number of buildings that are at least 400 but less than 500 feet tall. The next bars show the number of buildings from 500 to 600 feet, 600 to 700 feet, and so on. The 𝑦-axis shows the number of buildings, or frequency. This type of graph is also sometimes called a frequency diagram.

The numbers on this axis range from zero to 10, and each unit represents one building. In this question, we’re interested in the buildings that are at least 500 but less than 600 feet tall, which is the second bar of each histogram. In city A, this bar reaches eight buildings, whereas in city B it reaches five. As eight is greater than five, the correct answer is city A: this city has more buildings that are at least 500 but less than 600 feet tall.
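To make the “at least 500 but less than 600” condition concrete, here is a small Python sketch of counting values in a half-open class interval. The building heights in it are made-up values chosen to show the class boundaries, not the data behind either histogram, and `count_in_bin` is a hypothetical helper. Only the bar heights of eight and five come from the question.

```python
def count_in_bin(heights, lower, upper):
    """Count heights h with lower <= h < upper (a half-open class interval)."""
    return sum(1 for h in heights if lower <= h < upper)

# Hypothetical building heights in feet (NOT the data from the histograms)
heights = [450, 500, 520, 599, 600, 640]

print(count_in_bin(heights, 500, 600))  # 3: 500, 520 and 599 (600 goes in the next bar)
print(count_in_bin(heights, 600, 700))  # 2: 600 and 640

# Heights of the 500 to 600 bar as read from each histogram in the question
bar_a, bar_b = 8, 5
print("City A" if bar_a > bar_b else "City B")  # City A
```

A building exactly 600 feet tall is counted in the 600 to 700 bar, so each building belongs to exactly one bar.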

Our next question will involve comparing two box plots or box-and-whisker diagrams.

This double box-and-whisker plot compares the prices of books in two bookstores. Which bookstore has a greater range of prices?

Let’s first recall the information shown on a box-and-whisker plot. The 𝑥- or horizontal axis shows the price of the books, from 60 to 130 dollars. There are two bookstores: the bottom box-and-whisker plot shows the information for bookstore A, and the top one shows bookstore B. Every box-and-whisker plot shows five key values, marked in this case by five dots: the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. In the context of this question, the minimum is the price of the cheapest book, and the maximum is the price of the most expensive book. If we lined all the books up in order from cheapest to most expensive, the median would be the price of the middle book, with half the books cheaper and half more expensive.

The lower quartile is the price a quarter of the way up this ordered list, and the upper quartile is the price three-quarters of the way up. So 25 percent of the books are more expensive than the upper quartile, whereas 75 percent are cheaper. It is also worth recalling that we can calculate the interquartile range, or IQR, by subtracting the lower quartile from the upper quartile. This is the range in price of the middle 50 percent of the books.
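The five key values and the IQR can also be computed from a list of prices. This Python sketch uses hypothetical prices (the video shows only the summary points, not the individual books) and the standard library function `statistics.quantiles`; `five_number_summary` is a hypothetical helper. Quartile conventions differ between textbooks and software. For this particular list of seven values, Python’s default method gives the same quartiles as taking the median of each half.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # statistics.quantiles uses its 'exclusive' method by default; other
    # textbooks and calculators may use a slightly different quartile rule
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, q2, q3, data[-1]

# Hypothetical book prices in dollars (NOT the bookstore data from the plot)
prices = [62, 70, 78, 90, 96, 104, 125]

low, q1, med, q3, high = five_number_summary(prices)
print(low, q1, med, q3, high)  # 62 70.0 90.0 104.0 125
print("IQR:", q3 - q1)         # IQR: 34.0
```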

In this question, we’re asked which bookstore has the greater range of prices. The range is the maximum value minus the minimum value; that is, we subtract the cheapest price from the most expensive one. Let’s first consider bookstore A. Its maximum price is 120 dollars, the point furthest to the right on the bottom box plot, and its minimum, or cheapest, price is 75 dollars, the point furthest to the left. Subtracting 75 from 120 gives 45, so the range of prices in bookstore A is 45 dollars.

Repeating this process for bookstore B, we have a maximum price of 105 dollars and a minimum price of 65 dollars. Subtracting 65 from 105 gives a range of 40 dollars. As 45 dollars is greater than 40 dollars, we can conclude that bookstore A has the greater range of prices. This indicates that the prices of books in bookstore A are more spread out than in bookstore B. The range and interquartile range are both measures of spread, whereas the median is a measure of average, or central tendency.
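The range comparison for the two bookstores can be written as a short Python sketch. The minimum and maximum prices are the ones read from the box plots in this question; storing them in a dictionary is just one illustrative choice.

```python
# (minimum, maximum) prices in dollars, read from the box plots
bookstores = {"A": (75, 120), "B": (65, 105)}

# Range = maximum - minimum for each bookstore
ranges = {name: high - low for name, (low, high) in bookstores.items()}

print(ranges)                       # {'A': 45, 'B': 40}
print(max(ranges, key=ranges.get))  # A
```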

Our final question will look at comparing two data sets on a line graph.

The broken line graph shows the number of matches that two volleyball teams, the Blue Jays and the Robins, played each month for five months. Calculate the median number of matches played by each team.

Let’s begin by considering what the line graph shows. The horizontal or 𝑥-axis shows five months: March, April, May, June, and July. The vertical or 𝑦-axis shows the number of matches played, from zero to 10. The blue line shows the number of matches the Blue Jays played each month, and the black line shows the same information for the Robins. Where a line slopes downwards from one month to the next, the number of matches decreased; where it slopes upwards, the number of matches increased.

Let’s firstly consider the Blue Jays. In March, the Blue Jays played nine matches. This is the first dot on the blue line. In April, they also played nine matches. In May and June, the number of matches dropped to five. And in July, they played six matches. Now, let’s consider the number of matches played by the Robins in each of the five months. The first point on this line graph corresponds to six matches. So they played six matches in March. They played five matches in April, nine in May, six in June, and seven in July.

In this question, we’re asked to calculate the median number of matches for each team. The median is a measure of average, or central tendency: it is the middle value when the data is in ascending order. Placing the data for the Blue Jays in ascending order, we have five, five, six, nine, and nine. As the median is the middle value, we can begin by crossing off the lowest and highest values, five and nine. Crossing off the next value from each end, another five and another nine, leaves one number in the middle. The median number of matches played by the Blue Jays is six. We can repeat this process for the Robins. Listing their numbers of matches in order, we have five, six, six, seven, and nine. We cross off five and nine from the ends, followed by six and seven, which leaves six in the middle.

This means that the median number of matches played by the Robins is also six, so both teams had a median of six matches over the five-month period. We could also use this data to compare other aspects of the results, such as the mean, mode, and range. Comparing each of these values for the two teams would let us compare the data more thoroughly.
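As a sketch of that fuller comparison, the Python below computes the median (by the crossing-off method), mean, mode, and range for both teams from the values read off the line graph. `median_by_crossing_off` and `summarize` are hypothetical helpers written for illustration. The branch for an even number of values (take the mean of the two middle values) goes beyond this question, which has five values per team. `statistics.multimode` needs Python 3.8 or later and is used because the Blue Jays’ data has two modes.

```python
import statistics

# Matches played each month, March to July, read from the line graph
teams = {
    "Blue Jays": [9, 9, 5, 5, 6],
    "Robins": [6, 5, 9, 6, 7],
}

def median_by_crossing_off(values):
    """Sort, then cross off the lowest and highest values until the middle remains."""
    data = sorted(values)
    while len(data) > 2:
        data = data[1:-1]
    # One value left: it is the median. Two left: take the mean of the pair.
    return data[0] if len(data) == 1 else (data[0] + data[1]) / 2

def summarize(matches):
    """Median, mean, mode(s) and range of one team's monthly match counts."""
    return {
        "median": median_by_crossing_off(matches),
        "mean": statistics.mean(matches),
        # multimode returns every most common value, in first-seen order
        "modes": statistics.multimode(matches),
        "range": max(matches) - min(matches),
    }

summary = {name: summarize(matches) for name, matches in teams.items()}
for name, stats in summary.items():
    print(name, stats)
# Blue Jays {'median': 6, 'mean': 6.8, 'modes': [9, 5], 'range': 4}
# Robins {'median': 6, 'mean': 6.6, 'modes': [6], 'range': 4}

# Cross-check the helper against the standard library's median
assert all(s["median"] == statistics.median(teams[n]) for n, s in summary.items())
```

On these figures, the teams tie on median (six matches) and range (four matches), but the Blue Jays have a slightly higher mean: 6.8 matches per month against 6.6 for the Robins.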

We will now summarize the key points from this video. Data can be displayed in a variety of forms, including tables and graphs. In this video, we have looked at two-way tables, histograms, box-and-whisker plots, and line graphs. We can compare two or more data sets using raw values, frequencies, averages, or measures of spread such as the range. Interpreting data allows us to draw conclusions and make predictions about future trends, which is useful in many industries.