Video Transcript
In this video, we will learn how to compare two data sets presented in different forms, including two-way tables, histograms, box plots, and line graphs. There are many other ways we could represent data and then interpret and analyze it.
Let’s first consider why we compare data. As well as asking why we would compare two data sets, we need to consider when it is useful and how we would do it. Comparing and contrasting data is an essential part of how we make sense of the world around us: finding differences and similarities helps us organize both new and known information. We can compare objects, ideas, concepts, events, or other subjects.
In schools, analyzing data is useful for comparing results between boys and girls or between two different subjects, such as maths and science. Outside of school, data can be used to compare sports teams, companies, and many other things.
We will now look at some examples where we compare and contrast data. Our first question involves data in a two-way table.
A factory produces two types of shirts: A and B. To decide how many of each shirt to produce, the factory gathered data on the sales of its shirts at each of five malls. The table shows the sales of a sample of 100 shirts from each mall. Which type of shirt is more popular?
Let’s first consider what the table shows. The table covers five different malls, numbered one to five. Each mall’s sample contains 100 shirts sold, each of type A or type B. We can check this by adding the numbers of type A and type B shirts at each mall: 87 plus 13 is equal to 100, and 51 plus 49 is also equal to 100. The same is true for malls three, four, and five.
We are asked to work out which type of shirt is more popular. One way to do this is to look at each mall individually. In mall one, type A was more popular, as 87 is greater than 13.
This is also true in mall two, as 51 is greater than 49. In the other three malls, however, type B was more popular, as 74 is greater than 26, 77 is greater than 23, and 76 is greater than 24. Type B was more popular in three of the five malls, which suggests that it is the more popular shirt.
However, counting malls ignores how large each difference is: type A outsold type B by 74 shirts in mall one but by only 2 in mall two. A better and more accurate way to find the more popular type in this question is to work out the total sales of each type. To calculate the total sales of type A, we add 87, 51, 26, 23, and 24, which gives 211. Repeating this process for type B, we add 13, 49, 74, 77, and 76, which gives 289.
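Both approaches, counting the malls where each type wins and adding up the total sales, can be sketched in a few lines of Python. The sales figures are the ones used in this question; the function names are illustrative only.

```python
# Sales from the two-way table: (type A, type B) for malls one to five.
sales = [(87, 13), (51, 49), (26, 74), (23, 77), (24, 76)]

def malls_won(sales):
    """Count how many malls each shirt type wins (a tie counts for neither)."""
    wins_a = sum(1 for a, b in sales if a > b)
    wins_b = sum(1 for a, b in sales if b > a)
    return wins_a, wins_b

def total_sales(sales):
    """Add up the sales of each type across all malls."""
    total_a = sum(a for a, _ in sales)
    total_b = sum(b for _, b in sales)
    return total_a, total_b

# Sanity check: each mall's sample should contain exactly 100 shirts.
assert all(a + b == 100 for a, b in sales)

wins_a, wins_b = malls_won(sales)
total_a, total_b = total_sales(sales)
print(f"Malls won: A={wins_a}, B={wins_b}")      # Malls won: A=2, B=3
print(f"Total sales: A={total_a}, B={total_b}")  # Total sales: A=211, B=289
print("More popular:", "A" if total_a > total_b else "B")  # More popular: B
```

Both methods point to type B here, but only the totals take the size of each mall’s difference into account.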
At this point, it is worth checking that these two totals add to 500, as we had five malls, each with a sample of 100 shirts; 211 plus 289 is indeed 500. As 289 is greater than 211, we can conclude that type B is more popular. Based on this sample of 500 shirts, the factory should produce more shirts of type B than type A.
We will now look at a second example, where we compare two histograms.
These histograms compare the heights of tall buildings in two cities. Which city has more buildings that are at least 500 but less than 600 feet tall?
Before starting any question like this, it is important to look at both axes and understand what they show. The horizontal, or 𝑥-axis, shows the height in feet. The first bar in each histogram shows the number of buildings between 400 and 500 feet tall. The next bars show the number of buildings between 500 and 600 feet, between 600 and 700 feet, and so on. The vertical, or 𝑦-axis, shows the number of buildings, or frequency; this type of graph is also sometimes called a frequency diagram. The numbers on this axis range from zero to 10, and each block or line represents one building.
building. In this question, we’re interested
in the buildings that are between 500 and 600 feet. This is the second bar of each
histogram. In city A, this corresponds to
eight buildings, whereas, in city B, the second bar corresponds to five
buildings. As eight is greater than five, the
correct answer is city A. This city has more buildings that
are at least 500 but less than 600 feet tall.
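The wording “at least 500 but less than 600” describes a half-open interval, and that rule decides which bar a building belongs to. The Python sketch below assumes every bar is 100 feet wide and the first starts at 400 feet, as on these histograms, and uses the two bar heights read from the graphs; `bar_index` is a hypothetical helper name.

```python
START_FT = 400  # left edge of the first bar (assumed from the histograms)
WIDTH_FT = 100  # width of each bar in feet (assumed from the histograms)

def bar_index(height_ft):
    """Return the 0-based bar that a building height falls into.

    Bar k covers START_FT + k*WIDTH_FT <= height < START_FT + (k+1)*WIDTH_FT,
    so a 600-foot building belongs to the 600-700 bar, not the 500-600 bar.
    """
    if height_ft < START_FT:
        raise ValueError("height is below the first bar")
    return int((height_ft - START_FT) // WIDTH_FT)

# Frequencies of the 500-600 ft bar, read from each histogram.
bar_500_600 = {"City A": 8, "City B": 5}

print(bar_index(500), bar_index(599), bar_index(600))  # 1 1 2
print(max(bar_500_600, key=bar_500_600.get))           # City A
```

Index 1 is the second bar, which is the one this question asks about.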
Our next question involves comparing two box plots, or box-and-whisker diagrams.
This double box-and-whisker plot compares the prices of books in two bookstores. Which bookstore has a greater range of prices?
Let’s first recall the information shown on a box-and-whisker plot. The horizontal, or 𝑥-axis, shows the price of the books, from 60 to 130 dollars. There are two bookstores: the bottom box-and-whisker plot shows the information for bookstore A, and the top one shows bookstore B. There are five key points on any box-and-whisker plot, indicated in this case by the five dots: the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. In the context of this question, the minimum is the price of the cheapest book, and the maximum is the price of the most expensive book. If you lined all the books up in order from cheapest to most expensive, the median would be the price of the book in the middle, with half the books cheaper and half more expensive.
The lower quartile is the price of the book a quarter of the way up the list, and the upper quartile is the price of the book three-quarters of the way up the list. So 25 percent of the books are more expensive than the upper quartile, whereas 75 percent are cheaper. It is also worth recalling that we can calculate the interquartile range, or IQR, by subtracting the lower quartile from the upper quartile. This is the range in price of the middle 50 percent of the books.
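As a sketch of how the five key values and the IQR come from a sorted list, the Python below uses the standard library’s `statistics.quantiles` (Python 3.8 or later). The prices are hypothetical, because the plots in this question do not list individual books, and `five_number_summary` is an illustrative name. Quartile conventions differ between textbooks and software. With seven values, as here, the default method gives the same quartiles as the common textbook approach of taking the medians of the lower and upper halves.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Default method="exclusive"; other methods can give slightly different quartiles.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, q2, q3, data[-1]

# Hypothetical book prices in dollars (not taken from the question's plots).
prices = [60, 70, 80, 90, 100, 110, 120]

low, q1, median, q3, high = five_number_summary(prices)
print(low, q1, median, q3, high)  # 60 70.0 90.0 110.0 120
print("Range:", high - low)       # Range: 60
print("IQR:", q3 - q1)            # IQR: 40.0
```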
In this question, we’re asked to find which bookstore has a greater range of prices. The range is equal to the maximum value minus the minimum value: we subtract the cheapest price from the most expensive one. Let’s first consider the values for bookstore A. Bookstore A has a maximum price of 120 dollars; this is the point furthest to the right on the bottom box plot. It has a minimum, or cheapest, price of 75 dollars, as this is the point furthest to the left. To calculate the range, we subtract 75 from 120, which gives 45. So the range of prices in bookstore A is 45 dollars.
Repeating this process for bookstore B, we have a maximum price of 105 dollars and a minimum, or cheapest, price of 65 dollars. Subtracting 65 from 105 gives a range of 40 dollars. As 45 dollars is greater than 40 dollars, we can conclude that bookstore A has the greater range of prices. This indicates that the prices of books in bookstore A are more spread out than in bookstore B. Both the range and the interquartile range are measures of spread, whereas the median is a measure of average, or central tendency.
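The range comparison only needs the two end points of each plot. Here it is as a short Python sketch, using the minimum and maximum prices read from the box plots; the dictionary layout and `price_range` are illustrative choices, not part of the question.

```python
# Cheapest and most expensive prices (dollars) read from the box plots.
bookstores = {
    "Bookstore A": {"min": 75, "max": 120},
    "Bookstore B": {"min": 65, "max": 105},
}

def price_range(summary):
    """Range = maximum value minus minimum value."""
    return summary["max"] - summary["min"]

ranges = {name: price_range(s) for name, s in bookstores.items()}
print(ranges)  # {'Bookstore A': 45, 'Bookstore B': 40}

widest = max(ranges, key=ranges.get)
print(widest, "has the greater range of prices")  # Bookstore A has the greater range of prices
```

A larger range only tells us that the extreme prices are further apart. Comparing the spread of the middle 50 percent of prices would need each store’s IQR instead.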
Our final question looks at comparing two data sets on a line graph.
The broken line graph shows the number of matches that two volleyball teams, the Blue Jays and the Robins, played each month for five months. Calculate the median number of matches played by each team.
Let’s begin by considering what is shown on the line graph. The horizontal, or 𝑥-axis, shows five months: March, April, May, June, and July. The vertical, or 𝑦-axis, shows the number of matches played, from zero to 10. The blue line shows the number of matches played each month by the Blue Jays, and the black line shows the same information for the Robins. If a line slopes downward from one month to the next, the number of matches decreased; if it slopes upward, the number of matches increased.
Let’s first consider the Blue Jays. In March, the Blue Jays played nine matches; this is the first dot on the blue line. In April, they also played nine matches. In May and June, the number of matches dropped to five, and in July, they played six matches. Now let’s consider the number of matches played by the Robins in each of the five months. The first point on their line corresponds to six matches, so they played six matches in March. They played five matches in April, nine in May, six in June, and seven in July.
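Reading the slope of each line comes down to the change from one month to the next. The Python sketch below stores the values read from the graph and describes those changes. A negative change is a downward slope, and a positive change is an upward one. `monthly_changes` is an illustrative helper name.

```python
months = ["March", "April", "May", "June", "July"]
matches = {
    "Blue Jays": [9, 9, 5, 5, 6],
    "Robins": [6, 5, 9, 6, 7],
}

def monthly_changes(counts):
    """Difference between each month's count and the previous month's."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

for team, counts in matches.items():
    described = []
    for month, change in zip(months[1:], monthly_changes(counts)):
        if change > 0:
            trend = "up"
        elif change < 0:
            trend = "down"
        else:
            trend = "no change"
        described.append(f"{month}: {change:+d} ({trend})")
    print(team, "|", ", ".join(described))
```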
In this question, we’re asked to calculate the median number of matches for each team. The median is a measure of average, or central tendency: it is the middle value when the data is in ascending order. Placing the data for the Blue Jays in ascending order, we have five, five, six, nine, and nine. As the median is the middle value, we can begin by crossing off the highest and lowest values. Crossing off one more value from each end leaves a single number in the middle, so the median number of matches played by the Blue Jays is six. We can repeat this process for the Robins. Listing their numbers of matches in order, we have five, six, six, seven, and nine. We cross off five and nine, followed by one of the sixes and the seven, leaving the other six.
This means that the median number of matches played by the Robins is also six, so the median number of matches played by each team over the five-month period was six. We could also use this data to compare other aspects of the results, for example, the mean, mode, and range. Comparing each of these values for both teams would allow us to compare the data more thoroughly.
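The crossing-off method for the median, together with the other comparisons mentioned above, can be sketched in Python. `median_by_crossing_off` is a hypothetical name for a function that mirrors the method in this video: sort the data, then remove the lowest and highest values in pairs. The standard library’s `statistics` module (Python 3.8 or later, for `multimode`) serves as a cross-check and supplies the mean and modes.

```python
import statistics

def median_by_crossing_off(values):
    """Sort, then cross off the lowest and highest values until one or two remain."""
    data = sorted(values)
    while len(data) > 2:
        data = data[1:-1]  # cross off one value from each end
    # One value left: that is the median. Two left (even count): take their mean.
    return data[0] if len(data) == 1 else (data[0] + data[1]) / 2

matches = {
    "Blue Jays": [9, 9, 5, 5, 6],  # March to July
    "Robins": [6, 5, 9, 6, 7],
}

for team, counts in matches.items():
    median = median_by_crossing_off(counts)
    assert median == statistics.median(counts)  # cross-check with the standard library
    print(
        team,
        "median:", median,
        "mean:", statistics.mean(counts),
        "mode(s):", statistics.multimode(counts),
        "range:", max(counts) - min(counts),
    )
```

Both teams have a median of six and a range of four. However, the Blue Jays’ mean (34 ÷ 5 = 6.8) is slightly higher than the Robins’ (33 ÷ 5 = 6.6), and the Blue Jays have two modes, five and nine, whereas the Robins’ mode is six.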
We will now summarize the key points from this video. Data can be displayed in a variety of forms, including tables and graphs. In this video, we have looked at two-way tables, histograms, box-and-whisker plots, and line graphs. We can compare two or more data sets using raw values or frequencies, averages, or measures of spread such as the range. Interpreting data allows us to draw conclusions as well as make predictions about future trends. This is very useful in most industries.