Video Transcript
Without calculating the exact
standard deviations, determine which of the following data sets has the highest
standard deviation. Data set (A) has elements 100, 200,
300, 400, 500, and 500. Data set (B) has elements 3,
31, 53, 63, 63, and 63. Data set (C) has elements 1000,
2000, 3000, 4000, 5000, and 6000. Data set (D) has five elements
which are equal to 10 and one element that’s equal to 11. And in data set (E), all six
elements have the value of 100.
We’re given five data sets, each of
which contains six elements or observations. We know that the standard deviation
of a data set measures the dispersion of the data from the mean. So the higher the standard
deviation, the more dispersed the data is from the mean. Loosely speaking, we can think of the
standard deviation as the typical distance between the mean and the individual data
points in the set, that is, how far on average the data spreads from the mean.
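For reference, the description above can be written as a formula. The video does not say which form of the standard deviation it has in mind, so this sketch assumes the population form; the symbols x_i (the observations), x̄ (the mean), and n (the number of observations) are labels introduced here, not taken from the video.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Strictly, this is a root-mean-square distance from the mean rather than a plain average of distances. It still always lies between the smallest and the largest distance from the mean, which is all the estimates in this video rely on.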
Now, since we’re not going to
calculate the standard deviations for options (A) to (E), let’s consider what each
data set can tell us about the spread simply from observation. If we begin with data set (E) since
this is the simplest, we see that every element in the data set has the same value, 100. And so in this data set, there’s
actually no dispersion. There is no difference between any
of the data points. And if we were to calculate the
mean of this data set, recalling that the mean is the sum of all the observations
divided by the number of observations, in fact we would find that the mean is equal
to 100. And since none of the observations
differ from this mean at all, we can confidently say without having to calculate it
that the standard deviation of this data set is equal to zero.
Now, making some space, let’s next
consider the data set for option (D). This is quite similar to option (E)
since five of our data points are the same, although in this case they have a value
of 10. And here one of our observations
differs but by only one unit, with the value 11. If we were to work out the mean of
this data set, we would expect the mean to be very close to 10. And since five of the six data points are exactly 10, the average deviation from
this mean is very small. And so in this case, we can expect
our standard deviation to be very close to zero.
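Although the question asks us not to calculate, a quick check of these two observations might look like the following. This is a minimal sketch that assumes the population standard deviation (Python's `statistics.pstdev`); the variable names are illustrative only.

```python
import statistics

# Data sets (D) and (E) as given in the question.
data_d = [10, 10, 10, 10, 10, 11]
data_e = [100] * 6

# Assumption: population standard deviation (pstdev), not the sample form.
mean_d = statistics.mean(data_d)
sd_d = statistics.pstdev(data_d)
mean_e = statistics.mean(data_e)
sd_e = statistics.pstdev(data_e)

print("D: mean =", mean_d, "standard deviation =", sd_d)
print("E: mean =", mean_e, "standard deviation =", sd_e)
```

The output should agree with the reasoning above: the mean of (D) sits just above 10 with a standard deviation well below 1, and the standard deviation of (E) is exactly zero.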
Making some space again, let’s next
consider option (C). Now, in this case, our values are
1000, 2000, 3000, 4000, 5000, and 6000. And since each data point differs
from the next by the same amount, that’s 1000, we can estimate the mean to be in the
center of the set. In fact, the mean is 3500. Now, on either side of the mean,
the distance from the mean to the highest and lowest points is 2500 units. And the distance from the mean to
the nearest points is 500 units. So the standard deviation, which is
the average distance from the mean, must be somewhere in between these two values of
500 and 2500. Since the observations in this data
set are so evenly spread, we expect the standard deviation to be somewhere in the
middle of this range. But we don’t need to be any more
specific than this.
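The bounding argument for data set (C) can be checked with a short sketch. It assumes the population standard deviation (`statistics.pstdev`), and the names `nearest` and `farthest` are labels chosen here for the smallest and largest distances from the mean.

```python
import statistics

# Data set (C) as given in the question.
data_c = [1000, 2000, 3000, 4000, 5000, 6000]

mean_c = statistics.mean(data_c)

# Distance of every observation from the mean.
distances = [abs(x - mean_c) for x in data_c]
nearest = min(distances)
farthest = max(distances)

# Assumption: population standard deviation.
sd_c = statistics.pstdev(data_c)

print("mean:", mean_c)
print("nearest distance:", nearest, "farthest distance:", farthest)
print("standard deviation:", sd_c)
print("within bounds:", nearest < sd_c < farthest)
```

Under this assumption the standard deviation comes out at roughly 1708, which does fall between 500 and 2500.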
So now let’s consider option
(B). In this data set, our data ranges
from 3 to 63, although in fact four out of six values, that’s two-thirds of our data
set, are above 50, and three of the values are equal to 63. So, in fact, most of the weight of our data
is above 50. On our graph, this looks like the
data has a fairly wide spread. And with the majority of the data
above 50, we expect the mean to be at the higher end of the data, perhaps somewhere
between 40 and 50, say. So let’s estimate a mean of around
45.
Now, the average dispersion or
deviation from the mean must be less than the greatest distance between the mean and
a data point. In this case, our greatest distance
is 42 units, from our estimated mean of 45 down to the lowest value of 3. We can therefore be confident in
our estimation that the standard deviation for data set (B) must be less than 42
units.
So now let’s consider our final
data set, that’s option (A), where in this case our data ranges from 100 to 500. And with two observations taking
the value of 500, we can expect our mean to be weighted slightly higher than the
center of the data, perhaps between 300 and 400. So let’s say around 330. So now the maximum distance from
the mean, which is down to the lowest value of 100, is going to be no more than about 230. So we can expect our standard
deviation to be less than 230. Of course, this is an estimate, but
now we can compare our five data sets. And we expect the standard
deviation for option (A) to be less than 230, that for option (B) to be less than
42, for option (C) we estimate between 500 and 2500, for option (D) almost zero, and
for option (E) exactly equal to zero.
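As a check on these estimates, the five standard deviations can be computed directly. This is a minimal sketch assuming the population standard deviation (`statistics.pstdev`); using the sample form (`statistics.stdev`) would change the values slightly but not the ranking. The dictionary layout and names are illustrative.

```python
import statistics

# The five data sets as given in the question.
data_sets = {
    "A": [100, 200, 300, 400, 500, 500],
    "B": [3, 31, 53, 63, 63, 63],
    "C": [1000, 2000, 3000, 4000, 5000, 6000],
    "D": [10, 10, 10, 10, 10, 11],
    "E": [100, 100, 100, 100, 100, 100],
}

# Assumption: population standard deviation for every set.
sds = {name: statistics.pstdev(values) for name, values in data_sets.items()}

for name, sd in sds.items():
    print(f"({name}) standard deviation = {sd:.2f}")

# The data set whose standard deviation is largest.
highest = max(sds, key=sds.get)
print("highest:", highest)
```

Under this assumption the values are roughly 149 for (A), 22 for (B), 1708 for (C), 0.37 for (D), and 0 for (E), each consistent with the bounds estimated above.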
Hence, since option (C) has the
highest dispersion, the data set with the highest standard deviation is data set
(C).