Question Video: Selecting the Data Set with the Highest Standard Deviation Mathematics

Without calculating the exact standard deviations, determine which of the following data sets has the highest standard deviation.
[A] 100, 200, 300, 400, 500, 500
[B] 3, 31, 53, 63, 63, 63
[C] 1000, 2000, 3000, 4000, 5000, 6000
[D] 10, 10, 10, 10, 10, 11
[E] 100, 100, 100, 100, 100, 100

Video Transcript

Without calculating the exact standard deviations, determine which of the following data sets has the highest standard deviation. Data set (A) has elements 100, 200, 300, 400, 500, and 500. Data set (B) has elements three, 31, 53, 63, 63, and 63. Data set (C) has elements 1000, 2000, 3000, 4000, 5000, and 6000. Data set (D) has five elements which are equal to 10 and one element that’s equal to 11. And in data set (E), all six elements have the value of 100.

We’re given five data sets, each containing six elements, or observations. The standard deviation of a data set measures how dispersed the data is about the mean: the higher the standard deviation, the more spread out the data. Informally, we can think of the standard deviation as a typical distance between the individual data points and the mean, that is, roughly how far on average the data spreads from the mean. Strictly, it is the square root of the average squared distance from the mean, so it is never smaller than the plain average distance and never larger than the greatest distance between the mean and any single data point.
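The informal "average distance" description is close but not exact. The Python sketch below compares the plain average distance from the mean, the standard deviation, and the greatest distance on a small hypothetical list that is not part of the question. The helper names are made up for illustration. Dividing by n rather than n - 1 (the population convention) is an assumption, since the video doesn't say which convention it uses.

```python
import math

def mean(data):
    return sum(data) / len(data)

def population_sd(data):
    """Square root of the average squared distance from the mean (divides by n)."""
    m = mean(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

def mean_abs_deviation(data):
    """Plain average distance from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)

def max_abs_deviation(data):
    """Greatest distance between the mean and any single data point."""
    m = mean(data)
    return max(abs(x - m) for x in data)

# Hypothetical illustrative data, not one of the question's data sets
toy = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean_abs_deviation(toy), population_sd(toy), max_abs_deviation(toy))  # 1.5 2.0 4.0
```

The standard deviation lands between the average distance and the greatest distance. That ordering is what lets the later estimates use the nearest and farthest points as bounds.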

Since we’re not going to calculate the standard deviations for options (A) to (E), let’s consider what each data set tells us about the spread simply by inspection. We begin with data set (E), since it is the simplest: every element has the same value, 100. So there is no dispersion at all in this data set, because no data point differs from any other. If we were to calculate the mean, recalling that the mean is the sum of all the observations divided by the number of observations, we would find that it is also 100. And since none of the observations differ from this mean at all, we can confidently say, without calculating it, that the standard deviation of this data set is zero.
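As a quick after-the-fact check (not part of the intended mental method), this Python sketch uses the standard library's `statistics` module, where `pstdev` is the population standard deviation. It confirms that for data set (E) every deviation from the mean is zero.

```python
import statistics

# Data set (E): all six observations equal 100
data_e = [100, 100, 100, 100, 100, 100]

mean_e = statistics.mean(data_e)
deviations_e = [x - mean_e for x in data_e]

print(mean_e)                     # 100
print(deviations_e)               # [0, 0, 0, 0, 0, 0]
print(statistics.pstdev(data_e))  # 0.0
```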

Next, let’s consider the data set for option (D). This is quite similar to option (E), since five of the data points are the same, although here they have the value 10. One observation differs, but only by one unit, with the value 11. If we were to work out the mean of this data set, we would expect it to be very close to 10. And since five of the six data points are exactly 10, every deviation from the mean is less than one unit. So in this case, we can expect the standard deviation to be very close to zero.
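A short check of data set (D), again using the population standard deviation `statistics.pstdev`, shows how small the spread is. The mean is 61/6 and the standard deviation works out to exactly the square root of 5 divided by 6.

```python
import statistics

# Data set (D): five observations of 10 and one of 11
data_d = [10, 10, 10, 10, 10, 11]

mean_d = statistics.mean(data_d)  # 61/6, about 10.17
sd_d = statistics.pstdev(data_d)  # exactly sqrt(5)/6, about 0.37

print(round(mean_d, 2), round(sd_d, 2))  # 10.17 0.37
```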

Now let’s consider option (C). Here our values are 1000, 2000, 3000, 4000, 5000, and 6000. Since each data point differs from the next by the same amount, 1000, the mean lies at the center of the set, halfway between 3000 and 4000. In fact, the mean is 3500. On either side of the mean, the highest and lowest points are 2500 units away, and the nearest points are 500 units away. Because every data point is at least 500 and at most 2500 units from the mean, the standard deviation must lie somewhere between these two values, 500 and 2500. Since the observations in this data set are so evenly spread, we expect it to be somewhere in the middle of that range, but we don’t need to be any more specific than this.
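The bounds for data set (C) can be checked directly. This sketch lists each point's distance from the mean and confirms that the population standard deviation falls between the smallest and largest of them.

```python
import statistics

# Data set (C): evenly spaced, 1000 apart
data_c = [1000, 2000, 3000, 4000, 5000, 6000]

mean_c = statistics.mean(data_c)  # 3500, the midpoint
distances = sorted(abs(x - mean_c) for x in data_c)
print(distances)  # [500, 500, 1500, 1500, 2500, 2500]

sd_c = statistics.pstdev(data_c)
# Every point is at least 500 and at most 2500 from the mean,
# so the standard deviation must lie in that interval
print(min(distances) <= sd_c <= max(distances))  # True
```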

So now let’s consider option (B). In this data set, the values range from three to 63, although four out of six of them, that’s two-thirds of the data set, are above 50, and three of the values are equal to 63. So most of our data lies above 50. Plotted on a number line, the data has a fairly wide spread. And with the majority of the data above 50, we expect the mean to be at the higher end of the data, perhaps somewhere between 40 and 50. So let’s estimate a mean of around 45.
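The rough estimate for the mean of data set (B) can be compared with the exact value. This sketch counts the observations above 50 and computes the mean. The estimate of 45 is hard-coded from the reasoning above, purely for comparison.

```python
import statistics

# Data set (B): most of the weight sits above 50
data_b = [3, 31, 53, 63, 63, 63]

above_50 = sum(1 for x in data_b if x > 50)
mean_b = statistics.mean(data_b)
estimated_mean = 45  # the rough estimate made above

print(above_50)                      # 4
print(mean_b)                        # 46
print(abs(mean_b - estimated_mean))  # 1
```

The low values three and 31 pull the exact mean down to 46, just inside the estimated range of 40 to 50.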

Now, the standard deviation can’t be larger than the greatest distance between the mean and any data point. With our estimated mean of 45, the greatest distance is to the lowest value, three, which is 42 units away. We can therefore be confident that the standard deviation for data set (B) is less than roughly 42 units.
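Measured from the exact mean of 46 rather than the estimate of 45, the greatest distance in data set (B) is 43 units, slightly more than 42. This sketch shows that the population standard deviation is far below either figure.

```python
import statistics

data_b = [3, 31, 53, 63, 63, 63]
mean_b = statistics.mean(data_b)  # 46

# Greatest distance between the mean and any data point (here, the value 3)
max_distance = max(abs(x - mean_b) for x in data_b)
sd_b = statistics.pstdev(data_b)

print(max_distance)         # 43
print(round(sd_b, 1))       # 22.3
print(sd_b < max_distance)  # True
```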

So now let’s consider our final data set, option (A), where the data ranges from 100 to 500. With two observations taking the value 500, we can expect the mean to be weighted slightly above the center of the data, perhaps between 300 and 400, so let’s say around 330. The greatest distance from the mean, down to the lowest value of 100, is then no more than about 230, so we can expect the standard deviation to be less than 230. Of course, these are estimates, but we can now compare our five data sets. We expect the standard deviation for option (A) to be less than about 230, for option (B) less than about 42, for option (C) between 500 and 2500, for option (D) almost zero, and for option (E) exactly zero.
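For data set (A), this sketch computes the exact mean, the greatest distance from it, and the population standard deviation. The results can be set against the estimates of about 330 and about 230.

```python
import statistics

# Data set (A): two observations at the top value pull the mean upward
data_a = [100, 200, 300, 400, 500, 500]

mean_a = statistics.mean(data_a)                     # 2000/6, about 333.3
max_distance = max(abs(x - mean_a) for x in data_a)  # about 233.3, from the value 100
sd_a = statistics.pstdev(data_a)                     # about 149.1

print(round(mean_a, 1), round(max_distance, 1), round(sd_a, 1))  # 333.3 233.3 149.1
```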

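Hence, since even the lower bound of 500 for option (C) exceeds our upper estimates for every other option, data set (C) has the highest dispersion, and so the data set with the highest standard deviation is data set (C).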
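Although the question asks us not to compute the exact values, a check after the fact confirms the reasoning. This sketch computes both the population (`statistics.pstdev`) and sample (`statistics.stdev`) standard deviations for all five data sets. Since every data set has six observations, the two conventions differ by the same factor, the square root of 6/5, for all of them. The ranking is therefore identical either way.

```python
import statistics

data_sets = {
    "A": [100, 200, 300, 400, 500, 500],
    "B": [3, 31, 53, 63, 63, 63],
    "C": [1000, 2000, 3000, 4000, 5000, 6000],
    "D": [10, 10, 10, 10, 10, 11],
    "E": [100, 100, 100, 100, 100, 100],
}

population = {k: statistics.pstdev(v) for k, v in data_sets.items()}
sample = {k: statistics.stdev(v) for k, v in data_sets.items()}

for label in data_sets:
    print(label, round(population[label], 2), round(sample[label], 2))

# The data set with the largest standard deviation, under either convention
print(max(population, key=population.get))  # C
print(max(sample, key=sample.get))          # C
```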
