Video Transcript
By calculating the standard deviation, determine which of the sets negative 17, 20, six, negative 13; negative five, negative 16, five, nine; or negative one, negative six, 20, negative one has the largest dispersion.
The standard deviation of a data set is a way of measuring its dispersion or spread around its mean. The larger the standard deviation, the more dispersed the data is around the mean, and the smaller the standard deviation, the less dispersed the data is. We can recall that for a data set 𝑥, which contains the 𝑛 values 𝑥 sub one, 𝑥 sub two, 𝑥 sub three, and so on up to 𝑥 sub 𝑛, with mean 𝜇, the standard deviation, which we denote by 𝜎 sub 𝑥, is given by 𝜎 sub 𝑥 equals the square root of the sum from 𝑖 equals one to 𝑛 of 𝑥 sub 𝑖 minus 𝜇 all squared, all over 𝑛.
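Written out in symbols, the spoken formula can be read as follows. This is a typeset rendering of the same definition, in which the index 𝑖 is assumed to run over the 𝑛 values of the data set.

```latex
\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \mu\right)^2}{n}}
```

Note that the squaring applies to each difference before the sum is taken, and the division by 𝑛 happens underneath the square root.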
In practice, this means we find the mean 𝜇 of the data set, subtract it from each 𝑥-value, square each of these differences, and then find their sum. We then divide by 𝑛, the number of values in the data set, and take the square root. We can also recall that the mean is the sum of all of the 𝑥-values divided by how many values there are. So, it’s the sum from 𝑖 equals one to 𝑛 of 𝑥 sub 𝑖 over 𝑛. Let’s consider then each of the data sets in turn. And we’ll begin by calculating the mean for each.
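The steps just described can be sketched in a few lines of Python. This is only an illustration of the procedure, not part of the original working, and the function name `population_std` is our own choice.

```python
import math

def population_std(values):
    """Population standard deviation: divides by n, as in the formula above."""
    n = len(values)
    mean = sum(values) / n                              # the mean, mu
    squared_diffs = [(x - mean) ** 2 for x in values]   # (x_i - mu)^2 for each value
    return math.sqrt(sum(squared_diffs) / n)            # divide by n, then square root

print(round(population_std([-17, 20, 6, -13]), 2))  # 14.92
```

The function divides by 𝑛 rather than 𝑛 minus one, which matches the formula used here.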
For the first data set, the mean is negative 17 plus 20 plus six plus negative 13 over four. That’s negative four over four, which is equal to negative one. For the second data set, the mean is negative five plus negative 16 plus five plus nine over four. That’s negative seven over four or as a decimal negative 1.75. And for the final data set, the mean is negative one plus negative six plus 20 plus negative one over four. That’s 12 over four, which is equal to three.
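If we want to check these three means, one possible approach is to use exact fractions so that negative seven over four is not rounded. The labels `first`, `second`, and `third` below are just names we have chosen for the three data sets.

```python
from fractions import Fraction

data_sets = {
    "first": [-17, 20, 6, -13],
    "second": [-5, -16, 5, 9],
    "third": [-1, -6, 20, -1],
}

# Mean = sum of the values divided by how many values there are.
means = {name: Fraction(sum(values), len(values)) for name, values in data_sets.items()}

for name, mean in means.items():
    print(name, mean, float(mean))
```

The second mean is reported as -7/4, which is negative 1.75 as a decimal.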
So, we found the mean for each data set, and now we need to work through the process of calculating the standard deviation. We’ll find it helpful to organize our working in a table. So, we have one column in which we write the values in the data set, the next column in which we subtract the mean from each value, and then, in the final column, we will square these values. So, for the first data set where the mean is negative one, we have negative 17 minus negative one. That’s negative 17 plus one, which is negative 16. We then have 20 minus negative one, which is 21; six minus negative one, which is seven; and negative 13 minus negative one, which is negative 12.
In the final column of our table, we square these values, giving 256, 441, 49, and 144. The sum of the four values in the final column, that is, the sum of the squared differences between each 𝑥-value and the mean, is 890. This gives the numerator of the fraction underneath the radical. So for the first data set, the standard deviation 𝜎 sub 𝑥 is equal to the square root of 890 over four. We can evaluate this using a calculator, and it gives 14.916 continuing or 14.92 to two decimal places. So we’ve worked through the process and calculated the standard deviation for the first set of data, and now we need to do the same for the other two sets.
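The three-column table for the first data set can be reproduced with a short sketch like the one below. The helper name `deviation_table` is hypothetical and simply mirrors the columns described: the value, the value minus the mean, and that difference squared.

```python
def deviation_table(values):
    """Return one row per value: (x, x - mean, (x - mean) squared)."""
    mean = sum(values) / len(values)
    return [(x, x - mean, (x - mean) ** 2) for x in values]

rows = deviation_table([-17, 20, 6, -13])
for x, diff, squared in rows:
    print(f"{x:>5} {diff:>8.2f} {squared:>8.2f}")

total = sum(squared for _, _, squared in rows)
print("sum of squares:", total)  # 890.0
```

Passing the second or third data set to the same helper gives the corresponding table for those sets.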
For the second data set, the mean is negative 1.75, so this is the value we need to subtract from each value in the data set. Subtracting negative 1.75 is the same as adding 1.75, so we obtain negative 3.25, negative 14.25, 6.75, and 10.75. We then square each of these values in the final column of our table and find their sum, which is 374.75. Once again, there are four values in this data set, so 𝑛 is equal to four. And we have that 𝜎 sub 𝑥 is equal to the square root of 374.75 over four. As a decimal, that’s equal to 9.679 continuing or 9.68 to two decimal places.
Finally, we’ll calculate the standard deviation for the third data set, which has a mean of three. Subtracting three from each value in the data set gives negative four, negative nine, 17, and negative four. Squaring these values, we have 16, 81, 289, and 16, the sum of which is 402. The standard deviation for this final data set is the square root of 402 over four. That’s 10.024 continuing or 10.02 to two decimal places.
So, we’ve calculated the standard deviation for each of the three data sets, and now we need to determine which has the largest dispersion. Remember, we said that the larger the standard deviation, the more dispersed around the mean the data is. Comparing the standard deviations of 14.92, 9.68, and 10.02, we see that it is the first data set that has the largest standard deviation. So, of the three data sets, the one that has the largest dispersion is the data set negative 17, 20, six, negative 13.
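As a final check, the comparison can be sketched using Python's standard library, where `statistics.pstdev` computes the population standard deviation (dividing by 𝑛). The dictionary labels are again our own names for the three sets.

```python
import statistics

data_sets = {
    "first": [-17, 20, 6, -13],
    "second": [-5, -16, 5, 9],
    "third": [-1, -6, 20, -1],
}

# Population standard deviation of each set.
spreads = {name: statistics.pstdev(values) for name, values in data_sets.items()}

for name, spread in spreads.items():
    print(name, round(spread, 2))

# The set with the largest standard deviation has the largest dispersion.
most_dispersed = max(spreads, key=spreads.get)
print("largest dispersion:", most_dispersed)  # first
```

Using `statistics.stdev` instead would divide by 𝑛 minus one and give different values, so it would not match the working above.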