In this explainer, we will learn how to find and interpret the standard deviation from a given data set.
To understand the meaning of the standard deviation of a data set, we first recall the definition of the mean of a data set.
Definition: The Mean of a Data Set
The mean, average, or expected value of a data set is used as a measure of central tendency. For a data set $X = \{x_1, x_2, \dots, x_n\}$, where there are $n$ values, the mean, denoted by $\mu$ (pronounced “mu”) or $\bar{x}$, is calculated by taking the sum of the values in the data set and dividing it by the number of values $n$, as indicated in the formula below:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
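The definition of the mean can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the sample values are assumptions, not part of the explainer.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the data set must contain at least one value")
    return sum(values) / len(values)


# Illustrative values only: (2 + 4 + 9) / 3 = 5.0
sample_mean = mean([2, 4, 9])
```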
The standard deviation of a data set tells us the dispersion of data from the mean. The larger the standard deviation, the more dispersed the data is from the mean, and the smaller the standard deviation, the less dispersed the data is from the mean.
The square of the standard deviation is called the variance and is another measure of dispersion. A further measure of dispersion is the interquartile range, which is the difference between the upper quartile and the lower quartile, or the value of the 75th percentile minus the value of the 25th percentile. In this explainer, we will only be focusing on the standard deviation as a measure of dispersion.
The standard deviation is more formally defined in the definition below.
Definition: The Standard Deviation of a Data Set
The standard deviation of a data set is used to measure the dispersion of data from the mean. For a data set $X = \{x_1, x_2, \dots, x_n\}$, where there are $n$ values, the standard deviation, denoted by $\sigma$ (pronounced “sigma”), is calculated by summing the squared differences between each value and the mean, dividing by the number of values, and taking the square root, as indicated in the formula below:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
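A minimal Python sketch of this definition follows. The name `population_std` and the sample values are illustrative assumptions. The function divides by the number of values $n$, as in the definition above.

```python
import math


def population_std(values):
    """Standard deviation using n (the number of values) as the divisor."""
    n = len(values)
    if n == 0:
        raise ValueError("the data set must contain at least one value")
    mu = sum(values) / n                                # mean of the data set
    sum_of_squares = sum((x - mu) ** 2 for x in values) # squared differences
    return math.sqrt(sum_of_squares / n)


# Illustrative values: the mean is 5 and the squared differences sum to 32,
# so the standard deviation is sqrt(32 / 8) = 2.0.
sigma = population_std([2, 4, 4, 4, 5, 5, 7, 9])
```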
Another way of describing the standard deviation is as a typical distance between the mean and the individual data points in the set (more precisely, the square root of the average squared distance). So, if the standard deviation is larger, then the data points lie, on the whole, further from the mean, meaning they are more dispersed. Similarly, if the standard deviation is smaller, then the data points lie closer to the mean, meaning they are less dispersed.
We will use the definition of the standard deviation of a data set to answer the first example.
Example 1: Understanding Standard Deviation
What is the name of a quantity expressing by how much the members of a group differ from the mean value for the group?
Answer
We know that the standard deviation of a data set determines how dispersed the data set is from the mean. This can also be described as how much the members of a data set differ from the mean of the data set.
Therefore, the quantity expressing by how much the members of a group differ from the mean value for the group is the standard deviation. A low standard deviation tells us that the data points are, on average, closer to the mean, and a high standard deviation tells us that the data points are, on average, further from the mean.
Having discussed the definition of the standard deviation, we will next consider the case where the measure of dispersion is zero, as seen in the next example.
Example 2: Identifying a Set of Values with Zero Dispersion
If the dispersion of a set of values is equal to zero, then which of the following is true?
- The difference between the individual values is great.
- The difference between the individual values is small.
- All the values are equal.
- The arithmetic mean of these values is zero.
- All the values are negative.
Answer
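The dispersion of a data set can be measured using the standard deviation, denoted $\sigma$. For a data set $X = \{x_1, x_2, \dots, x_n\}$, with $n$ values and a mean $\mu$, this is calculated by using the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
If the dispersion of a data set is equal to zero, then the standard deviation is equal to zero. By setting the formula for the standard deviation equal to zero, we get
$$\sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} = 0$$
By squaring both sides, we get
$$\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = 0$$
Then, by multiplying both sides by $n$, we have
$$\sum_{i=1}^{n} (x_i - \mu)^2 = 0$$
Now, if we square any real number greater than zero, then we get a value greater than zero. Also, if we square any real number less than zero, then we still get a value greater than zero. So no squared bracket can be negative, and the brackets must each equal zero for the sum to be zero:
$$(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2 = 0$$
So, setting each bracket equal to zero gives us
$$x_i - \mu = 0 \quad \text{for } i = 1, 2, \dots, n$$
Solving for $x_i$, we get
$$x_i = \mu$$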
Therefore, every member of the data set is equal to the mean, so all the values are equal, which is option C.
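The conclusion can be checked numerically. The sketch below uses `statistics.pstdev` from the Python standard library, which divides by the number of values as the formula in this explainer does. The three small data sets are hypothetical and chosen only to contrast the answer options.

```python
import statistics

all_equal = [7, 7, 7, 7]      # hypothetical: every value is the same
one_differs = [7, 7, 7, 8]    # hypothetical: a single value differs
mean_zero = [-1, 1]           # hypothetical: the mean is zero, values differ

std_all_equal = statistics.pstdev(all_equal)      # zero dispersion
std_one_differs = statistics.pstdev(one_differs)  # strictly positive
std_mean_zero = statistics.pstdev(mean_zero)      # positive despite mean 0
```

The last case illustrates why a mean of zero does not imply zero dispersion.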
In the next example, we will use the formula for the standard deviation of a data set to determine the standard deviation when given the sum of the squares of the differences and the number of data points.
Example 3: Calculating Standard Deviation
If $\sum (x - \bar{x})^2$ for a set of 6 values equals 25, find the standard deviation of the set, and round the result to the nearest thousandth.
Answer
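To calculate the standard deviation of a set of data, we first recall the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
where $\sigma$ denotes the standard deviation of the set of data $X = \{x_1, x_2, \dots, x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.
We are told $\sum (x - \bar{x})^2 = 25$, which is the same as saying $\sum_{i=1}^{n} (x_i - \mu)^2 = 25$. We are also told there are 6 values, which indicates that $n = 6$.
By substituting $\sum_{i=1}^{n} (x_i - \mu)^2 = 25$ and $n = 6$ and solving for $\sigma$, we get
$$\sigma = \sqrt{\frac{25}{6}} = 2.04124\ldots$$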
Our answer is therefore 2.041 when rounded to the nearest thousandth.
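The arithmetic in this example can be reproduced with a short Python check. The variable names are illustrative. The two inputs are the values given in the question.

```python
import math

sum_of_squares = 25  # given: sum of squared differences from the mean
n = 6                # given: number of values in the set

sigma = math.sqrt(sum_of_squares / n)
rounded = round(sigma, 3)  # nearest thousandth
```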
Next, we will discuss how to find the standard deviation of a data set. We will explore this in detail below.
When calculating the standard deviation of a set of data, we need to work through a number of steps with the formula. First, let’s recall the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
where $\sigma$ denotes the standard deviation of the set of data $X = \{x_1, x_2, \dots, x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.
To help demonstrate how to use the formula, we will use the following data set:
$$X = \{1, 1, 3, 5, 7\}$$
We will next execute the following steps using this data set to illustrate how the steps work.
Step 1: Finding the mean
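As we need to calculate the difference between the mean and the members of the set within the brackets of the formula, we need to start by calculating the mean. This is
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $\mu$ denotes the mean, $X = \{x_1, x_2, \dots, x_n\}$ is the data set, and $n$ is the number of points in the data set.
For the data set $\{1, 1, 3, 5, 7\}$, this gives us
$$\mu = \frac{1 + 1 + 3 + 5 + 7}{5} = \frac{17}{5} = 3.4$$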
Step 2: Finding the difference between the mean and each of the data points
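In order to calculate $\sum_{i=1}^{n} (x_i - \mu)^2$ in the formula, we need to calculate $x_i - \mu$ for all values of $i$, or, in other words, the difference between the mean and each of the data points. For this step and subsequent steps, it is helpful to lay this out in a table.
$x_i$ | $x_i - \mu$
---|---
1 | $1 - 3.4 = -2.4$
1 | $1 - 3.4 = -2.4$
3 | $3 - 3.4 = -0.4$
5 | $5 - 3.4 = 1.6$
7 | $7 - 3.4 = 3.6$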
Step 3: Finding the sum of the squares of the difference between the mean and each of the data points
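Following on from step 2, in order to calculate $\sum_{i=1}^{n} (x_i - \mu)^2$ in the formula, we next need to calculate $(x_i - \mu)^2$ for all values of $i$ and sum these. In other words, we need to square the difference between the mean and each of the data points and sum these. We will use the table from step 2 and add a further column.
$x_i$ | $x_i - \mu$ | $(x_i - \mu)^2$
---|---|---
1 | $-2.4$ | 5.76
1 | $-2.4$ | 5.76
3 | $-0.4$ | 0.16
5 | 1.6 | 2.56
7 | 3.6 | 12.96
Summing the last column, we get
$$\sum_{i=1}^{n} (x_i - \mu)^2 = 5.76 + 5.76 + 0.16 + 2.56 + 12.96 = 27.2$$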
Step 4: Substituting into the formula and finding the standard deviation
For the final step, we substitute the sum of squares and $n$ in the formula and then calculate the value of the standard deviation.
From step 3, we found $\sum_{i=1}^{n} (x_i - \mu)^2 = 27.2$, and we know $n = 5$. Therefore, by substituting into the formula for $\sigma$ and solving, we get
$$\sigma = \sqrt{\frac{27.2}{5}} = \sqrt{5.44} \approx 2.332$$
which is the standard deviation for the data set $\{1, 1, 3, 5, 7\}$.
We can summarize these steps as follows.
How To: Finding the Standard Deviation of a Data Set
Step 1: Finding the mean of the data set
Step 2: Finding the difference between the mean and the value of each of the data points
Step 3: Finding the sum of the squares of the difference between the mean and the value of each of the data points
Step 4: Substituting the sum of the squares and $n$ into the formula and taking the square root in order to calculate the standard deviation (The result is never negative.)
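The four steps above can be sketched as a single Python function that returns every intermediate result. The function name `std_steps` is an illustrative assumption. The data set 1, 1, 3, 5, 7 is taken from the first column of the step tables earlier in this section.

```python
import math


def std_steps(values):
    """Follow the four steps and return each intermediate result."""
    n = len(values)
    mu = sum(values) / n                              # Step 1: the mean
    differences = [x - mu for x in values]            # Step 2: x_i - mu
    sum_of_squares = sum(d ** 2 for d in differences) # Step 3: sum of squares
    sigma = math.sqrt(sum_of_squares / n)             # Step 4: divide and root
    return mu, differences, sum_of_squares, sigma


mu, differences, sum_of_squares, sigma = std_steps([1, 1, 3, 5, 7])
```

Returning the intermediate values mirrors the columns of the table, which makes it easier to compare a hand calculation with the computed one.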
In the next example, we will use this process to calculate the standard deviation of a data set.
Example 4: Calculating the Standard Deviation of a Data Set
Calculate the standard deviation of the values 45, 35, 42, 49, 39, and 34. Give your answer to 3 decimal places.
Answer
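To find the standard deviation of a set of data, we use the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
where $\sigma$ denotes the standard deviation of the set of data $X = \{x_1, x_2, \dots, x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.
First, we will calculate the mean, $\mu$, of the data set. Recall the formula for the mean, which is
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
In this case, the data set is $\{45, 35, 42, 49, 39, 34\}$ and the number of members of the data set is 6. So, by substituting these values for $x_i$ and 6 for $n$, we get
$$\mu = \frac{45 + 35 + 42 + 49 + 39 + 34}{6} = \frac{244}{6} \approx 40.667$$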
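Next, we will calculate $x_i - \mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows, with values rounded to 3 decimal places:
$x_i$ | $x_i - \mu$
---|---
45 | 4.333
35 | $-5.667$
42 | 1.333
49 | 8.333
39 | $-1.667$
34 | $-6.667$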
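Following this, we can now calculate $\sum_{i=1}^{n} (x_i - \mu)^2$. To do this, we will square $x_i - \mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.
$x_i$ | $x_i - \mu$ | $(x_i - \mu)^2$
---|---|---
45 | 4.333 | 18.778
35 | $-5.667$ | 32.111
42 | 1.333 | 1.778
49 | 8.333 | 69.444
39 | $-1.667$ | 2.778
34 | $-6.667$ | 44.444
When we sum $(x_i - \mu)^2$ for each member of the data set, we get
$$\sum_{i=1}^{n} (x_i - \mu)^2 \approx 169.333$$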
We can now substitute $\sum_{i=1}^{n} (x_i - \mu)^2 \approx 169.333$ and $n = 6$ back into the original formula for the standard deviation and solve for $\sigma$:
$$\sigma = \sqrt{\frac{169.333}{6}} \approx \sqrt{28.222} \approx 5.312$$
So, the standard deviation for the data set $\{45, 35, 42, 49, 39, 34\}$ is 5.312, correct to three decimal places.
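As an independent check of this result, the Python standard library provides `statistics.pstdev`, which computes the standard deviation with the number of values as the divisor, matching the formula used here. The variable names are illustrative.

```python
import statistics

values = [45, 35, 42, 49, 39, 34]  # the data set from this example

sigma = statistics.pstdev(values)  # divides by n, not n - 1
rounded = round(sigma, 3)          # three decimal places
```

Note that `statistics.stdev` (without the leading `p`) divides by $n - 1$ instead and would give a different value.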
In the next example, we will discuss which data set among three data sets has the largest dispersion by using the standard deviation.
Example 5: Selecting a Data Set with the Highest Standard Deviation
By calculating the standard deviation, determine which of the sets , , and has the largest dispersion.
Answer
To find the standard deviation of each of the data sets, we use the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
where $\sigma$ denotes the standard deviation of the set of data $X = \{x_1, x_2, \dots, x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.
We can see that each data set has four members, so $n$ is 4 in each case.
We will find the standard deviation of each data set first, then compare these in order to determine which has the largest dispersion.
For the first set, we will first find the mean, $\mu$, of the data set. Recall the formula for the mean, which is
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Therefore, by substituting for and 4 for , we get
Next, we will calculate $x_i - \mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows:
20 | |
---|---|
6 | |
Following this, we can now calculate $\sum_{i=1}^{n} (x_i - \mu)^2$. To do this, we will square $x_i - \mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.
20 | ||
---|---|---|
6 | ||
When we sum $(x_i - \mu)^2$ for each member of the data set, we get
We can now substitute $\sum_{i=1}^{n} (x_i - \mu)^2$ and $n$ back into the original formula for the standard deviation and solve for $\sigma$:
We will now repeat these steps for the other two data sets.
For , the mean is
To calculate $\sigma$, we will find $x_i - \mu$ and $(x_i - \mu)^2$ for each member of the data set. We will lay this out in a table as before.
5 | ||
---|---|---|
9 |
Summing $(x_i - \mu)^2$ for each member of the data set, we get
Substituting $\sum_{i=1}^{n} (x_i - \mu)^2$ and $n$ back into the original formula for the standard deviation and solving for $\sigma$, we get
For the last data set, , the mean is
To calculate $\sigma$, we will find $x_i - \mu$ and $(x_i - \mu)^2$ for each member of the data set. We will lay this out in a table as before.
20 | ||
---|---|---|
Summing $(x_i - \mu)^2$ for each member of the data set, we get
Substituting $\sum_{i=1}^{n} (x_i - \mu)^2$ and $n$ back into the original formula for the standard deviation and solving for $\sigma$, we get
We have found the standard deviation for each of the data sets. Let’s summarize this below:
- For , correct to 2 decimal places.
- For , correct to 2 decimal places.
- For , correct to 2 decimal places.
By comparing these data sets, we can see the first one, , has the largest standard deviation.
Therefore has the largest dispersion, since the standard deviation is a measure of dispersion.
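The comparison carried out in this example can be sketched in Python. The three four-member sets below are hypothetical and chosen only for illustration. They are not the sets from the question. `statistics.pstdev` divides by the number of values, as the formula above does.

```python
import statistics

# Hypothetical data sets, each with four members (not those in the example).
data_sets = {
    "A": [1, 5, 9, 13],
    "B": [6, 7, 7, 8],
    "C": [2, 4, 6, 8],
}

# Standard deviation of each set, keyed by the set's label.
std_by_set = {name: statistics.pstdev(values)
              for name, values in data_sets.items()}

# The set with the largest standard deviation has the largest dispersion.
most_dispersed = max(std_by_set, key=std_by_set.get)
```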
So far, we have found the standard deviation of a set of data where the data has been presented in a list. Next, we will learn how to find the standard deviation from a data set that is presented in a frequency table.
To find the standard deviation of a data set where the data is presented in a frequency table, we need to consider the frequency of the values in the data set as well as the values in the data set itself. One way of doing this could be to list the values. For example, consider the following data set:
Value ($x_i$) | Frequency ($f_i$)
---|---
3 | 1
4 | 7
5 | 3
We could write this as one 3, seven 4s, and three 5s, or $\{3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5\}$, in order to calculate the standard deviation as previously discussed. The problem with this approach arises when there are high frequencies of data points (say 100 or even 1 000), as we would have to write this out in a very long list. As such, it is more efficient to calculate the square of the difference for each distinct value and then multiply it by the corresponding frequency (much in the same way we would calculate the mean of a set of data in a frequency table).
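The two approaches described here give the same result, which the following Python sketch illustrates for the small table above (values 3, 4, 5 with frequencies 1, 7, 3). The variable names are illustrative assumptions.

```python
import math

values = [3, 4, 5]
freqs = [1, 7, 3]

# Approach 1: expand the table into a plain list, then use the list formula.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
n = len(expanded)
mu = sum(expanded) / n
sigma_expanded = math.sqrt(sum((x - mu) ** 2 for x in expanded) / n)

# Approach 2: keep the table and weight each squared difference by frequency.
total = sum(freqs)
mu_weighted = sum(x * f for x, f in zip(values, freqs)) / total
sigma_weighted = math.sqrt(
    sum(f * (x - mu_weighted) ** 2 for x, f in zip(values, freqs)) / total
)
```

The second approach does the same arithmetic without ever building the long list, which matters when frequencies are large.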
Before considering the formula and method for finding the standard deviation of a set of data in a frequency table, we will first recall how to calculate the mean of a set of data from a frequency table.
Definition: The Mean of a Data Set in a Frequency Table
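For a data set $X = \{x_1, x_2, \dots, x_m\}$, with corresponding frequencies $f_1, f_2, \dots, f_m$ and $m$ distinct values of the data set, the mean $\mu$ is calculated as follows:
$$\mu = \frac{\sum_{i=1}^{m} x_i f_i}{\sum_{i=1}^{m} f_i}$$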
Another way to represent this is in a table with the values of the data set in the first column, their corresponding frequencies in the second column, the multiplication of the data point and frequency in the third column, and the sums in the last row of the table. The mean can then be calculated by dividing the sum of the third column by the sum of the second column.
Having recapped the mean of a data set in a frequency table, we will next discuss the standard deviation. The formula for this is as follows.
Definition: The Standard Deviation of a Data Set in a Frequency Table
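For a data set $X = \{x_1, x_2, \dots, x_m\}$, with corresponding frequencies $f_1, f_2, \dots, f_m$, $m$ distinct values of the data set, and mean $\mu$, the standard deviation $\sigma$ is calculated as follows:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} f_i (x_i - \mu)^2}{\sum_{i=1}^{m} f_i}}$$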
The approach for finding the standard deviation of a data set is generally the same as the approach for finding the standard deviation of a data set in a frequency table; however, there are some important differences. As we are working with frequencies, we need to multiply each value in the data by its corresponding frequency when calculating the mean. Also, when calculating the sum of the squares of the difference between the mean and each different value of the data, we also need to multiply by the frequency.
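As a worked illustration of both frequency-table formulas, the small table from earlier in this section (values 3, 4, 5 with frequencies 1, 7, 3) can be carried through by hand. The fractions below were computed for this illustration and do not appear in the original text.

```latex
\begin{aligned}
\mu &= \frac{3 \cdot 1 + 4 \cdot 7 + 5 \cdot 3}{1 + 7 + 3} = \frac{46}{11} \approx 4.182, \\
\sum f_i (x_i - \mu)^2 &= 1\left(-\tfrac{13}{11}\right)^2 + 7\left(-\tfrac{2}{11}\right)^2 + 3\left(\tfrac{9}{11}\right)^2
  = \frac{169 + 28 + 243}{121} = \frac{40}{11}, \\
\sigma &= \sqrt{\frac{40/11}{11}} = \frac{\sqrt{40}}{11} \approx 0.575.
\end{aligned}
```

Each squared difference is multiplied by its frequency before summing, and the divisor is the total frequency, 11, rather than the number of distinct values, 3.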
In the next example, we will discuss how to find the standard deviation of a data set that is in a frequency table.
Example 6: Determining the Standard Deviation of a Data Set
The table shows the distribution of goals scored in the first half of a football season.
Number of Goals | 0 | 1 | 3 | 4 | 6 |
---|---|---|---|---|---|
Number of Games | 5 | 2 | 7 | 7 | 4 |
Find the standard deviation of the number of goals scored. Give your answer to three decimal places.
Answer
As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma$, we use the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} f_i (x_i - \mu)^2}{\sum_{i=1}^{m} f_i}}$$
where $x_i$ represents the values of the data set with corresponding frequencies $f_i$, there are $m$ distinct values of the data set, and the mean is represented by $\mu$.
In this question, the values of the data set are the numbers of goals scored in the first half of a football season. The number of games is the frequency with which each number of goals occurred. Let’s rewrite this using $x_i$ and $f_i$ as the headings and by transposing the table, as follows:
$x_i$ | $f_i$
---|---
0 | 5
1 | 2
3 | 7
4 | 7
6 | 4
To calculate the standard deviation, we must first calculate the mean $\mu$. For a set of data $x_i$ with corresponding frequencies $f_i$ and $m$ distinct values of the data set, we use the following formula:
$$\mu = \frac{\sum_{i=1}^{m} x_i f_i}{\sum_{i=1}^{m} f_i}$$
Using the table above, we can add a new column in order to find $x_i f_i$ for each value of $x_i$ and then use this to find the mean.
$x_i$ | $f_i$ | $x_i f_i$
---|---|---
0 | 5 | 0
1 | 2 | 2
3 | 7 | 21
4 | 7 | 28
6 | 4 | 24
By summing the values for $x_i f_i$ and dividing by the sum of the frequencies, we get
$$\mu = \frac{0 + 2 + 21 + 28 + 24}{5 + 2 + 7 + 7 + 4} = \frac{75}{25} = 3$$
Next, we will calculate the difference between each value of the data set and the mean, and the square of this difference, in order to calculate the sum of the squares. We will do this by adding two further columns to the table above.
$x_i$ | $f_i$ | $x_i f_i$ | $x_i - \mu$ | $(x_i - \mu)^2$
---|---|---|---|---
0 | 5 | 0 | $-3$ | 9
1 | 2 | 2 | $-2$ | 4
3 | 7 | 21 | 0 | 0
4 | 7 | 28 | 1 | 1
6 | 4 | 24 | 3 | 9
We now need to multiply each squared difference between a value and the mean by the frequency of that value. We will add another column to the table to do this.
$x_i$ | $f_i$ | $x_i f_i$ | $x_i - \mu$ | $(x_i - \mu)^2$ | $f_i (x_i - \mu)^2$
---|---|---|---|---|---
0 | 5 | 0 | $-3$ | 9 | 45
1 | 2 | 2 | $-2$ | 4 | 8
3 | 7 | 21 | 0 | 0 | 0
4 | 7 | 28 | 1 | 1 | 7
6 | 4 | 24 | 3 | 9 | 36
We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma$:
$$\sigma = \sqrt{\frac{45 + 8 + 0 + 7 + 36}{25}} = \sqrt{\frac{96}{25}} = 1.95959\ldots$$
Therefore, the standard deviation of the number of goals scored is 1.960 to three decimal places.
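The table-based calculation in this example can be reproduced with a short Python sketch that builds one tuple per table row. The variable names are illustrative. The goals and games come from the table in the question.

```python
import math

goals = [0, 1, 3, 4, 6]  # values x_i
games = [5, 2, 7, 7, 4]  # frequencies f_i

total_games = sum(games)
mu = sum(x * f for x, f in zip(goals, games)) / total_games

# One row per value: x, f, x*f, x - mu, (x - mu)^2, f*(x - mu)^2
rows = []
for x, f in zip(goals, games):
    diff = x - mu
    rows.append((x, f, x * f, diff, diff ** 2, f * diff ** 2))

weighted_sum_of_squares = sum(row[5] for row in rows)
sigma = math.sqrt(weighted_sum_of_squares / total_games)
```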
Next, we will discuss how to calculate the standard deviation of grouped data using the midpoint. This approach involves the same steps as with frequency tables, but we are dealing with intervals rather than single values, so we use the midpoint of each interval to approximate the values in it. We will explore this further in our final example.
Example 7: Find the Standard Deviation of a Grouped Data Set
A quiz was completed by 92 students and their scores were recorded in the following frequency table. Find the standard deviation to two decimal places.
Score | |||||
---|---|---|---|---|---|
Frequency | 26 | 10 | 24 | 5 | 27 |
Answer
As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma$, we use the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} f_i (x_i - \mu)^2}{\sum_{i=1}^{m} f_i}}$$
where $x_i$ represents the values of the data set with corresponding frequencies $f_i$, there are $m$ distinct values of the data set, and the mean is represented by $\mu$.
For this type of problem, we have been given different “classes” of values represented by intervals rather than exact values. This means we cannot directly apply the formula above, since we cannot substitute these intervals for the values of in our formula.
Instead, the approach we must take is to find the “midpoint” of each interval and use this to represent the corresponding value of $x_i$. After doing so, we can treat the problem as we would any other frequency table.
To find the midpoint, we add together the endpoints and divide by 2. This allows us to find an approximate standard deviation of the data set.
So, the values of the data set are the midpoints of the score intervals in the quiz, and the corresponding frequencies are the frequencies of those intervals. Let’s find the midpoint of each of the intervals and then rewrite the midpoints as $x_i$ and the frequencies as $f_i$, as follows:
| Interval | Midpoint ($x_i$) | Frequency ($f_i$) |
|---|---|---|
| | | 26 |
| | | 10 |
| | | 24 |
| | | 5 |
| | | 27 |
To calculate the standard deviation, we must first calculate the mean, $\mu$. For a set of data $x_i$ with corresponding frequencies $f_i$ and $m$ distinct values of the data set, we use the following formula:
$$\mu = \frac{\sum_{i=1}^{m} x_i f_i}{\sum_{i=1}^{m} f_i}$$
Again, we remember that the midpoint is now being used to represent the values of $x_i$. Using the table above, we can add a new column in order to find $x_i f_i$ for each value of $x_i$ and then use this to find the mean.
| Interval | Midpoint ($x_i$) | Frequency ($f_i$) | $x_i f_i$ |
|---|---|---|---|
| | | 26 | |
| | | 10 | |
| | | 24 | |
| | | 5 | |
| | | 27 | |
By summing the values for $x_i f_i$ and dividing by the sum of the frequencies, we get
Next, we will calculate the difference between the midpoint of each class and the mean, and the square of this difference, in order to calculate the sum of the squares. We will do this by adding two further columns to the table above. Note that all values have been rounded to 4 decimal places.
| Interval | Midpoint ($x_i$) | Frequency ($f_i$) | $x_i f_i$ | $x_i - \mu$ | $(x_i - \mu)^2$ |
|---|---|---|---|---|---|
| | | 26 | | $-39.3478$ | 1 548.2494 |
| | | 10 | | $-19.3478$ | 374.3375 |
| | | 24 | | 0.6522 | 0.4254 |
| | | 5 | | 20.6522 | 426.5134 |
| | | 27 | | 40.6522 | 1 652.6014 |
We now need to multiply each squared difference between a midpoint and the mean by the frequency of that class. We will add another column to the table to do this. Again, we will round to 4 decimal places.
| Interval | Midpoint ($x_i$) | Frequency ($f_i$) | $x_i f_i$ | $x_i - \mu$ | $(x_i - \mu)^2$ | $f_i (x_i - \mu)^2$ |
|---|---|---|---|---|---|---|
| | | 26 | | $-39.3478$ | 1 548.24936 | 40 254.4818 |
| | | 10 | | $-19.3478$ | 374.337365 | 3 743.3736 |
| | | 24 | | 0.6522 | 0.42536484 | 10.2088 |
| | | 5 | | 20.6522 | 426.513365 | 2 132.5668 |
| | | 27 | | 40.6522 | 1 652.60136 | 44 620.2351 |
We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma$:
$$\sigma = \sqrt{\frac{40\,254.4818 + 3\,743.3736 + 10.2088 + 2\,132.5668 + 44\,620.2351}{92}} = \sqrt{\frac{90\,760.8661}{92}} \approx 31.409$$
Therefore, the standard deviation is 31.41 to 2 decimal places.
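The midpoint method can be sketched as a small Python function. The function name `grouped_std` is an illustrative assumption. The class boundaries below are also an assumption: the intervals are not legible in the table above, so five classes of width 20 starting at 0 are used here as hypothetical stand-ins. The frequencies are those given in the question.

```python
import math


def grouped_std(intervals, freqs):
    """Approximate standard deviation of grouped data using midpoints."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    total = sum(freqs)
    mu = sum(m * f for m, f in zip(midpoints, freqs)) / total
    variance = sum(f * (m - mu) ** 2 for m, f in zip(midpoints, freqs)) / total
    return math.sqrt(variance)


# Hypothetical class boundaries (assumed, not taken from the question).
intervals = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [26, 10, 24, 5, 27]  # frequencies from the question

sigma = grouped_std(intervals, freqs)
rounded = round(sigma, 2)
```

Shifting every interval by the same amount moves the mean but leaves the standard deviation unchanged, so the result depends on the spacing of the midpoints and not on where the first class starts.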
In this explainer, we have learned what the standard deviation is and how to find it for a set of data, from both a list and a frequency table. We have also learned how to compare data sets and draw conclusions using the standard deviation.
Key Points
- The standard deviation of a data set is used to measure the dispersion of data from the mean.
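- For data presented in a list, the formula for the standard deviation of a set of data $X = \{x_1, x_2, \dots, x_n\}$ with $n$ members and mean $\mu$ is $$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
- For data presented in a frequency table, the formula for the standard deviation of a data set $X = \{x_1, x_2, \dots, x_m\}$, with corresponding frequencies $f_1, f_2, \dots, f_m$, $m$ distinct values of the data set, and mean $\mu$ is $$\sigma = \sqrt{\frac{\sum_{i=1}^{m} f_i (x_i - \mu)^2}{\sum_{i=1}^{m} f_i}}$$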
- For grouped frequency tables where data is given in intervals, the midpoint of the interval is used to represent the values of .