Lesson Explainer: Standard Deviation of a Data Set

In this explainer, we will learn how to find and interpret the standard deviation from a given data set.

To understand the meaning of the standard deviation of a data set, we first recall the definition of the mean of a data set.

Definition: The Mean of a Data Set

The mean, average, or expected value of a data set is used as a measure of central tendency. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, where there are $n$ values, the mean, denoted by $\mu$ (pronounced "mu") or $\bar{x}$, is calculated by taking the sum of the data set and dividing it by the number of values $n$, as indicated in the formula below: $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{\sum_{i=1}^{n}x_i}{n}.$
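As a minimal sketch of this definition, the mean can be computed in Python as below. The function name `mean` and the data set `[2, 4, 6, 8]` are illustrative assumptions, not part of the lesson.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data set (an assumption, chosen for easy arithmetic).
data = [2, 4, 6, 8]
mu = mean(data)  # (2 + 4 + 6 + 8) / 4
print(mu)
```

Python's standard library offers the same calculation as `statistics.mean`.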

The standard deviation of a data set tells us the dispersion of data from the mean. The larger the standard deviation, the more dispersed the data is from the mean, and the smaller the standard deviation, the less dispersed the data is from the mean.

The square of the standard deviation is called the variance and is another measure of dispersion. A further measure of dispersion is the interquartile range, which is the difference between the upper quartile and the lower quartile, or the value of the 75th percentile minus the value of the 25th percentile. In this explainer, we will only be focusing on the standard deviation as a measure of dispersion.

The standard deviation is more formally defined in the definition below.

Definition: The Standard Deviation of a Data Set

The standard deviation of a data set is used to measure the dispersion of data from the mean. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, where there are $n$ values, the standard deviation, denoted by $\sigma_X$ (pronounced "sigma x"), is calculated by squaring the difference between each value and the mean $\mu$, summing these squares, dividing by the number of values, and taking the square root, as indicated in the formula below: $\sigma_X=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}.$
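The formula can be sketched in Python as follows. The function name `population_std` and the data set are illustrative assumptions. Note that this version divides by $n$, as in the definition above, rather than by $n-1$.

```python
import math


def population_std(values):
    """Standard deviation as defined above: divide the sum of squares by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty data set is undefined")
    mu = sum(values) / n                             # mean of the data set
    sum_of_squares = sum((x - mu) ** 2 for x in values)
    return math.sqrt(sum_of_squares / n)


# Illustrative data set (an assumption): mean 5, squared differences 9, 1, 1, 9.
sigma = population_std([2, 4, 6, 8])  # sqrt(20 / 4) = sqrt(5)
print(sigma)
```

The standard library function `statistics.pstdev` uses the same divide-by-$n$ convention.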

Informally, the standard deviation can be thought of as a typical distance between the mean and the individual data points in the set (more precisely, it is the square root of the mean of the squared distances). So, if the standard deviation is larger, then the data points are, on the whole, further from the mean, meaning they are more dispersed. Similarly, if the standard deviation is smaller, then the data points are, on the whole, closer to the mean, meaning they are less dispersed.

We will use the definition of the standard deviation of a data set to answer the first example.

Example 1: Understanding Standard Deviation

What is the name of a quantity expressing by how much the members of a group differ from the mean value for the group?

Answer

We know that the standard deviation of a data set determines how dispersed the data set is from the mean. This can also be described as how much the members of a data set differ from the mean of the data set.

Therefore, the quantity expressing by how much the members of a group differ from the mean value for the group is the standard deviation. A low standard deviation tells us that the data points are, on average, closer to the mean, and a high standard deviation tells us that the data points are, on average, further from the mean.

Having discussed what the definition of the standard deviation is, we will next consider the case where the measure of dispersion is zero, as seen in the next example.

Example 2: Identifying a Set of Values with Zero Dispersion

If the dispersion of a set of values is equal to zero, then which of the following is true?

  1. The difference between the individual values is great.
  2. The difference between the individual values is small.
  3. All the values are equal.
  4. The arithmetic mean of these values is zero.
  5. All the values are negative.

Answer

The dispersion of a data set can be measured using the standard deviation, denoted $\sigma_X$. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with $n$ values and a mean $\mu$, this is calculated by using the following formula: $\sigma_X=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}.$

If the dispersion of a data set is equal to zero, then the standard deviation is equal to zero. By setting the formula for the standard deviation equal to zero, we get $\sigma_X=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=0.$

By squaring both sides, we get $\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}=0.$

Then, by multiplying both sides by $n$, we have $(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2=0\times n$, that is, $(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2=0.$

Now, if we square any real number greater than zero, then we get a value greater than zero. Also, if we square any real number less than zero, then we still get a value greater than zero. Every term in the sum is therefore non-negative, and a sum of non-negative terms can only be zero if each term is zero. So, the brackets must each equal zero for the result to be zero: $\underbrace{(x_1-\mu)^2}_{=0}+\underbrace{(x_2-\mu)^2}_{=0}+\underbrace{(x_3-\mu)^2}_{=0}+\cdots+\underbrace{(x_n-\mu)^2}_{=0}=0.$

So, setting each bracket equal to zero gives us $x_1-\mu=0,\ x_2-\mu=0,\ x_3-\mu=0,\ \ldots,\ x_n-\mu=0.$

When solving for $x_1,x_2,x_3,\ldots,x_n$, we get $x_1=\mu,\ x_2=\mu,\ x_3=\mu,\ \ldots,\ x_n=\mu.$

Therefore, all of the members of the data set $X$ are equal to the mean $\mu$ and hence equal to each other, which is option 3.
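This conclusion can be checked numerically with a small sketch. The two data sets below are illustrative assumptions; `statistics.pstdev` is the standard library's divide-by-$n$ standard deviation.

```python
import statistics

# Illustrative data sets (assumptions): one with all values equal, one without.
all_equal = [4, 4, 4, 4]
not_all_equal = [4, 4, 4, 5]

sigma_equal = statistics.pstdev(all_equal)        # every deviation is zero
sigma_unequal = statistics.pstdev(not_all_equal)  # one value differs from the mean

print(sigma_equal, sigma_unequal)
```

Only the data set whose values are all equal gives a standard deviation of zero.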

In the next example, we will use the formula for the standard deviation of a data set to determine the standard deviation when given the sum of the squared differences from the mean and the number of data points.

Example 3: Calculating Standard Deviation

If $\sum(x-\bar{x})^2$ for a set of 6 values equals 25, find the standard deviation of the set, and round the result to the nearest thousandth.

Answer

To calculate the standard deviation of a set of data, we first recall the formula $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}},$ where $\sigma_X$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\bar{x}$ is the mean of the data set.

We are told $\sum(x-\bar{x})^2=25$, which is the same as saying $\sum_{i=1}^{n}(x_i-\bar{x})^2=25$. We are also told there are 6 values, which indicates that $n=6$.

By substituting $\sum_{i=1}^{n}(x_i-\bar{x})^2=25$ and $n=6$ and solving for $\sigma_X$, we get $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}}=\sqrt{\frac{25}{6}}=2.041241\ldots\approx 2.041.$

Our answer is therefore 2.041 when rounded to the nearest thousandth.
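As a quick check of this arithmetic, here is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
import math

sum_of_squares = 25  # the given sum of squared differences from the mean
n = 6                # the number of values in the set

sigma = math.sqrt(sum_of_squares / n)
print(round(sigma, 3))  # 2.041
```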

Next, we will work through, step by step, how to find the standard deviation of a data set.

When calculating the standard deviation of a set of data, we need to carry out a number of steps when working with the formula. First, let's recall the formula $\sigma_X=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$ where $\sigma_X$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

To help demonstrate how to use the formula, we will use the following data set: $X=\{1,1,3,5,7\}.$

We will apply each of the following steps to this data set.

Step 1: Finding the mean

As we need to calculate the difference between the mean and the members of the set within the brackets of the formula, we need to start by calculating the mean. This is $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{\sum_{i=1}^{n}x_i}{n},$ where $\mu$ denotes the mean, $X=\{x_1,x_2,x_3,\ldots,x_n\}$ is the data set, and $n$ is the number of points in the data set.

For the data set $X=\{1,1,3,5,7\}$, this gives us $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{1+1+3+5+7}{5}=\frac{17}{5}=3.4.$

Step 2: Finding the difference between the mean and each of the data points

In order to calculate $\sum_{i=1}^{n}(x_i-\mu)^2$ in the formula, we need to calculate $x_i-\mu$ for all values of $i=1,\ldots,n$, or, in other words, the difference between each of the data points and the mean. For this step and subsequent steps, it is helpful to lay this out in a table.

| $x_i$ | $x_i-\mu$ |
|---|---|
| 1 | $1-3.4=-2.4$ |
| 1 | $1-3.4=-2.4$ |
| 3 | $3-3.4=-0.4$ |
| 5 | $5-3.4=1.6$ |
| 7 | $7-3.4=3.6$ |

Step 3: Finding the sum of the squares of the difference between the mean and each of the data points

Following on from step 2, in order to calculate $\sum_{i=1}^{n}(x_i-\mu)^2$ in the formula, we next need to calculate $(x_i-\mu)^2$ for all values of $i=1,\ldots,n$ and sum these. In other words, we need to square the difference between each of the data points and the mean and sum the results. We will use the table from step 2 and add a further column.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
|---|---|---|
| 1 | $1-3.4=-2.4$ | $(-2.4)^2=5.76$ |
| 1 | $1-3.4=-2.4$ | $(-2.4)^2=5.76$ |
| 3 | $3-3.4=-0.4$ | $(-0.4)^2=0.16$ |
| 5 | $5-3.4=1.6$ | $(1.6)^2=2.56$ |
| 7 | $7-3.4=3.6$ | $(3.6)^2=12.96$ |

Summing the last column, we get $\sum_{i=1}^{5}(x_i-\mu)^2=5.76+5.76+0.16+2.56+12.96=27.2.$

Step 4: Substituting into the formula and finding the standard deviation

For the final step, we substitute the sum of squares and 𝑛 in the formula and then calculate the value of the standard deviation.

From step 3, we found $\sum_{i=1}^{n}(x_i-\mu)^2=27.2$, and we know $n=5$. Therefore, by substituting into the formula for $\sigma_X$ and solving, we get $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{27.2}{5}}=\sqrt{5.44}=2.3323\ldots\approx 2.33,$ which is the standard deviation for the data set $X=\{1,1,3,5,7\}$.

We can summarize these steps as follows.

How To: Finding the Standard Deviation of a Data Set

Step 1: Finding the mean of the data set

Step 2: Finding the difference between the mean and the value of each of the data points

Step 3: Finding the sum of the squares of the difference between the mean and the value of each of the data points

Step 4: Substituting the sum of the squares and $n$ into the formula and taking the square root in order to calculate the standard deviation (This is never negative; it is zero only when all the values are equal.)
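The four steps can be sketched in Python for the data set $\{1,1,3,5,7\}$ used above. The variable names are assumptions chosen for illustration.

```python
import math

data = [1, 1, 3, 5, 7]
n = len(data)

# Step 1: find the mean of the data set.
mu = sum(data) / n

# Step 2: find the difference between each data point and the mean.
differences = [x - mu for x in data]

# Step 3: square each difference and sum the squares.
sum_of_squares = sum(d ** 2 for d in differences)

# Step 4: divide by n and take the square root.
sigma = math.sqrt(sum_of_squares / n)

print(round(mu, 2), round(sum_of_squares, 2), round(sigma, 2))
```

The rounded results agree with the worked values: a mean of 3.4, a sum of squares of 27.2, and a standard deviation of 2.33.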

In the next example, we will use this process to calculate the standard deviation of a data set.

Example 4: Calculating the Standard Deviation of a Data Set

Calculate the standard deviation of the values 45, 35, 42, 49, 39, and 34. Give your answer to 3 decimal places.

Answer

To find the standard deviation of a set of data, we use the formula $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$ where $\sigma_X$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

First, we will calculate the mean, $\mu$, of the data set. Recall the formula for the mean, which is $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}.$

In this case, the data set $X$ is $\{45,35,42,49,39,34\}$ and the number of members of the data set is 6. So, by substituting $\{45,35,42,49,39,34\}$ for $\{x_1,x_2,x_3,\ldots,x_n\}$ and 6 for $n$, we get $\mu=\frac{45+35+42+49+39+34}{6}=\frac{244}{6}=40.\dot{6}.$

Next, we will calculate $x_i-\mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows:

| $x_i$ | $x_i-\mu$ |
|---|---|
| 45 | $45-40.\dot{6}=4.\dot{3}$ |
| 35 | $35-40.\dot{6}=-5.\dot{6}$ |
| 42 | $42-40.\dot{6}=1.\dot{3}$ |
| 49 | $49-40.\dot{6}=8.\dot{3}$ |
| 39 | $39-40.\dot{6}=-1.\dot{6}$ |
| 34 | $34-40.\dot{6}=-6.\dot{6}$ |

Following this, we can now calculate $\sum_{i=1}^{n}(x_i-\mu)^2$. To do this, we will square $x_i-\mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
|---|---|---|
| 45 | $45-40.\dot{6}=4.\dot{3}$ | $(4.\dot{3})^2=18.\dot{7}$ |
| 35 | $35-40.\dot{6}=-5.\dot{6}$ | $(-5.\dot{6})^2=32.\dot{1}$ |
| 42 | $42-40.\dot{6}=1.\dot{3}$ | $(1.\dot{3})^2=1.\dot{7}$ |
| 49 | $49-40.\dot{6}=8.\dot{3}$ | $(8.\dot{3})^2=69.\dot{4}$ |
| 39 | $39-40.\dot{6}=-1.\dot{6}$ | $(-1.\dot{6})^2=2.\dot{7}$ |
| 34 | $34-40.\dot{6}=-6.\dot{6}$ | $(-6.\dot{6})^2=44.\dot{4}$ |

When we sum $(x_i-\mu)^2$ for each member of the data set, we get $\sum_{i=1}^{6}(x_i-\mu)^2=18.\dot{7}+32.\dot{1}+1.\dot{7}+69.\dot{4}+2.\dot{7}+44.\dot{4}=169.\dot{3}.$

We can now substitute $\sum_{i=1}^{6}(x_i-\mu)^2=169.\dot{3}$ and $n=6$ back into the original formula for the standard deviation and solve for $\sigma_X$: $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{169.\dot{3}}{6}}=\sqrt{28.\dot{2}}\approx 5.312459\ldots.$

Therefore, the standard deviation for the data set is 5.312, correct to 3 decimal places.
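The recurring decimals in this example can be avoided by working with exact fractions. The sketch below is one possible check, using Python's `fractions.Fraction`; the variable names are assumptions.

```python
import math
from fractions import Fraction

data = [45, 35, 42, 49, 39, 34]
n = len(data)

mu = Fraction(sum(data), n)                        # 244/6 = 122/3, i.e. 40.666...
sum_of_squares = sum((x - mu) ** 2 for x in data)  # 508/3, i.e. 169.333...
variance = sum_of_squares / n                      # 254/9, i.e. 28.222...

sigma = math.sqrt(variance)  # math.sqrt converts the fraction to a float
print(mu, sum_of_squares, variance, round(sigma, 3))
```

Keeping the intermediate values exact means the only rounding happens in the final square root.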

In the next example, we will discuss which data set among three data sets has the largest dispersion by using the standard deviation.

Example 5: Selecting a Data Set with the Highest Standard Deviation

By calculating the standard deviation, determine which of the sets $\{-17,20,6,-13\}$, $\{-5,-16,5,9\}$, and $\{-1,-6,20,-1\}$ has the largest dispersion.

Answer

To find the standard deviation of each of the data sets, we use the formula $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$ where $\sigma_X$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

We can see that each data set has four members, so $n$ is 4 in each case.

We will find the standard deviation of each data set first, then compare these in order to determine which has the largest dispersion.

For $\{-17,20,6,-13\}$, we will first find the mean, $\mu$, of the data set. Recall the formula for the mean, which is $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}.$

Therefore, by substituting $\{-17,20,6,-13\}$ for $\{x_1,x_2,x_3,x_4\}$ and 4 for $n$, we get $\mu=\frac{-17+20+6+(-13)}{4}=\frac{-4}{4}=-1.$

Next, we will calculate $x_i-\mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows:

| $x_i$ | $x_i-\mu$ |
|---|---|
| $-17$ | $-17-(-1)=-16$ |
| $20$ | $20-(-1)=21$ |
| $6$ | $6-(-1)=7$ |
| $-13$ | $-13-(-1)=-12$ |

Following this, we can now calculate $\sum_{i=1}^{n}(x_i-\mu)^2$. To do this, we will square $x_i-\mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
|---|---|---|
| $-17$ | $-17-(-1)=-16$ | $(-16)^2=256$ |
| $20$ | $20-(-1)=21$ | $(21)^2=441$ |
| $6$ | $6-(-1)=7$ | $(7)^2=49$ |
| $-13$ | $-13-(-1)=-12$ | $(-12)^2=144$ |

When we sum $(x_i-\mu)^2$ for each member of the data set, we get $\sum_{i=1}^{4}(x_i-\mu)^2=256+441+49+144=890.$

We can now substitute $\sum_{i=1}^{4}(x_i-\mu)^2=890$ and $n=4$ back into the original formula for the standard deviation and solve for $\sigma_X$: $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{890}{4}}=\sqrt{222.5}=14.9164\ldots.$

We will now repeat these steps for the other two data sets.

For $\{-5,-16,5,9\}$, the mean is $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{-5+(-16)+5+9}{4}=\frac{-7}{4}=-1.75.$

To calculate $\sum_{i=1}^{n}(x_i-\mu)^2$, we will find $x_i-\mu$ and $(x_i-\mu)^2$ for each member of the data set. We will lay this out in a table as before.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
|---|---|---|
| $-5$ | $-5-(-1.75)=-3.25$ | $(-3.25)^2=10.5625$ |
| $-16$ | $-16-(-1.75)=-14.25$ | $(-14.25)^2=203.0625$ |
| $5$ | $5-(-1.75)=6.75$ | $(6.75)^2=45.5625$ |
| $9$ | $9-(-1.75)=10.75$ | $(10.75)^2=115.5625$ |

Summing $(x_i-\mu)^2$ for each member of the data set, we get $\sum_{i=1}^{4}(x_i-\mu)^2=10.5625+203.0625+45.5625+115.5625=374.75.$

Substituting $\sum_{i=1}^{4}(x_i-\mu)^2=374.75$ and $n=4$ back into the original formula for the standard deviation and solving for $\sigma_X$, we get $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{374.75}{4}}=\sqrt{93.6875}=9.6792\ldots.$

For the last data set, $\{-1,-6,20,-1\}$, the mean is $\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{-1+(-6)+20+(-1)}{4}=\frac{12}{4}=3.$

To calculate $\sum_{i=1}^{n}(x_i-\mu)^2$, we will find $x_i-\mu$ and $(x_i-\mu)^2$ for each member of the data set. We will lay this out in a table as before.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
|---|---|---|
| $-1$ | $-1-3=-4$ | $(-4)^2=16$ |
| $-6$ | $-6-3=-9$ | $(-9)^2=81$ |
| $20$ | $20-3=17$ | $(17)^2=289$ |
| $-1$ | $-1-3=-4$ | $(-4)^2=16$ |

Summing $(x_i-\mu)^2$ for each member of the data set, we get $\sum_{i=1}^{4}(x_i-\mu)^2=16+81+289+16=402.$

Substituting $\sum_{i=1}^{4}(x_i-\mu)^2=402$ and $n=4$ back into the original formula for the standard deviation and solving for $\sigma_X$, we get $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{402}{4}}=\sqrt{100.5}=10.0249\ldots.$

We have found the standard deviation for each of the data sets. Let's summarize this below:

  • For $\{-17,20,6,-13\}$, $\sigma=14.92$ correct to 2 decimal places.
  • For $\{-5,-16,5,9\}$, $\sigma=9.68$ correct to 2 decimal places.
  • For $\{-1,-6,20,-1\}$, $\sigma=10.02$ correct to 2 decimal places.

By comparing these values, we can see that the first data set, $\{-17,20,6,-13\}$, has the largest standard deviation.

Therefore, $\{-17,20,6,-13\}$ has the largest dispersion, since the standard deviation is a measure of dispersion.
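The comparison can be reproduced with a short sketch. The labels `"A"`, `"B"`, and `"C"` are hypothetical names introduced here for the three sets; `statistics.pstdev` is the standard library's divide-by-$n$ standard deviation.

```python
import statistics

# Hypothetical labels for the three data sets in the question.
data_sets = {
    "A": [-17, 20, 6, -13],
    "B": [-5, -16, 5, 9],
    "C": [-1, -6, 20, -1],
}

# Standard deviation of each set, keyed by label.
sigmas = {label: statistics.pstdev(values) for label, values in data_sets.items()}

# The set with the largest standard deviation has the largest dispersion.
most_dispersed = max(sigmas, key=sigmas.get)

for label, sigma in sigmas.items():
    print(label, round(sigma, 2))
print("largest dispersion:", most_dispersed)
```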

So far, we have found the standard deviation of a set of data where the data has been presented in a list. Next, we will learn how to find the standard deviation of a data set that is presented in a frequency table.

To find the standard deviation of a data set where the data is presented in a frequency table, we need to consider the frequency of the values in the data set as well as the values in the data set itself. One way of doing this could be to list the values. For example, consider the following data set:

| $x$ | $f$ |
|---|---|
| 3 | 1 |
| 4 | 7 |
| 5 | 3 |

We could write this as one 3, seven 4s, and three 5s, or 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, in order to calculate the standard deviation as previously discussed. The problem with this approach arises when the frequencies are high (say 100 or even 1000), as we would have to write out a very long list. It is more efficient to calculate the squared difference between each distinct value and the mean and then multiply this by the corresponding frequency (much in the same way we would calculate the mean of a set of data in a frequency table).
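A minimal sketch, using the small frequency table above, suggests that both approaches give the same result. The dictionary `table` and the other variable names are assumptions chosen for illustration.

```python
import math
import statistics

# Frequency table from the text: value -> frequency.
table = {3: 1, 4: 7, 5: 3}

# Approach 1: write every value out in a list, then use the list formula.
expanded = [x for x, f in table.items() for _ in range(f)]
sigma_from_list = statistics.pstdev(expanded)

# Approach 2: weight each distinct value by its frequency.
total = sum(table.values())
mu = sum(x * f for x, f in table.items()) / total
sigma_from_table = math.sqrt(
    sum(f * (x - mu) ** 2 for x, f in table.items()) / total
)

print(expanded)
print(sigma_from_list, sigma_from_table)
```

The second approach never builds the long list, so its cost depends on the number of distinct values rather than on the size of the frequencies.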

Before considering the formula and method for finding the standard deviation of a set of data in a frequency table, we will first recall how to calculate the mean of a set of data from a frequency table.

Definition: The Mean of a Data Set in a Frequency Table

For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, the mean $\mu$ is calculated as follows: $\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$

Another way to represent this is in a table with the values of the data set in the first column, their corresponding frequencies in the second column, the product of each value and its frequency in the third column, and the sums in the last row of the table. The mean can then be calculated by dividing the sum of the third column by the sum of the second column.

| $x_i$ | $f_i$ | $x_if_i$ |
|---|---|---|
| $x_1$ | $f_1$ | $x_1f_1$ |
| $x_2$ | $f_2$ | $x_2f_2$ |
| $x_3$ | $f_3$ | $x_3f_3$ |
| $\cdots$ | $\cdots$ | $\cdots$ |
| $x_n$ | $f_n$ | $x_nf_n$ |
| Sum | $\sum_{i=1}^{n}f_i$ | $\sum_{i=1}^{n}x_if_i$ |

Having recapped the mean of a data set in a frequency table, we will next discuss the standard deviation. The formula for this is as follows.

Definition: The Standard Deviation of a Data Set in a Frequency Table

For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, $n$ distinct values of the data set, and mean $\mu$, the standard deviation $\sigma_X$ is calculated as follows: $\sigma_X=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}}.$

The approach for finding the standard deviation of a data set is generally the same as the approach for finding the standard deviation of a data set in a frequency table; however, there are some important differences. As we are working with frequencies, we need to multiply each value in the data by its corresponding frequency when calculating the mean. Also, when calculating the sum of the squares of the difference between the mean and each different value of the data, we also need to multiply by the frequency.
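The frequency-table approach might be sketched as a reusable function. The name `frequency_std`, its parameters, and the sample data are illustrative assumptions.

```python
import math


def frequency_std(values, frequencies):
    """Standard deviation of data given as distinct values with frequencies."""
    if len(values) != len(frequencies):
        raise ValueError("each value needs exactly one frequency")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the frequencies must not all be zero")
    # Mean: multiply each value by its frequency, then divide by the total.
    mu = sum(x * f for x, f in zip(values, frequencies)) / total
    # Sum of squares: multiply each squared difference by its frequency.
    weighted_squares = sum(f * (x - mu) ** 2 for x, f in zip(values, frequencies))
    return math.sqrt(weighted_squares / total)


# Illustrative table (an assumption): values 1, 2, 3 with frequencies 2, 1, 2.
# The mean is 2 and the weighted sum of squares is 2 + 0 + 2 = 4.
sigma = frequency_std([1, 2, 3], [2, 1, 2])  # sqrt(4 / 5)
print(sigma)
```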

In the next example, we will discuss how to find the standard deviation of a data set that is in a frequency table.

Example 6: Determining the Standard Deviation of a Data Set

The table shows the distribution of goals scored in the first half of a football season.

| Number of Goals | 0 | 1 | 3 | 4 | 6 |
|---|---|---|---|---|---|
| Number of Games | 5 | 2 | 7 | 7 | 4 |

Find the standard deviation of the number of goals scored. Give your answer to three decimal places.

Answer

As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma_X$, we use the formula $\sigma_X=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}},$ where $X=\{x_1,x_2,x_3,\ldots,x_n\}$ represents the values of the data set with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, there are $n$ distinct values of the data set, and the mean is represented by $\mu$.

In this question, the values of the data set are the numbers of goals scored in the first half of a football season. The number of games is the frequency of each value, that is, how many games had that number of goals. Let's rewrite this using $x$ and $f$ as the headings and by transposing the table, as follows:

| $x$ | $f$ |
|---|---|
| 0 | 5 |
| 1 | 2 |
| 3 | 7 |
| 4 | 7 |
| 6 | 4 |

To calculate the standard deviation, we must first calculate the mean $\mu$. For a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, we use the following formula: $\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$

Using the table above, we can add a new column in order to find $x_if_i$ for each value of $i$ and then use this to find the mean.

| $x$ | $f$ | $xf$ |
|---|---|---|
| 0 | 5 | $0\times 5=0$ |
| 1 | 2 | $1\times 2=2$ |
| 3 | 7 | $3\times 7=21$ |
| 4 | 7 | $4\times 7=28$ |
| 6 | 4 | $6\times 4=24$ |

By summing the values for $xf$ and dividing by the sum of the frequencies, we get $\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{0+2+21+28+24}{5+2+7+7+4}=\frac{75}{25}=3.$

Next, we will calculate the difference between each value of the data set and the mean and the square of this in order to calculate the sum of the squares. We will do this by adding two further columns to the table above.

| $x$ | $f$ | $xf$ | $x-\mu$ | $(x-\mu)^2$ |
|---|---|---|---|---|
| 0 | 5 | $0\times 5=0$ | $0-3=-3$ | $(-3)^2=9$ |
| 1 | 2 | $1\times 2=2$ | $1-3=-2$ | $(-2)^2=4$ |
| 3 | 7 | $3\times 7=21$ | $3-3=0$ | $0^2=0$ |
| 4 | 7 | $4\times 7=28$ | $4-3=1$ | $1^2=1$ |
| 6 | 4 | $6\times 4=24$ | $6-3=3$ | $3^2=9$ |

We now need to multiply each squared difference between a value and the mean by the frequency of that value. We will add another column to the table to do this.

| $x$ | $f$ | $xf$ | $x-\mu$ | $(x-\mu)^2$ | $(x-\mu)^2f$ |
|---|---|---|---|---|---|
| 0 | 5 | $0\times 5=0$ | $0-3=-3$ | $(-3)^2=9$ | $9\times 5=45$ |
| 1 | 2 | $1\times 2=2$ | $1-3=-2$ | $(-2)^2=4$ | $4\times 2=8$ |
| 3 | 7 | $3\times 7=21$ | $3-3=0$ | $0^2=0$ | $0\times 7=0$ |
| 4 | 7 | $4\times 7=28$ | $4-3=1$ | $1^2=1$ | $1\times 7=7$ |
| 6 | 4 | $6\times 4=24$ | $6-3=3$ | $3^2=9$ | $9\times 4=36$ |

We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma_X$: $\sigma_X=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{45+8+0+7+36}{5+2+7+7+4}}=\sqrt{\frac{96}{25}}=\sqrt{3.84}=1.95959\ldots,$ which is 1.960 to three decimal places.

Therefore, the standard deviation of the number of goals scored is 1.960 to three decimal places.
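The columns of the table can be reproduced with a short sketch; the variable names are assumptions chosen for illustration.

```python
import math

goals = [0, 1, 3, 4, 6]  # values x
games = [5, 2, 7, 7, 4]  # frequencies f

total_games = sum(games)                                      # sum of f
mu = sum(x * f for x, f in zip(goals, games)) / total_games   # sum of xf over sum of f

# One entry per table row: (x - mu)^2 multiplied by f.
weighted_squares = [f * (x - mu) ** 2 for x, f in zip(goals, games)]

sigma = math.sqrt(sum(weighted_squares) / total_games)
print(mu, weighted_squares, f"{sigma:.3f}")
```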

Next, we will discuss how to calculate the standard deviation of grouped data using the midpoint. This approach involves the same steps as with frequency tables, but we are dealing with intervals for our data set rather than a set of values, so we need to use the midpoint of each interval to approximate the values. We will explore this further in our final example.

Example 7: Find the Standard Deviation of a Grouped Data Set

A quiz was completed by 92 students and their scores were recorded in the following frequency table. Find the standard deviation to two decimal places.

| Score | $0<s\le 20$ | $20<s\le 40$ | $40<s\le 60$ | $60<s\le 80$ | $80<s\le 100$ |
|---|---|---|---|---|---|
| Frequency | 26 | 10 | 24 | 5 | 27 |

Answer

As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma_X$, we use the formula $\sigma_X=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}},$ where $X=\{x_1,x_2,x_3,\ldots,x_n\}$ represents the values of the data set with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, there are $n$ distinct values of the data set, and the mean is represented by $\mu$.

For this type of problem, we have been given different "classes" of values represented by intervals rather than exact values. This means we cannot directly apply the formula above, since we cannot substitute these intervals for the values of $x_i$ in our formula.

Instead, the approach we must take is to find the "midpoint" of each interval and use this to represent the corresponding value of $x_i$. After doing so, we can treat the problem as we would with any other frequency table.

To find the midpoint, we add together the endpoints and divide by 2. This allows us to find an approximate standard deviation of the data set.

So, the values of the data set are the midpoints of the score intervals and the corresponding frequencies are the frequencies of those intervals. Let's find the midpoint of each of the intervals and then rewrite the midpoints as $x$ and the frequencies as $f$, as follows:

| Interval | Midpoint $x$ | Frequency $f$ |
|---|---|---|
| $0<s\le 20$ | $\frac{0+20}{2}=10$ | 26 |
| $20<s\le 40$ | $\frac{20+40}{2}=30$ | 10 |
| $40<s\le 60$ | $\frac{40+60}{2}=50$ | 24 |
| $60<s\le 80$ | $\frac{60+80}{2}=70$ | 5 |
| $80<s\le 100$ | $\frac{80+100}{2}=90$ | 27 |

To calculate the standard deviation, we must first calculate the mean, $\mu$. For a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, we use the following formula: $\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$

Again, we remember that the midpoint is now being used to represent the values of $x_i$. Using the table above, we can add a new column in order to find $x_if_i$ for each value of $i$ and then use this to find the mean.

| Interval | Midpoint $x$ | Frequency $f$ | $xf$ |
|---|---|---|---|
| $0<s\le 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ |
| $20<s\le 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ |
| $40<s\le 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ |
| $60<s\le 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ |
| $80<s\le 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ |

By summing the values for $xf$ and dividing by the sum of the frequencies, we get $\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{260+300+1200+350+2430}{26+10+24+5+27}=\frac{4540}{92}\approx 49.34782\ldots.$

Next, we will calculate the difference between the midpoint of each class in our data set and the mean and the square of this in order to calculate the sum of the squares. We will do this by adding two further columns to the table above. Note that all values have been rounded to 4 decimal places.

| Interval | Midpoint $x$ | Frequency $f$ | $xf$ | $x-\mu$ | $(x-\mu)^2$ |
|---|---|---|---|---|---|
| $0<s\le 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ | $10-49.3478=-39.3478$ | 1548.2494 |
| $20<s\le 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ | $30-49.3478=-19.3478$ | 374.3374 |
| $40<s\le 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ | $50-49.3478=0.6522$ | 0.4254 |
| $60<s\le 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ | $70-49.3478=20.6522$ | 426.5134 |
| $80<s\le 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ | $90-49.3478=40.6522$ | 1652.6014 |

We now need to multiply each squared difference between a midpoint and the mean by the frequency of that class. We will add another column to the table to do this. Again, the products are rounded to 4 decimal places.

| Interval | Midpoint $x$ | Frequency $f$ | $xf$ | $x-\mu$ | $(x-\mu)^2$ | $(x-\mu)^2f$ |
|---|---|---|---|---|---|---|
| $0<s\le 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ | $10-49.3478=-39.3478$ | 1548.24936 | 40254.4818 |
| $20<s\le 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ | $30-49.3478=-19.3478$ | 374.33736 | 3743.3736 |
| $40<s\le 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ | $50-49.3478=0.6522$ | 0.42536484 | 10.2088 |
| $60<s\le 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ | $70-49.3478=20.6522$ | 426.51336 | 2132.5668 |
| $80<s\le 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ | $90-49.3478=40.6522$ | 1652.60136 | 44620.2351 |

We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma_X$: $\sigma_X=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{40254.4818+3743.3736+10.2088+2132.5668+44620.2351}{26+10+24+5+27}}=\sqrt{\frac{90760.8661}{92}}=\sqrt{986.5311\ldots}=31.4091\ldots,$ which is 31.41 when rounded to 2 decimal places.

Therefore, the standard deviation is 31.41 to 2 decimal places.
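The midpoint method can be sketched as follows; the variable names are assumptions. Because the mean is not rounded here, intermediate values may differ slightly from the rounded table entries, although the final answer agrees to 2 decimal places.

```python
import math

# Class intervals (lower, upper] and their frequencies, from the question.
intervals = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
frequencies = [26, 10, 24, 5, 27]

# Represent each class by its midpoint: add the endpoints and divide by 2.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

total = sum(frequencies)
mu = sum(x * f for x, f in zip(midpoints, frequencies)) / total
sigma = math.sqrt(
    sum(f * (x - mu) ** 2 for x, f in zip(midpoints, frequencies)) / total
)

print(midpoints, total, round(mu, 4), round(sigma, 2))
```

As in the worked answer, the result is an estimate, since the midpoints only approximate the individual scores.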

In this explainer, we have learned what the standard deviation is and how to find it for a set of data, from both a list and a frequency table. We have also learned how to compare data sets and draw conclusions using the standard deviation.

Key Points

  • The standard deviation of a data set is used to measure the dispersion of data from the mean.
  • For data presented in a list, the formula for the standard deviation $\sigma_X$ of a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with $n$ members and mean $\mu$ is $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}.$
  • For data presented in a frequency table, the formula for the standard deviation $\sigma_X$ of a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, $n$ distinct values of the data set, and mean $\mu$ is $\sigma_X=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}}.$
  • For grouped frequency tables where data is given in intervals, the midpoint of the interval is used to represent the values of $x_i$.
