Lesson Explainer: Standard Deviation of a Data Set | Nagwa

Mathematics • Third Year of Preparatory School

In this explainer, we will learn how to find and interpret the standard deviation from a given data set.

In order to understand the meaning of the standard deviation of a data set we first recall the definition of the mean of a data set.

Definition: The Mean of a Data Set

The mean, average, or expected value of a data set is used as a measure of central tendency. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, where there are $n$ values, the mean, denoted by $\mu$ (pronounced “mu”) or $\bar{x}$, is calculated by taking the sum of the values in the data set and dividing it by the number of values $n$, as indicated in the formula below: $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{\sum_{i=1}^{n}x_i}{n}.$$
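As a minimal sketch of this definition, the mean can be computed by dividing the sum of the values by their count. The function name `mean` and the variable `example_mean` are illustrative choices, and the data set $\{1,1,3,5,7\}$ is the one used later in this explainer.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# (1 + 1 + 3 + 5 + 7) / 5
example_mean = mean([1, 1, 3, 5, 7])
```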

The standard deviation of a data set tells us the dispersion of data from the mean. The larger the standard deviation, the more dispersed the data is from the mean, and the smaller the standard deviation, the less dispersed the data is from the mean.

The square of the standard deviation is called the variance and is another measure of dispersion. A further measure of dispersion is the interquartile range, which is the difference between the upper quartile and the lower quartile, or the value of the 75th percentile minus the value of the 25th percentile. In this explainer, we will only be focusing on the standard deviation as a measure of dispersion.

The standard deviation is more formally defined in the definition below.

Definition: The Standard Deviation of a Data Set

The standard deviation of a data set is used to measure the dispersion of data from the mean. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, where there are $n$ values, the standard deviation, denoted by $\sigma$ (pronounced “sigma”), is calculated by squaring the difference between each value and the mean $\mu$, summing these squares, dividing by the number of values, and taking the square root, as indicated in the formula below: $$\sigma=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}.$$

Another way of describing the standard deviation is as a typical distance between the mean and the individual data points in the set (more precisely, the square root of the mean of the squared distances). So, if the standard deviation is larger, then the data points are, on average, further from the mean, meaning they are more dispersed. Similarly, if the standard deviation is smaller, then the data points are, on average, closer to the mean, meaning they are less dispersed.
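The definition above can be sketched in a few lines of Python. The function name `population_std` and the data values are illustrative assumptions, not part of the explainer. The result is compared with `statistics.pstdev` from the standard library, which also divides by $n$; note that `statistics.stdev` divides by $n-1$ instead and so gives a different value from the formula used here.

```python
import math
import statistics


def population_std(values):
    """Standard deviation as defined above: divide the sum of squared differences by n."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not taken from the explainer
sigma = population_std(data)
library_sigma = statistics.pstdev(data)  # standard-library equivalent (divides by n)
```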

We will use the definition of the standard deviation of a data set to answer the first example.

Example 1: Understanding Standard Deviation

What is the name of a quantity expressing by how much the members of a group differ from the mean value for the group?

Answer

We know that the standard deviation of a data set determines how dispersed the data set is from the mean. This can also be described as how much the members of a data set differ from the mean of the data set.

Therefore, the quantity expressing by how much the members of a group differ from the mean value for the group is the standard deviation. A low standard deviation tells us that the data points are, on average, closer to the mean, and a high standard deviation tells us that the data points are, on average, further from the mean.

Having discussed what the definition of the standard deviation is, we will next consider the case where the measure of dispersion is zero, as seen in the next example.

Example 2: Identifying a Set of Values with Zero Dispersion

If the dispersion of a set of values is equal to zero, then which of the following is true?

  A. The difference between the individual values is great.
  B. The difference between the individual values is small.
  C. All the values are equal.
  D. The arithmetic mean of these values is zero.
  E. All the values are negative.

Answer

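The dispersion of a data set can be measured using the standard deviation, denoted $\sigma$. For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with $n$ values and a mean $\mu$, this is calculated by using the following formula: $$\sigma=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}.$$

If the dispersion of a data set is equal to zero, then the standard deviation is equal to zero. By setting the formula for the standard deviation equal to zero, we get $$\sigma=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=0.$$

By squaring both sides, we get $$\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}=0^2,$$ $$\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}=0.$$

Then, by multiplying both sides by $n$, we have $$(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2=0\times n,$$ $$(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2=0.$$

Now, if we square any real number greater than zero, then we get a value greater than zero. Also, if we square any real number less than zero, then we still get a value greater than zero. A sum of terms that are never negative can only equal zero if every term is zero, so the brackets must each equal zero for the result to be zero: $$\underbrace{(x_1-\mu)^2}_{=0}+\underbrace{(x_2-\mu)^2}_{=0}+\underbrace{(x_3-\mu)^2}_{=0}+\cdots+\underbrace{(x_n-\mu)^2}_{=0}=0.$$

So, setting each bracket equal to zero gives us $$x_1-\mu=0,\quad x_2-\mu=0,\quad x_3-\mu=0,\quad\ldots,\quad x_n-\mu=0.$$

When solving for $x_1,x_2,x_3,\ldots,x_n$, we get $$x_1=\mu,\quad x_2=\mu,\quad x_3=\mu,\quad\ldots,\quad x_n=\mu.$$

Therefore, all of the members of the data set $X$ are equal to the mean $\mu$, and hence to each other, which is option C.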
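This conclusion can be checked numerically. The sketch below uses two illustrative data sets (assumed values, not from the explainer): one where every value is equal and one where a single value differs. It relies on `statistics.pstdev`, which divides by $n$ as in the formula above.

```python
import statistics

equal_values = [7, 7, 7, 7]  # illustrative: every value equals the mean
mixed_values = [7, 7, 7, 8]  # illustrative: one value differs from the rest

sd_equal = statistics.pstdev(equal_values)  # every squared difference is zero
sd_mixed = statistics.pstdev(mixed_values)  # at least one squared difference is positive
```

Here `sd_equal` is zero, while `sd_mixed` is strictly positive, in line with the argument that zero dispersion forces all values to be equal.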

In the next example, we will use the formula for the standard deviation of a data set to determine the standard deviation when given the sum of the squares of the differences and the number of data points.

Example 3: Calculating Standard Deviation

If $\sum(x-\bar{x})^2$ for a set of 6 values equals 25, find the standard deviation of the set, and round the result to the nearest thousandth.

Answer

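To calculate the standard deviation of a set of data, we first recall the formula $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}},$$ where $\sigma$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\bar{x}$ is the mean of the data set.

We are told $\sum(x-\bar{x})^2=25$, which is the same as saying $\sum_{i=1}^{n}(x_i-\bar{x})^2=25$. We are also told there are 6 values, which indicates that $n=6$.

By substituting $\sum_{i=1}^{n}(x_i-\bar{x})^2=25$ and $n=6$ and solving for $\sigma$, we get $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}}=\sqrt{\frac{25}{6}}=2.04124\ldots\approx 2.041.$$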

Our answer is therefore 2.041 when rounded to the nearest thousandth.
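The substitution in this example is a one-line calculation. The following sketch (variable names are illustrative) divides the given sum of squares by the number of values, takes the square root, and rounds to three decimal places.

```python
import math

sum_of_squares = 25  # given: sum of squared differences from the mean
n = 6                # given: number of values in the set

sigma = math.sqrt(sum_of_squares / n)
rounded = round(sigma, 3)  # nearest thousandth
```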

Next, we will work through, step by step, how to find the standard deviation of a data set.

When calculating the standard deviation of a set of data, we need to execute a number of steps when working with the formula. First, let’s recall the formula $$\sigma=\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+(x_3-\mu)^2+\cdots+(x_n-\mu)^2}{n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$$ where $\sigma$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

To help demonstrate how to use the formula, we will use the following data set: 𝑋={1,1,3,5,7}.

We will next execute the following steps using this data set to illustrate how the steps work.

Step 1: Finding the mean

As we need to calculate the difference between the mean and the members of the set within the brackets of the formula, we need to start by calculating the mean. This is $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{\sum_{i=1}^{n}x_i}{n},$$ where $\mu$ denotes the mean, $X=\{x_1,x_2,x_3,\ldots,x_n\}$ is the data set, and $n$ is the number of points in the data set.

For the data set $X=\{1,1,3,5,7\}$, this gives us $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{1+1+3+5+7}{5}=\frac{17}{5}=3.4.$$

Step 2: Finding the difference between the mean and each of the data points

In order to calculate $\sum_{i=1}^{n}(x_i-\mu)^2$ in the formula, we need to calculate $x_i-\mu$ for all values of $i=1,\ldots,n$, or, in other words, the difference between each of the data points and the mean. For this step and subsequent steps, it is helpful to lay this out in a table.

| $x_i$ | $x_i-\mu$ |
| --- | --- |
| 1 | $1-3.4=-2.4$ |
| 1 | $1-3.4=-2.4$ |
| 3 | $3-3.4=-0.4$ |
| 5 | $5-3.4=1.6$ |
| 7 | $7-3.4=3.6$ |

Step 3: Finding the sum of the squares of the difference between the mean and each of the data points

Following on from step 2, in order to calculate $\sum_{i=1}^{n}(x_i-\mu)^2$ in the formula, we next need to calculate $(x_i-\mu)^2$ for all values of $i=1,\ldots,n$ and sum these. In other words, we need to square the difference between each of the data points and the mean and sum these squares. We will use the table from step 2 and add a further column.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- |
| 1 | $1-3.4=-2.4$ | $(-2.4)^2=5.76$ |
| 1 | $1-3.4=-2.4$ | $(-2.4)^2=5.76$ |
| 3 | $3-3.4=-0.4$ | $(-0.4)^2=0.16$ |
| 5 | $5-3.4=1.6$ | $(1.6)^2=2.56$ |
| 7 | $7-3.4=3.6$ | $(3.6)^2=12.96$ |

Summing the last column, we get $$\sum_{i=1}^{n}(x_i-\mu)^2=5.76+5.76+0.16+2.56+12.96=27.2.$$

Step 4: Substituting into the formula and finding the standard deviation

For the final step, we substitute the sum of squares and 𝑛 in the formula and then calculate the value of the standard deviation.

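From step 3, we found $\sum_{i=1}^{n}(x_i-\mu)^2=27.2$, and we know $n=5$. Therefore, by substituting into the formula for $\sigma$ and solving, we get $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{27.2}{5}}=\sqrt{5.44}=2.3323\ldots\approx 2.33,$$ which is the standard deviation for the data set $X=\{1,1,3,5,7\}$.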
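The four steps above can be mirrored line by line in Python for the data set $\{1,1,3,5,7\}$. This is only a sketch; the variable names are illustrative, and floating-point arithmetic means intermediate values may differ from the table in the last few digits.

```python
import math

data = [1, 1, 3, 5, 7]
n = len(data)

mu = sum(data) / n                        # Step 1: the mean
deviations = [x - mu for x in data]       # Step 2: differences from the mean
sum_sq = sum(d ** 2 for d in deviations)  # Step 3: sum of the squared differences
sigma = math.sqrt(sum_sq / n)             # Step 4: divide by n and take the square root
```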

We can summarize these steps as follows.

How To: Finding the Standard Deviation of a Data Set

Step 1: Finding the mean of the data set

Step 2: Finding the difference between the mean and the value of each of the data points

Step 3: Finding the sum of the squares of the difference between the mean and the value of each of the data points

Step 4: Substituting the sum of the squares and $n$ into the formula and taking the square root in order to calculate the standard deviation (This is never negative, and it is zero only when all the values are equal.)

In the next example, we will use this process to calculate the standard deviation of a data set.

Example 4: Calculating the Standard Deviation of a Data Set

Calculate the standard deviation of the values 45, 35, 42, 49, 39, and 34. Give your answer to 3 decimal places.

Answer

To find the standard deviation of a set of data, we use the formula $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$$ where $\sigma$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

First, we will calculate the mean, $\mu$, of the data set. Recall the formula for the mean, which is $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}.$$

In this case, the data set $X$ is $\{45,35,42,49,39,34\}$ and the number of members of the data set is 6. So, by substituting $\{45,35,42,49,39,34\}$ for $\{x_1,x_2,x_3,\ldots,x_n\}$ and 6 for $n$, we get $$\mu=\frac{45+35+42+49+39+34}{6}=\frac{244}{6}=40.\dot{6}.$$

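Next, we will calculate $x_i-\mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows:

| $x_i$ | $x_i-\mu$ |
| --- | --- |
| 45 | $45-40.\dot{6}=4.\dot{3}$ |
| 35 | $35-40.\dot{6}=-5.\dot{6}$ |
| 42 | $42-40.\dot{6}=1.\dot{3}$ |
| 49 | $49-40.\dot{6}=8.\dot{3}$ |
| 39 | $39-40.\dot{6}=-1.\dot{6}$ |
| 34 | $34-40.\dot{6}=-6.\dot{6}$ |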

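Following this, we can now calculate $\sum_{i=1}^{n}(x_i-\mu)^2$. To do this, we will square $x_i-\mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- |
| 45 | $45-40.\dot{6}=4.\dot{3}$ | $(4.\dot{3})^2=18.\dot{7}$ |
| 35 | $35-40.\dot{6}=-5.\dot{6}$ | $(-5.\dot{6})^2=32.\dot{1}$ |
| 42 | $42-40.\dot{6}=1.\dot{3}$ | $(1.\dot{3})^2=1.\dot{7}$ |
| 49 | $49-40.\dot{6}=8.\dot{3}$ | $(8.\dot{3})^2=69.\dot{4}$ |
| 39 | $39-40.\dot{6}=-1.\dot{6}$ | $(-1.\dot{6})^2=2.\dot{7}$ |
| 34 | $34-40.\dot{6}=-6.\dot{6}$ | $(-6.\dot{6})^2=44.\dot{4}$ |

When we sum $(x_i-\mu)^2$ for each member of the data set, we get $$\sum_{i=1}^{n}(x_i-\mu)^2=18.\dot{7}+32.\dot{1}+1.\dot{7}+69.\dot{4}+2.\dot{7}+44.\dot{4}=169.\dot{3}.$$

We can now substitute $\sum_{i=1}^{n}(x_i-\mu)^2=169.\dot{3}$ and $n=6$ back into the original formula for the standard deviation and solve for $\sigma$: $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{169.\dot{3}}{6}}=\sqrt{28.\dot{2}}\approx 5.312459.$$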

Therefore, the standard deviation of the data set is 5.312, correct to 3 decimal places.
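Because the mean here is a recurring decimal, one way to check the working without rounding at intermediate steps is to use exact fractions. The sketch below is an illustrative check using Python's `fractions.Fraction`; the variable names are assumptions, and only the final square root is evaluated as a floating-point number.

```python
import math
from fractions import Fraction

values = [45, 35, 42, 49, 39, 34]
n = len(values)

mu = Fraction(sum(values), n)                # exact mean: 244/6 = 122/3
sum_sq = sum((x - mu) ** 2 for x in values)  # exact sum of squared differences
variance = sum_sq / n                        # exact value under the square root
sigma = math.sqrt(variance)                  # converted to a float only at the end
```

Keeping the intermediate values as fractions avoids having to decide how many digits of each recurring decimal to carry forward.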

In the next example, we will discuss which data set among three data sets has the largest dispersion by using the standard deviation.

Example 5: Selecting a Data Set with the Highest Standard Deviation

By calculating the standard deviation, determine which of the sets $\{-17,20,6,-13\}$, $\{-5,-16,5,9\}$, and $\{-1,-6,20,-1\}$ has the largest dispersion.

Answer

To find the standard deviation of each of the data sets, we use the formula $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}},$$ where $\sigma$ denotes the standard deviation of the set of data $X$, $X=\{x_1,x_2,x_3,\ldots,x_n\}$, $n$ is the number of members of the data set, and $\mu$ is the mean of the data set.

We can see that each data set has four members, so 𝑛 is 4 for each case.

We will find the standard deviation of each data set first, then compare these in order to determine which has the largest dispersion.

For $\{-17,20,6,-13\}$, we will first find the mean, $\mu$, of the data set. Recall the formula for the mean, which is $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}.$$

Therefore, by substituting $\{-17,20,6,-13\}$ for $\{x_1,x_2,x_3,x_4\}$ and 4 for $n$, we get $$\mu=\frac{-17+20+6+(-13)}{4}=\frac{-4}{4}=-1.$$

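Next, we will calculate $x_i-\mu$ for each member of the data set. To help ourselves do this, we will lay the data out in a table as follows:

| $x_i$ | $x_i-\mu$ |
| --- | --- |
| $-17$ | $-17-(-1)=-16$ |
| $20$ | $20-(-1)=21$ |
| $6$ | $6-(-1)=7$ |
| $-13$ | $-13-(-1)=-12$ |

Following this, we can now calculate $\sum_{i=1}^{n}(x_i-\mu)^2$. To do this, we will square $x_i-\mu$ for each member of the data set and then sum the results. We will add another column to the table above for ease of calculation.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- |
| $-17$ | $-17-(-1)=-16$ | $(-16)^2=256$ |
| $20$ | $20-(-1)=21$ | $(21)^2=441$ |
| $6$ | $6-(-1)=7$ | $(7)^2=49$ |
| $-13$ | $-13-(-1)=-12$ | $(-12)^2=144$ |

When we sum $(x_i-\mu)^2$ for each member of the data set, we get $$\sum_{i=1}^{n}(x_i-\mu)^2=256+441+49+144=890.$$

We can now substitute $\sum_{i=1}^{n}(x_i-\mu)^2=890$ and $n=4$ back into the original formula for the standard deviation and solve for $\sigma$: $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{890}{4}}=\sqrt{222.5}=14.9164\ldots$$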

We will now repeat these steps for the other two data sets.

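For $\{-5,-16,5,9\}$, the mean is $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{-5+(-16)+5+9}{4}=\frac{-7}{4}=-1.75.$$

To calculate $\sum_{i=1}^{n}(x_i-\mu)^2$, we will find $x_i-\mu$ and $(x_i-\mu)^2$ for each member of the data set. We will lay this out in a table as before.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- |
| $-5$ | $-5-(-1.75)=-3.25$ | $(-3.25)^2=10.5625$ |
| $-16$ | $-16-(-1.75)=-14.25$ | $(-14.25)^2=203.0625$ |
| $5$ | $5-(-1.75)=6.75$ | $(6.75)^2=45.5625$ |
| $9$ | $9-(-1.75)=10.75$ | $(10.75)^2=115.5625$ |

Summing $(x_i-\mu)^2$ for each member of the data set, we get $$\sum_{i=1}^{n}(x_i-\mu)^2=10.5625+203.0625+45.5625+115.5625=374.75.$$

Substituting $\sum_{i=1}^{n}(x_i-\mu)^2=374.75$ and $n=4$ back into the original formula for the standard deviation and solving for $\sigma$, we get $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{374.75}{4}}=\sqrt{93.6875}=9.6792\ldots$$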

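For the last data set, $\{-1,-6,20,-1\}$, the mean is $$\mu=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{-1+(-6)+20+(-1)}{4}=\frac{12}{4}=3.$$

To calculate $\sum_{i=1}^{n}(x_i-\mu)^2$, we will find $x_i-\mu$ and $(x_i-\mu)^2$ for each member of the data set. We will lay this out in a table as before.

| $x_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- |
| $-1$ | $-1-3=-4$ | $(-4)^2=16$ |
| $-6$ | $-6-3=-9$ | $(-9)^2=81$ |
| $20$ | $20-3=17$ | $(17)^2=289$ |
| $-1$ | $-1-3=-4$ | $(-4)^2=16$ |

Summing $(x_i-\mu)^2$ for each member of the data set, we get $$\sum_{i=1}^{n}(x_i-\mu)^2=16+81+289+16=402.$$

Substituting $\sum_{i=1}^{n}(x_i-\mu)^2=402$ and $n=4$ back into the original formula for the standard deviation and solving for $\sigma$, we get $$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}=\sqrt{\frac{402}{4}}=\sqrt{100.5}=10.0249\ldots$$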

We have found the standard deviation for each of the data sets. Let’s summarize this below:

  • For $\{-17,20,6,-13\}$, $\sigma=14.92$ correct to 2 decimal places.
  • For $\{-5,-16,5,9\}$, $\sigma=9.68$ correct to 2 decimal places.
  • For $\{-1,-6,20,-1\}$, $\sigma=10.02$ correct to 2 decimal places.

By comparing these data sets, we can see the first one, $\{-17,20,6,-13\}$, has the largest standard deviation.

Therefore $\{-17,20,6,-13\}$ has the largest dispersion, since the standard deviation is a measure of dispersion.
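The comparison can be sketched in Python as follows, assuming the three sets are $\{-17,20,6,-13\}$, $\{-5,-16,5,9\}$, and $\{-1,-6,20,-1\}$, which are the values that the sums of squares 890, 374.75, and 402 in the working correspond to. The dictionary keys and variable names are illustrative.

```python
import statistics

# Assumed data sets, labeled by their order in the question.
data_sets = {
    "first": [-17, 20, 6, -13],
    "second": [-5, -16, 5, 9],
    "third": [-1, -6, 20, -1],
}

# pstdev divides by n, matching the formula used in this explainer.
std_devs = {name: statistics.pstdev(values) for name, values in data_sets.items()}

# The set with the largest standard deviation has the largest dispersion.
most_dispersed = max(std_devs, key=std_devs.get)
```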

So far, we have found the standard deviation of a set of data where the data has been presented in a list. Next, we will learn how to find the standard deviation of a data set that is presented in a frequency table.

To find the standard deviation of a data set where the data is presented in a frequency table, we need to consider the frequency of the values in the data set as well as the values in the data set itself. One way of doing this could be to list the values. For example, consider the following data set:

| $x_i$ | $f_i$ |
| --- | --- |
| 3 | 1 |
| 4 | 7 |
| 5 | 3 |

We could write this as one 3, seven 4s, and three 5s, or $3,4,4,4,4,4,4,4,5,5,5$, in order to calculate the standard deviation as previously discussed. The problem with this approach arises when the frequencies are high (say 100 or even 1 000), as we would have to write the values out in a very long list. It is therefore more efficient to calculate the squared difference from the mean for each distinct value and then multiply it by the corresponding frequency (in much the same way as we would calculate the mean of a set of data in a frequency table).

Before considering the formula and method for finding the standard deviation of a set of data in a frequency table, we will first recall how to calculate the mean of a set of data from a frequency table.

Definition: The Mean of a Data Set in a Frequency Table

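For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, the mean $\mu$ is calculated as follows: $$\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$$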

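Another way to represent this is in a table with the values of the data set in the first column, their corresponding frequencies in the second column, the product of each data value and its frequency in the third column, and the sums in the last row of the table. The mean can then be calculated by dividing the sum of the third column by the sum of the second column.

| $x_i$ | $f_i$ | $x_if_i$ |
| --- | --- | --- |
| $x_1$ | $f_1$ | $x_1f_1$ |
| $x_2$ | $f_2$ | $x_2f_2$ |
| $x_3$ | $f_3$ | $x_3f_3$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $x_n$ | $f_n$ | $x_nf_n$ |
| Sum | $\sum f_i$ | $\sum x_if_i$ |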

Having recapped the mean of a data set in a frequency table, we will next discuss the standard deviation. The formula for this is as follows.

Definition: The Standard Deviation of a Data Set in a Frequency Table

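For a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, $n$ distinct values of the data set, and mean $\mu$, the standard deviation $\sigma$ is calculated as follows: $$\sigma=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}}.$$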

The approach for finding the standard deviation of a data set is generally the same as the approach for finding the standard deviation of a data set in a frequency table; however, there are some important differences. As we are working with frequencies, we need to multiply each value in the data by its corresponding frequency when calculating the mean. Also, when calculating the sum of the squares of the difference between the mean and each different value of the data, we also need to multiply by the frequency.
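To illustrate that the frequency-table formula gives the same result as writing every value out in a list, the sketch below applies both approaches to the small table introduced earlier (values 3, 4, 5 with frequencies 1, 7, 3). The function name `frequency_std` and the variable names are illustrative assumptions.

```python
import math
import statistics


def frequency_std(values, frequencies):
    """Standard deviation from distinct values and their frequencies."""
    total = sum(frequencies)
    # Mean: each value is weighted by how often it occurs.
    mu = sum(x * f for x, f in zip(values, frequencies)) / total
    # Each squared difference is also weighted by its frequency.
    weighted_sq = sum(f * (x - mu) ** 2 for x, f in zip(values, frequencies))
    return math.sqrt(weighted_sq / total)


values = [3, 4, 5]
frequencies = [1, 7, 3]
from_table = frequency_std(values, frequencies)

# Writing every value out in full, as in the "long list" approach.
expanded = [x for x, f in zip(values, frequencies) for _ in range(f)]
from_list = statistics.pstdev(expanded)
```

The two results agree up to floating-point rounding, but the table-based version does the same amount of work however large the frequencies are.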

In the next example, we will discuss how to find the standard deviation of a data set that is in a frequency table.

Example 6: Determining the Standard Deviation of a Data Set

The table shows the distribution of goals scored in the first half of a football season.

| Number of Goals | 0 | 1 | 3 | 4 | 6 |
| --- | --- | --- | --- | --- | --- |
| Number of Games | 5 | 2 | 7 | 7 | 4 |

Find the standard deviation of the number of goals scored. Give your answer to three decimal places.

Answer

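As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma$, we use the formula $$\sigma=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}},$$ where $X=\{x_1,x_2,x_3,\ldots,x_n\}$ represents the values of the data set with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, there are $n$ distinct values of the data set, and the mean is represented by $\mu$.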

In this question, the values of the data set are the numbers of goals scored in a game in the first half of a football season. The number of games is the frequency with which each number of goals occurred. Let’s rewrite this using $x_i$ and $f_i$ as the headings and by transposing the table, as follows:

| $x_i$ | $f_i$ |
| --- | --- |
| 0 | 5 |
| 1 | 2 |
| 3 | 7 |
| 4 | 7 |
| 6 | 4 |

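To calculate the standard deviation, we must first calculate the mean $\mu$. For a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, we use the following formula: $$\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$$

Using the table above, we can add a new column in order to find $x_if_i$ for each value of $i$ and then use this to find the mean.

| $x_i$ | $f_i$ | $x_if_i$ |
| --- | --- | --- |
| 0 | 5 | $0\times 5=0$ |
| 1 | 2 | $1\times 2=2$ |
| 3 | 7 | $3\times 7=21$ |
| 4 | 7 | $4\times 7=28$ |
| 6 | 4 | $6\times 4=24$ |

By summing the values for $x_if_i$ and dividing by the sum of the frequencies, we get $$\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{0+2+21+28+24}{5+2+7+7+4}=\frac{75}{25}=3.$$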

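Next, we will calculate the difference between each value of the data set and the mean, and the square of this difference, in order to calculate the sum of the squares. We will do this by adding two further columns to the table above.

| $x_i$ | $f_i$ | $x_if_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- | --- | --- |
| 0 | 5 | $0\times 5=0$ | $0-3=-3$ | $(-3)^2=9$ |
| 1 | 2 | $1\times 2=2$ | $1-3=-2$ | $(-2)^2=4$ |
| 3 | 7 | $3\times 7=21$ | $3-3=0$ | $0^2=0$ |
| 4 | 7 | $4\times 7=28$ | $4-3=1$ | $1^2=1$ |
| 6 | 4 | $6\times 4=24$ | $6-3=3$ | $3^2=9$ |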

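We now need to multiply each squared difference between a data value and the mean by the frequency of that value. We will add another column to the table to do this.

| $x_i$ | $f_i$ | $x_if_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ | $(x_i-\mu)^2f_i$ |
| --- | --- | --- | --- | --- | --- |
| 0 | 5 | $0\times 5=0$ | $0-3=-3$ | $(-3)^2=9$ | $9\times 5=45$ |
| 1 | 2 | $1\times 2=2$ | $1-3=-2$ | $(-2)^2=4$ | $4\times 2=8$ |
| 3 | 7 | $3\times 7=21$ | $3-3=0$ | $0^2=0$ | $0\times 7=0$ |
| 4 | 7 | $4\times 7=28$ | $4-3=1$ | $1^2=1$ | $1\times 7=7$ |
| 6 | 4 | $6\times 4=24$ | $6-3=3$ | $3^2=9$ | $9\times 4=36$ |

We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma$: $$\sigma=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{45+8+0+7+36}{5+2+7+7+4}}=\sqrt{\frac{96}{25}}=\sqrt{3.84}=1.95959\ldots\approx 1.960.$$

Therefore, the standard deviation of the number of goals scored is 1.960 to three decimal places.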
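The working table for this example can also be built programmatically, one row per distinct number of goals. This is a sketch with illustrative variable names; each tuple in `rows` holds the six columns of the final table in order.

```python
import math

goals = [0, 1, 3, 4, 6]  # distinct values x_i
games = [5, 2, 7, 7, 4]  # frequencies f_i

total_games = sum(games)
mu = sum(x * f for x, f in zip(goals, games)) / total_games

# Columns: x, f, x*f, x - mu, (x - mu)^2, (x - mu)^2 * f
rows = []
for x, f in zip(goals, games):
    diff = x - mu
    rows.append((x, f, x * f, diff, diff ** 2, diff ** 2 * f))

weighted_sum = sum(row[-1] for row in rows)  # sum of the last column
sigma = math.sqrt(weighted_sum / total_games)
```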

Next, we will discuss how to calculate the standard deviation of grouped data using the midpoint. This approach involves the same steps as with frequency tables, but here the data is given in intervals rather than as individual values, so we use the midpoint of each interval to approximate the values in it. We will explore this further in our final example.

Example 7: Find the Standard Deviation of a Grouped Data Set

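A quiz was completed by 92 students and their scores were recorded in the following frequency table. Find the standard deviation to two decimal places.

| Score | $0 < s \leq 20$ | $20 < s \leq 40$ | $40 < s \leq 60$ | $60 < s \leq 80$ | $80 < s \leq 100$ |
| --- | --- | --- | --- | --- | --- |
| Frequency | 26 | 10 | 24 | 5 | 27 |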

Answer

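As the data presented in this question is in the form of a frequency table, in order to calculate the standard deviation $\sigma$, we use the formula $$\sigma=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}},$$ where $X=\{x_1,x_2,x_3,\ldots,x_n\}$ represents the values of the data set with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, there are $n$ distinct values of the data set, and the mean is represented by $\mu$.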

For this type of problem, we have been given different “classes” of values represented by intervals rather than exact values. This means we cannot directly apply the formula above, since we cannot substitute these intervals for the values of 𝑥 in our formula.

Instead, the approach we must take is to find the “midpoint” of each interval and use this to represent the corresponding value of $x_i$. After doing so, we can treat the problem as we would any other frequency table.

To find the midpoint, we add together the endpoints and divide by 2. This allows us to find an approximate standard deviation of the data set.

So, the values of the data set are the midpoints of the score intervals, and the corresponding frequencies are the frequencies of those intervals. Let’s find the midpoint of each of the intervals and then rewrite the midpoints as $x_i$ and the frequencies as $f_i$, as follows:

| Interval | Midpoint $x_i$ | Frequency $f_i$ |
| --- | --- | --- |
| $0 < s \leq 20$ | $\frac{0+20}{2}=10$ | 26 |
| $20 < s \leq 40$ | $\frac{20+40}{2}=30$ | 10 |
| $40 < s \leq 60$ | $\frac{40+60}{2}=50$ | 24 |
| $60 < s \leq 80$ | $\frac{60+80}{2}=70$ | 5 |
| $80 < s \leq 100$ | $\frac{80+100}{2}=90$ | 27 |

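To calculate the standard deviation, we must first calculate the mean, $\mu$. For a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$ and $n$ distinct values of the data set, we use the following formula: $$\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum_{i=1}^{n}x_if_i}{\sum_{i=1}^{n}f_i}.$$

Again, we remember that the midpoint is now being used to represent the values of $x_i$. Using the table above, we can add a new column in order to find $x_if_i$ for each value of $i$ and then use this to find the mean.

| Interval | Midpoint $x_i$ | Frequency $f_i$ | $x_if_i$ |
| --- | --- | --- | --- |
| $0 < s \leq 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ |
| $20 < s \leq 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ |
| $40 < s \leq 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ |
| $60 < s \leq 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ |
| $80 < s \leq 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ |

By summing the values for $x_if_i$ and dividing by the sum of the frequencies, we get $$\mu=\frac{x_1f_1+x_2f_2+x_3f_3+\cdots+x_nf_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{260+300+1200+350+2430}{26+10+24+5+27}=\frac{4540}{92}=49.34782\ldots$$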

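Next, we will calculate the difference between the midpoint of each class in our data set and the mean, and the square of this difference, in order to calculate the sum of the squares. We will do this by adding two further columns to the table above. Note that all values have been rounded to 4 decimal places.

| Interval | Midpoint $x_i$ | Frequency $f_i$ | $x_if_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ |
| --- | --- | --- | --- | --- | --- |
| $0 < s \leq 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ | $10-49.3478=-39.3478$ | 1548.2494 |
| $20 < s \leq 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ | $30-49.3478=-19.3478$ | 374.3374 |
| $40 < s \leq 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ | $50-49.3478=0.6522$ | 0.4254 |
| $60 < s \leq 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ | $70-49.3478=20.6522$ | 426.5134 |
| $80 < s \leq 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ | $90-49.3478=40.6522$ | 1652.6014 |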

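We now need to multiply each squared difference between a midpoint and the mean by the frequency of the corresponding interval. We will add another column to the table to do this. Again, we will round to 4 decimal places. Because of rounding at intermediate steps, the final digits of the products may differ slightly from those obtained by multiplying the displayed values.

| Interval | Midpoint $x_i$ | Frequency $f_i$ | $x_if_i$ | $x_i-\mu$ | $(x_i-\mu)^2$ | $(x_i-\mu)^2f_i$ |
| --- | --- | --- | --- | --- | --- | --- |
| $0 < s \leq 20$ | $\frac{0+20}{2}=10$ | 26 | $10\times 26=260$ | $10-49.3478=-39.3478$ | 1548.2494 | 40254.4818 |
| $20 < s \leq 40$ | $\frac{20+40}{2}=30$ | 10 | $30\times 10=300$ | $30-49.3478=-19.3478$ | 374.3374 | 3743.3736 |
| $40 < s \leq 60$ | $\frac{40+60}{2}=50$ | 24 | $50\times 24=1200$ | $50-49.3478=0.6522$ | 0.4254 | 10.2088 |
| $60 < s \leq 80$ | $\frac{60+80}{2}=70$ | 5 | $70\times 5=350$ | $70-49.3478=20.6522$ | 426.5134 | 2132.5668 |
| $80 < s \leq 100$ | $\frac{80+100}{2}=90$ | 27 | $90\times 27=2430$ | $90-49.3478=40.6522$ | 1652.6014 | 44620.2351 |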

We are now ready to find the standard deviation. We will substitute the values from the table into the formula of the standard deviation and solve for $\sigma$: $$\sigma=\sqrt{\frac{(x_1-\mu)^2\times f_1+(x_2-\mu)^2\times f_2+(x_3-\mu)^2\times f_3+\cdots+(x_n-\mu)^2\times f_n}{f_1+f_2+f_3+\cdots+f_n}}=\sqrt{\frac{40254.4818+3743.3736+10.2088+2132.5668+44620.2351}{26+10+24+5+27}}=\sqrt{\frac{90760.8661}{92}}=\sqrt{986.5311\ldots}=31.4091\ldots$$

Therefore, the standard deviation, estimated using the midpoints, is 31.41 to 2 decimal places.
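The midpoint method can be sketched in Python as follows. The interval endpoints and frequencies come from the table in the question; the variable names are illustrative. Since no rounding is applied at intermediate steps, the intermediate values may differ from the table in their final digits, but the result agrees to 2 decimal places.

```python
import math

# Score intervals (lower, upper] and their frequencies, as given in the question.
intervals = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
frequencies = [26, 10, 24, 5, 27]

# Each interval is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

total = sum(frequencies)
mu = sum(m * f for m, f in zip(midpoints, frequencies)) / total
weighted_sq = sum(f * (m - mu) ** 2 for m, f in zip(midpoints, frequencies))
sigma = math.sqrt(weighted_sq / total)  # an estimate, since midpoints stand in for scores
```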

In this explainer, we have learned what the standard deviation is and how to find it for a set of data, from both a list and a frequency table. We have also learned how to compare data sets and draw conclusions using the standard deviation.

Key Points

  • The standard deviation of a data set is used to measure the dispersion of data from the mean.
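  • For data presented in a list, the formula for the standard deviation $\sigma$ of a set of data $X=\{x_1,x_2,x_3,\ldots,x_n\}$ with $n$ members and mean $\mu$ is $\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}$.
  • For data presented in a frequency table, the formula for the standard deviation $\sigma$ of a data set $X=\{x_1,x_2,x_3,\ldots,x_n\}$, with corresponding frequencies $F=\{f_1,f_2,f_3,\ldots,f_n\}$, $n$ distinct values of the data set, and mean $\mu$ is $\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2f_i}{\sum_{i=1}^{n}f_i}}$.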
  • For grouped frequency tables where data is given in intervals, the midpoint of the interval is used to represent the values of 𝑥.
