Video Transcript
In this video, we will learn how to
find and interpret the standard deviation from a given data set.
In order to understand the meaning
of the standard deviation of a data set, we first recall the definition of the mean
of a data set. The mean, also known as the average
or expected value of a data set, is used as a measure of central tendency. For a data set 𝑥 containing values
𝑥 sub one, 𝑥 sub two, 𝑥 sub three, and so on, up to 𝑥 sub 𝑛 where there are
𝑛 values, the mean, denoted by the Greek letter 𝜇, is calculated by taking the sum
of the values in the data set and dividing it by the number of values 𝑛. This can be written as the sum from
𝑖 equals one to 𝑛 of 𝑥 sub 𝑖 all divided by 𝑛.
Let’s now define what we mean by
the standard deviation. The standard deviation of a data
set is used to measure the dispersion of data from the mean. The larger the standard deviation,
the more dispersed the data is from the mean. And the smaller the standard
deviation, the less dispersed the data is from the mean. For the same data set 𝑥 with
values 𝑥 sub one, 𝑥 sub two, and so on, up to 𝑥 sub 𝑛 where there are 𝑛 values,
the standard deviation, denoted by 𝜎 𝑥, is calculated as follows. We find the square root of the sum
of the squared differences between each value of the data set and the mean 𝜇, divided by
the number of values 𝑛. This can be simplified as
shown. The two shorthand formulae to
calculate the mean and standard deviation will be key in solving the examples in this
video.
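The formulae referred to as "shown" appear on screen rather than in the spoken text. Reconstructed from the verbal description above, and assuming the form that divides by 𝑛 as the narration states, they can be written as:

```latex
\mu = \frac{\sum_{i=1}^{n} x_i}{n},
\qquad
\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}
```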
We will begin by using the formula
for standard deviation of a data set to determine the standard deviation when given
the sum of the squared differences from the mean and the number of data points.
If the sum of 𝑥 minus 𝑥 bar
all squared for a set of six values equals 25, find the standard deviation of
the set, and round the result to the nearest thousandth.
We begin by recalling what some
of the notation in the question means. 𝑥 bar, also sometimes written
as the Greek letter 𝜇, is the mean of the data set. We are asked to find the
standard deviation of the set. This is denoted 𝜎 𝑥 and
satisfies the equation shown. 𝑛 is the number of values in
the data set, in this question six. And we’re also told that the
sum of 𝑥 minus 𝑥 bar all squared is equal to 25. Substituting these values, we
see that the standard deviation 𝜎 𝑥 is equal to the square root of 25 over
six. Typing this into our calculator
gives us an answer of 2.041241 and so on. We are asked to give the result
to the nearest thousandth. So we need to round to three
decimal places. And the standard deviation is
therefore equal to 2.041.
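As a quick check of the arithmetic, the substitution can be reproduced in a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original question.

```python
import math

# Values given in the question
sum_of_squared_differences = 25  # sum of (x minus x bar) all squared
n = 6                            # number of values in the set

# Standard deviation: square root of the sum of squares divided by n
sigma = math.sqrt(sum_of_squared_differences / n)

print(round(sigma, 3))  # 2.041
```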
In this question, we were given the
sum of the squared differences between the values of the data set and the mean. However, in general, we will just
be given the data set. Our next step will therefore be to
consider the four-step process we can use to find the standard deviation of a data
set.
We begin by recalling the formula
to calculate the standard deviation 𝜎 𝑥 that we have already seen. When given a data set, our first
step is to find the mean 𝜇 or 𝑥 bar of the data set. Our second step is to find the
difference between the mean and the value of each of the data points. Next, we find the sum of the
squares of each of the values we found in step two. Finally, we substitute the sum of
the squares and the value of 𝑛 into the formula and then square root to calculate
the standard deviation, noting that this value can never be negative. We will now look at an example
where we need to follow this four-step process.
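The four steps described above can be sketched as a short Python function. The function name `population_standard_deviation` is hypothetical and chosen only for illustration; the code assumes a non-empty list of numbers and divides by 𝑛, as in the formula used in this video.

```python
import math


def population_standard_deviation(values):
    """Standard deviation of a list of numbers, dividing by n."""
    n = len(values)

    # Step 1: find the mean of the data set
    mean = sum(values) / n

    # Step 2: find the difference between each data point and the mean
    differences = [x - mean for x in values]

    # Step 3: find the sum of the squares of those differences
    sum_of_squares = sum(d ** 2 for d in differences)

    # Step 4: divide by n and take the square root
    return math.sqrt(sum_of_squares / n)
```

Python's standard library function `statistics.pstdev` computes the same quantity, so it can serve as an independent check on a hand calculation.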
Calculate the standard
deviation of the values 45, 35, 42, 49, 39, and 34. Give your answer to three
decimal places.
We begin by recalling that the
formula to calculate the standard deviation 𝜎 𝑥 of a data set is as shown,
where 𝑛 is the number of members of the data set and 𝜇 is its mean. We recall that we can calculate
the mean of a data set by finding the sum of the values and dividing by how many
values there are. The mean 𝜇 in this case is
equal to the sum of the six values divided by six. This is equal to 244 divided by
six, which equals 40.6 recurring. We will now set up a table
which will enable us to follow a step-by-step process to calculate the standard
deviation.
In the first row of our table,
we have the six values in our data set 𝑥 sub 𝑖. We begin by subtracting the
mean 𝜇 from each of these values. 45 minus 40.6 recurring is
equal to 4.3 recurring. Subtracting the mean from 35
gives us negative 5.6 recurring. Repeating this process for the
other four values in our data set, we have 1.3 recurring, 8.3 recurring,
negative 1.6 recurring, and negative 6.6 recurring. Our next step is to find the
square of each of these values. Noting that all of these must
be positive, we have the six values shown. We are now in a position to
find the sum of 𝑥 sub 𝑖 minus 𝜇 all squared from 𝑖 equals one to 𝑖 equals
six. This is the sum of the six
values in the third row.
Typing this into our calculator
gives us 169.3 recurring. The standard deviation 𝜎 𝑥 is
therefore equal to the square root of 169.3 recurring divided by six, which is
equal to 5.312459 and so on. As we are asked to give our
answer to three decimal places, we can conclude that the standard deviation of
the values 45, 35, 42, 49, 39, and 34 is 5.312.
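The table-based working for this example can be checked with a short Python sketch. The variable names are illustrative; each list corresponds to one row of the table described in the narration.

```python
import math

values = [45, 35, 42, 49, 39, 34]   # first row: the data values
n = len(values)

mean = sum(values) / n               # 244 / 6

# Second row: each value minus the mean
differences = [x - mean for x in values]

# Third row: the square of each difference
squares = [d ** 2 for d in differences]

sum_of_squares = sum(squares)
sigma = math.sqrt(sum_of_squares / n)

print(f"mean = {mean:.3f}")                      # mean = 40.667
print(f"sum of squares = {sum_of_squares:.3f}")  # sum of squares = 169.333
print(f"sigma = {sigma:.3f}")                    # sigma = 5.312
```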
Before looking at one final
example, we will consider how we can calculate the mean and standard deviation of a
data set in a frequency table. For a data set 𝑥 containing values
𝑥 sub one, 𝑥 sub two, and so on, up to 𝑥 sub 𝑛, with corresponding frequencies
𝑓 equal to 𝑓 sub one, 𝑓 sub two, and so on and 𝑛 distinct values of the data
set, the mean 𝜇 is calculated as follows. It is the sum of 𝑥 sub 𝑖 𝑓 sub
𝑖 from 𝑖 equals one to 𝑛 divided by the sum of 𝑓 sub 𝑖 from 𝑖 equals one to
𝑛. When answering any questions of
this type, we’ll need to add a row to our table containing the values of 𝑥 sub 𝑖
multiplied by 𝑓 sub 𝑖.
We can then use this value of the
mean to calculate the standard deviation in a similar way. The standard deviation 𝜎 𝑥 is
equal to the square root of the sum of 𝑥 sub 𝑖 minus 𝜇 all squared multiplied by
𝑓 sub 𝑖 from 𝑖 equals one to 𝑛 divided by the sum of 𝑓 sub 𝑖 from 𝑖 equals
one to 𝑛. After we find the square of the
differences, we need to multiply each of these values by the frequency before
finding their sum. Let’s now look at an example of
this type.
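Written out from the verbal description above, with 𝑛 taken as the number of distinct values in the table, the two frequency-table formulae are:

```latex
\mu = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i},
\qquad
\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2 f_i}{\sum_{i=1}^{n} f_i}}
```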
The table shows the
distribution of goals scored in the first half of a football season. Find the standard deviation of
the number of goals scored. Give your answer to three
decimal places.
We can see from the table that
in five matches, there were no goals scored in the first half. In two matches, one goal was
scored. There were seven matches in which three goals were
scored and seven more in which four goals were scored. And there were four matches
where six goals were scored in the first half. We are asked to find the
standard deviation of the number of goals scored. And this can be calculated
using the following formula when a data set is given in a frequency table. In this question, 𝑥 sub 𝑖
will be the number of goals. 𝑓 sub 𝑖 will be the number of
matches. And 𝜇 will be the mean number
of goals scored per match.
This mean value can be
calculated by finding the sum of 𝑥 sub 𝑖 multiplied by 𝑓 sub 𝑖 from 𝑖
equals one to 𝑛 divided by the sum of 𝑓 sub 𝑖 from 𝑖 equals one to 𝑛. Before using either of our
formulae, we will add some extra rows to our table. In order to calculate the mean,
we begin by multiplying each value of 𝑥 sub 𝑖 by the corresponding value of 𝑓
sub 𝑖. Multiplying zero goals by five
matches gives us a total of zero goals. One multiplied by two is equal
to two. Completing this row, we obtain
values of 21, 28, and 24. Adding an extra column for the
sums, we need to find the total from 𝑖 equals one to 𝑛 for the second and
third rows.
The sum of the frequencies is
25, which means there were a total of 25 matches played. This will be the denominator
when calculating both the mean and the standard deviation. Adding zero, two, 21, 28, and
24 gives us 75. And the mean is therefore equal
to 75 divided by 25. The average or mean number of
goals scored per match was three. In the fourth row of our table,
we will subtract this mean from each of our 𝑥-values. Zero minus three is equal to
negative three. And subtracting 𝜇 from each of
the other 𝑥-values gives us negative two, zero, one, and three. Our next step is to square all
five of these values. Noting that squaring a negative
number gives a positive answer, we have nine, four, zero, one, and nine.
Finally, we need to multiply
each of these values by the corresponding frequency. Nine multiplied by five is
45. Next, we multiply four by two
to give us eight. Our last three values are zero,
seven, and 36. We now need to find the sum of
the values in the bottom row. And this is equal to 96. The standard deviation 𝜎 𝑥 is
therefore equal to the square root of 96 over 25. This simplifies to the square
root of 3.84. And recalling that we were
asked to give our answer to three decimal places, we can type this into our
calculator. To three decimal places, the
standard deviation of the number of goals scored is 1.960.
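The rows added to the table in this example can be reproduced with a short Python sketch. The goal counts and match counts are taken from the narration; the variable names are illustrative.

```python
import math

goals = [0, 1, 3, 4, 6]     # x sub i: number of goals
matches = [5, 2, 7, 7, 4]   # f sub i: number of matches

total_matches = sum(matches)                             # 25

# Row of x sub i multiplied by f sub i, then the mean
goals_times_matches = [x * f for x, f in zip(goals, matches)]
mean = sum(goals_times_matches) / total_matches          # 75 / 25

# Rows of differences, squared differences, and squares times frequency
differences = [x - mean for x in goals]
squares = [d ** 2 for d in differences]
weighted_squares = [s * f for s, f in zip(squares, matches)]

sigma = math.sqrt(sum(weighted_squares) / total_matches)

print(mean)               # 3.0
print(f"{sigma:.3f}")     # 1.960
```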
Whilst we will not cover it in this
video, it is worth noting that we can also find the standard deviation of a grouped
data set in a similar way. When dealing with a grouped data set,
we find the midpoint of each group, and these will be our values of 𝑥 sub 𝑖. We then proceed in the exact same
manner as in this question.
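A minimal sketch of the midpoint approach is given below. The class intervals and frequencies are hypothetical, invented purely to illustrate the method, and do not come from the video.

```python
import math

# Hypothetical grouped data: (lower bound, upper bound) for each group
intervals = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]

# The midpoint of each group plays the role of x sub i
midpoints = [(low + high) / 2 for low, high in intervals]

total = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total

# Same frequency-table formula as before, applied to the midpoints
sum_weighted_squares = sum(f * (x - mean) ** 2
                           for x, f in zip(midpoints, frequencies))
sigma = math.sqrt(sum_weighted_squares / total)

print(mean)   # 16.0
print(sigma)  # 7.0
```

Because the midpoints only approximate the underlying values, the result is an estimate of the standard deviation rather than an exact figure.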
We will now finish this video by
summarizing the key points. We saw in this video that the
standard deviation of a data set is used to measure the dispersion of data from the
mean. For data presented in a list, the
formula for the standard deviation 𝜎 𝑥 is as shown, where the set of data 𝑥 has
values 𝑥 sub one, 𝑥 sub two, and so on, up to 𝑥 sub 𝑛, with 𝑛 members and mean
𝜇. When the data is presented in a
frequency table and each element of the data set has corresponding frequency 𝑓 sub
one, 𝑓 sub two, and so on, up to 𝑓 sub 𝑛, then we can calculate the standard
deviation as shown. We also note that for grouped
frequency tables where data is given in intervals, the midpoint of the interval is
used to represent the values of 𝑥 sub 𝑖.