Video Transcript
Using the information from the
table, find the Spearman’s rank correlation coefficient and determine the type of
correlation between the variables 𝑋 and 𝑌. Give the numerical part of your
answer to four decimal places.
The Spearman’s rank correlation
coefficient is a measure of the tendency for one variable in a bivariate dataset to
increase or decrease as the other does, although not necessarily in a linear
way. Calculation of this statistic
doesn’t use the raw data but instead the rank or position of each value within the
data set. We’ll begin by assigning a rank to
each value of the 𝑋-variable and a rank to each value of the 𝑌-variable. It doesn’t matter whether we choose
the smallest or the largest value to have the rank of one, as long as we’re
consistent for the two variables.
For the 𝑋-variable, the smallest
data value is six, so we assign this rank one. There are then two data values of
nine, so we need to understand how to deal with cases of equal data values. If we were to write out the ordered
list of 𝑋-values, these two values of nine would take the second and third places
in the list. As they are equal, we give both
data values the average of these two ranks. So, both values of nine are given
the rank of 2.5. Treating tied ranks in this way
ensures that the sum of the ranks will be the same for both variables. We then assign rank four to the
data value 10 and ranks five and six to the values 13 and 14, respectively.
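
The tie-handling rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` is made up for this example, and the 𝑋-values are listed in ascending order rather than in the order they appear in the table, which the transcript does not show.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward.

    Tied values share the mean of the positions they would occupy
    in the ordered list.
    """
    ordered = sorted(values)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


# X-values as named in the transcript (ascending order is an assumption).
x_values = [6, 9, 9, 10, 13, 14]
x_ranks = average_ranks(x_values)
print(x_ranks)  # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0]
```

The ranks still add up to 21, the same total as the untied ranks one to six, which is why averaging tied ranks keeps the rank sums equal for both variables.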
Next, we perform the same process
for the 𝑌-variable. This time, the two smallest values
are equal, so we assign them both the same rank of 1.5, which is the average of one
and two. The next two values are also equal,
so we award each of these the average of three and four, which is 3.5. Finally, we award rank five to 21
and rank six to the largest value of 23. Now that we’ve assigned all of the
ranks, let’s introduce the formula for calculating the Spearman’s rank correlation
coefficient. It is one minus a fraction: six multiplied by
the sum of 𝑑 sub 𝑖 squared, all over 𝑛 multiplied by 𝑛 squared minus one. Here, 𝑑 sub 𝑖 represents the
difference in the ranks for each pair of data, and 𝑛 represents the number of data
pairs, which in this question is six.
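
Written out in symbols, the formula just read aloud would look like this, using the same names as the transcript (𝑟 sub 𝑠 for the coefficient, 𝑑 sub 𝑖 for the rank differences, and 𝑛 for the number of data pairs):

```latex
r_s = 1 - \frac{6 \sum d_i^2}{n\left(n^2 - 1\right)}
```

With 𝑛 equal to six, the denominator is six multiplied by 35, which is 210.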
The next thing we need to calculate
then is the difference in ranks for each pair of data, so we’ll add another row to
the table to do this. It doesn’t actually matter which
way round we subtract the ranks, but let’s subtract the rank of 𝑌 from the rank of
𝑋 for consistency. First, we have six minus five,
which is one, then 2.5 minus 3.5, which is negative one. The remaining differences are
negative two, 3.5, negative 2.5, and one. At this point, there’s a useful
check we can perform, because the sum of the differences should always be equal to
zero. Summing the six values in the
bottom row of the table does indeed give zero, which is consistent with the ranks we’ve
assigned so far.
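
As a rough sketch, the same check can be done in Python using the six differences read out above. The variable names are chosen only for this illustration.

```python
# Rank differences (rank of X minus rank of Y) as stated in the transcript.
differences = [1, -1, -2, 3.5, -2.5, 1]

n = len(differences)
total = sum(differences)

# Both sets of ranks add up to n(n + 1)/2, so the differences must cancel.
rank_sum = n * (n + 1) / 2

print(rank_sum)  # 21.0
print(total)     # 0.0
```

A total of zero does not prove that every rank is right, but a nonzero total would show that a mistake has been made somewhere.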
Next, we need to find the square of
each difference, so we can add another row to the table to do so. We’re now ready to calculate the
Spearman’s rank correlation coefficient for this data set. Summing the squared differences in
the final row of the table gives 25.5. We then substitute this value for
the sum of 𝑑 sub 𝑖 squared and six for 𝑛 into the Spearman’s rank correlation
coefficient formula to give 𝑟 sub 𝑠 equals one minus six multiplied by 25.5 over
six multiplied by six squared minus one. Evaluating gives one minus 51 over
70, which is 19 over 70. We’re asked to give the answer to
four decimal places, so evaluating this fraction as a decimal first gives 0.271428
continuing. And then rounding to four decimal
places gives 0.2714.
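
The arithmetic can be reproduced with a short Python sketch. It uses exact fractions so that the intermediate values 25.5 and 19 over 70 appear without rounding error. The variable names are illustrative, and the differences are the ones read out earlier in the transcript.

```python
from fractions import Fraction

# Rank differences from the transcript, written as exact fractions.
differences = [
    Fraction(1), Fraction(-1), Fraction(-2),
    Fraction(7, 2), Fraction(-5, 2), Fraction(1),
]

n = len(differences)
sum_d_squared = sum(d * d for d in differences)

# Spearman's formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
r_s = 1 - 6 * sum_d_squared / (n * (n**2 - 1))

print(sum_d_squared)        # 51/2
print(r_s)                  # 19/70
print(round(float(r_s), 4))  # 0.2714
```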
The final part of the question asks
us to determine the type of correlation that exists between the variables 𝑋 and
𝑌. This means we need to use the value
of the Spearman’s rank correlation coefficient we’ve just calculated to determine
whether the two variables have a tendency to increase together or for one to
decrease as the other increases. We recall that the value of the
Spearman’s rank correlation coefficient is always between negative one and positive one
inclusive. A positive value indicates that the
two variables have a tendency to increase together, which we refer to as direct
correlation. A negative value indicates that as
one variable increases, the other tends to decrease, which we refer to as inverse
correlation.
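
The sign rule can be summarized in a small Python sketch. The function name `correlation_type` is hypothetical, and the label returned for a coefficient of exactly zero is an assumption, since the video only covers positive and negative values.

```python
def correlation_type(r_s):
    """Classify a Spearman's rank correlation coefficient by its sign."""
    if not -1 <= r_s <= 1:
        raise ValueError("the coefficient must lie between -1 and 1 inclusive")
    if r_s > 0:
        return "direct"
    if r_s < 0:
        return "inverse"
    # Assumption: a value of exactly zero is reported as no correlation.
    return "none"


print(correlation_type(0.2714))  # direct
```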
As the Spearman’s rank correlation
coefficient we’ve calculated is positive, this means that there is a direct
correlation between 𝑋 and 𝑌. The value is quite close to zero
though, so whilst a direct correlation does exist, it is relatively weak. We’ve found then that the value of
the Spearman’s rank correlation coefficient for this dataset to four decimal places
is 0.2714. And hence, there is direct
correlation between 𝑋 and 𝑌.