# Video: MATH-STATS-2018-S1-Q06

Using the given table, find the Spearman’s rank correlation coefficient between the two variables 𝑥 and 𝑦, and then determine the type of correlation.

06:28

### Video Transcript

Using the given table, find the Spearman’s rank correlation coefficient between the two variables 𝑥 and 𝑦, and then determine the type of correlation.

The Spearman’s rank correlation coefficient is a way of quantifying the correlation or agreement between the ranks of two variables. The formula for calculating it is one minus a fraction: six multiplied by the sum of 𝑑𝑖 squared, all over 𝑛 multiplied by 𝑛 squared minus one. Here, 𝑑𝑖 means the difference in the ranks of the 𝑖th pair of data, which just means the data pair 𝑥𝑖, 𝑦𝑖. So, for example, the first pair of data would be 𝑥 one, 𝑦 one. The second pair would be 𝑥 two, 𝑦 two, and so on. 𝑛 represents the number of pairs of data. So, in this question, that would be six.
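Written out in symbols, the formula just described reads as follows. The symbol $r$ is the name the transcript later gives to the coefficient, $d_i$ is the difference in ranks for the $i$th pair, and $n$ is the number of pairs.

$$
r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)}
$$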

We do need to be a little bit careful with this formula. A common mistake is to think that the one is also included in the numerator of the fraction; it isn’t. The one is entirely separate from the fraction. Now, before we can apply this formula, we must first rank the data because the data that we’ve been given is the raw values of 𝑥 and 𝑦 and not their ranks. So we begin by extending our table to include two rows for the 𝑥-rank and the 𝑦-rank. It doesn’t matter whether we give the rank one to the smallest or the largest piece of data, as long as we’re consistent between what we do for 𝑥 and what we do for 𝑦.

So I’ve chosen to give the rank one to the smallest piece of data, which for 𝑥 is 10. The second smallest is 20, so this gets the rank two. Then come 30, 40, 50, and, finally, 60, with ranks three to six. So now, we’ve given all of the 𝑥-data their ranks. For 𝑦, the smallest piece of data is 50, then 60, and then 70. However, the next two pieces of data are actually equal, as we see that the first and last values are both equal to 80. So we need to consider how we’re going to award the ranks.

We can’t just award them both the rank of four and then the final value of 90 a rank of five, because we need the sum of the ranks to be the same for both 𝑥 and 𝑦. We can’t even award the two equal pieces of data the same rank of four and then the final piece of data a rank of six, because the total sum of the ranks would still not be the same as it is for 𝑥. Instead, we find the average of the ranks that we should be awarding to these pieces of data. In an ordered list, they’d be in the fourth and fifth places. So we average four and five, which gives 4.5, and award both pieces of data the same rank of 4.5. We then award the final, biggest piece of data the rank of six. And this ensures that the sum of the ranks for 𝑥 and 𝑦 is equal.
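The averaging rule for tied values can be sketched in a few lines of Python. The table itself is not reproduced in this transcript, so the lists below are reconstructed from the narration. The order of the three middle columns is an assumption, and it does not affect the result. The helper name `average_ranks` is purely illustrative.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first 1-based position of v in the sorted list
        count = ordered.count(v)       # how many entries are tied at this value
        # Mean of the positions first, first + 1, ..., first + count - 1
        ranks.append(first + (count - 1) / 2)
    return ranks


# Reconstructed from the narration; the order of the middle three columns is assumed.
x = [60, 50, 10, 20, 30, 40]
y = [80, 90, 50, 60, 70, 80]

x_ranks = average_ranks(x)
y_ranks = average_ranks(y)

print(x_ranks)
print(y_ranks)
print(sum(x_ranks), sum(y_ranks))  # the two sums should agree
```

Both 80s receive the rank 4.5, and the two rank sums come out equal, which is the property the averaging is meant to preserve.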

Next, we need to find the difference between the ranks awarded to each pair of data. For the first pair of data, six minus 4.5 is 1.5. For the second, five minus six is negative one. The next three pairs of data all have the same rank for both 𝑥 and 𝑦. So the difference is zero. For the final pair of data, four minus 4.5 gives a difference of negative 0.5. In our formula, we need the squares of these differences. So we need to square the values that we’ve just found. For this reason, it doesn’t actually matter whether we subtract the rank of 𝑦 from the rank of 𝑥 or the other way round, because when we square it, we’ll get the same result either way. A value of one or negative one will square to the same thing. Squaring the differences in the ranks gives the values 2.25, one, zero, zero, zero, 0.25.
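As a quick check of this step, the differences and their squares can be computed directly from the ranks stated in the narration. This is only a sketch: the order of the three middle pairs is assumed, and the variable names are illustrative.

```python
# Ranks as stated in the narration; the order of the three middle pairs is assumed.
x_ranks = [6, 5, 1, 2, 3, 4]
y_ranks = [4.5, 6, 1, 2, 3, 4.5]

# Difference in ranks for each pair, taken as x-rank minus y-rank
d = [rx - ry for rx, ry in zip(x_ranks, y_ranks)]
d_squared = [di ** 2 for di in d]

# Subtracting the other way round changes the signs but not the squares
d_squared_other_way = [(ry - rx) ** 2 for rx, ry in zip(x_ranks, y_ranks)]

print(d)
print(d_squared)
print(d_squared == d_squared_other_way)
```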

Next, we need to find the sum of these squared differences. And the sum of 2.25, one, and 0.25, because all the other values are zero, is 3.5. Now, we are able to apply our formula. Remember, 𝑛 means the number of pairs of data. So in this case, 𝑛 is six. We have that 𝑟, the Spearman’s rank correlation coefficient, is equal to one minus a fraction whose numerator is six multiplied by 3.5 and whose denominator is six multiplied by six squared minus one. The sixes in the numerator and denominator of the fraction cancel. And as six squared is 36, this means that six squared minus one is 35. So we have one minus 3.5 over 35. 3.5 over 35 is just 0.1 because the numerator of this fraction is 10 times smaller than the denominator. So we have one minus 0.1, which is equal to 0.9.
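The same arithmetic can be reproduced with a short Python sketch of the formula. The function name `spearman_from_d_squared` is illustrative, and the squared differences are the ones listed in the narration.

```python
def spearman_from_d_squared(sum_d_squared, n):
    """Shortcut formula: the 1 stands outside the fraction 6 * sum(d^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Squared rank differences as listed in the narration
d_squared = [2.25, 1, 0, 0, 0, 0.25]

n = len(d_squared)               # six pairs of data
sum_d_squared = sum(d_squared)   # 3.5

r = spearman_from_d_squared(sum_d_squared, n)
print(round(r, 4))
```

Keeping the subtraction from one outside the fraction mirrors the warning earlier in the transcript about not placing the one in the numerator.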

The second part of the question asks us to determine the type of correlation that exists between 𝑥 and 𝑦. To do so, we need to recall how to interpret the Spearman’s rank correlation coefficient. And we remember, first of all, that this value is between negative one and one, inclusive. A value of one means that there’s perfect rank agreement between the two sets of data, which means that within every pair, the 𝑥-value and the 𝑦-value are given the same rank as each other. A value of negative one means that the ranks are completely opposite within each pair. So the largest 𝑥-value is paired with the smallest 𝑦-value, and so on. Our value of 0.9 is positive, and it’s quite close to one. This means that there’s positive rank correlation between 𝑥 and 𝑦. We could also describe this as strong positive rank correlation to illustrate the strength of this relationship.
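The two extreme cases can be checked with the same formula. The sketch below uses six untied ranks, chosen only to match the number of pairs in this question, and an illustrative function name `spearman_from_ranks`.

```python
def spearman_from_ranks(x_ranks, y_ranks):
    """Shortcut formula applied to two lists of ranks of equal length."""
    n = len(x_ranks)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(x_ranks, y_ranks))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5, 6]
opposite = list(reversed(ranks))  # largest x-rank paired with smallest y-rank, and so on

perfect_agreement = spearman_from_ranks(ranks, ranks)
perfect_disagreement = spearman_from_ranks(ranks, opposite)

print(perfect_agreement)     # 1.0
print(perfect_disagreement)  # -1.0
```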

Now, just a brief word about how we dealt with the fact that we had two pieces of data with the same value for 𝑦. As there was only one pair of tied ranks in this case, the method that we used of allocating the average rank and then applying the standard Spearman’s rank correlation coefficient formula was perfectly valid. However, there are other methods, such as Kendall’s tau, which could be applied if we had a larger number of tied ranks. As we’ve already said, because there was such a small number of tied ranks in this question, the method we’ve used is perfectly fine. And, indeed, it’s what’s expected at this level.
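To get a feel for how little a single tie matters here, one can compare the shortcut formula with the ordinary correlation coefficient computed directly on the ranks, which is what the shortcut reduces to when there are no ties. This is only a side check under the same assumptions as before: the ranks come from the narration, the order of the middle pairs is assumed, and the helper names are illustrative.

```python
from math import sqrt


def pearson(a, b):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def shortcut(a, b):
    """Shortcut Spearman formula based on squared rank differences."""
    n = len(a)
    return 1 - 6 * sum((p - q) ** 2 for p, q in zip(a, b)) / (n * (n ** 2 - 1))


# Ranks as stated in the narration; the order of the three middle pairs is assumed.
x_ranks = [6, 5, 1, 2, 3, 4]
y_ranks = [4.5, 6, 1, 2, 3, 4.5]

r_shortcut = shortcut(x_ranks, y_ranks)
r_on_ranks = pearson(x_ranks, y_ranks)
print(round(r_shortcut, 4), round(r_on_ranks, 4))
```

With one tied pair, the two values agree to within about 0.002, which is consistent with treating the shortcut as adequate for this question.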