A set of data involving two variables, 𝑥 and 𝑦, has been collected. The 𝑥-values are ranked according to their values. The smallest value is rank one, the next smallest value is rank two, and so on. The 𝑦-values are ranked in the same way. 𝐷 is the difference between the ranks of 𝑥 and 𝑦 within each pair of variables 𝑥, 𝑦. Given that the sum of 𝐷 squared equals zero, find the correlation coefficient 𝑟 between 𝑥 and 𝑦.
Because the data has been ranked, the correlation coefficient we need to use is Spearman’s rank correlation coefficient. The formula for calculating this is one minus six multiplied by the sum of 𝐷 squared over 𝑛 multiplied by 𝑛 squared minus one, where 𝑛 represents the number of pairs of data that we have. Do be careful with this. A common mistake is to think that the whole expression is over that denominator of 𝑛 multiplied by 𝑛 squared minus one; it isn’t. The one is not part of the fraction.
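Written out in symbols, the spoken formula above reads as follows. The sum runs over all 𝑛 pairs, and the leading one sits outside the fraction:

```latex
r = 1 - \frac{6 \sum D^2}{n\left(n^2 - 1\right)}
```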
Now, you may be wondering how we’re supposed to calculate this, as we haven’t been given the value of 𝑛 in the question. But let’s look at what we have been given. We’ve been told that the sum of 𝐷 squared is equal to zero. This means that we have a zero in the numerator of the fraction. Six multiplied by zero is still zero. And dividing zero by any nonzero number still gives zero; the denominator, 𝑛 multiplied by 𝑛 squared minus one, is nonzero whenever we have at least two pairs of data. So, in fact, we didn’t need to know the value of 𝑛 at all. Our calculation of the correlation coefficient is one minus zero, which is equal to one.
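As a quick check that the value of 𝑛 really doesn’t matter here, the calculation can be sketched in Python. The function name `spearman_from_sum_d2` and the sample values of 𝑛 are illustrative assumptions, not part of the question; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def spearman_from_sum_d2(sum_d_squared, n):
    """Return 1 - 6 * sum(D^2) / (n * (n^2 - 1)) as an exact fraction.

    Hypothetical helper for illustration; requires n >= 2 so that the
    denominator is nonzero.
    """
    if n < 2:
        raise ValueError("need at least two pairs of data")
    return 1 - Fraction(6 * sum_d_squared, n * (n * n - 1))


# With the sum of D squared equal to zero, every choice of n gives 1.
for n in (2, 5, 10, 100):
    print(n, spearman_from_sum_d2(0, n))
```

Whatever 𝑛 we try, the fraction is zero, so the result is always one.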
Now, let’s just briefly think about why this is the case. If the sum of 𝐷 squared is equal to zero, then this means that each of the individual 𝐷-squared values must also be equal to zero. That’s because 𝐷 squared, being a square, is nonnegative, so the only way for these values to sum to zero is for every one of them to be zero. And if 𝐷 squared is equal to zero, then taking the square root tells us that 𝐷 must also be equal to zero. 𝐷 is the difference between the ranks of 𝑥 and 𝑦 within each pair. So if each individual difference is equal to zero, then this means that in every pair, the 𝑥-value and the 𝑦-value have the same rank.
This means that there is perfect positive rank correlation between 𝑥 and 𝑦. And this is reflected by the Spearman’s rank correlation coefficient value of one, which is the greatest value that the Spearman’s rank correlation coefficient can take.
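To see this with actual numbers, here is a minimal Python sketch. The data values are made up for illustration, and the helper names `ranks` and `spearman` are assumptions; the ranking helper also assumes there are no tied values. The 𝑦-values are chosen so that they fall in the same order as the 𝑥-values, which makes every rank difference zero.

```python
def ranks(values):
    """Rank values so the smallest gets rank 1 (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


# Made-up data: the values differ, but the ranks agree in every pair.
xs = [3.1, 0.5, 7.2, 4.4]
ys = [20, 11, 95, 30]

print(ranks(xs))         # [2, 1, 4, 3]
print(ranks(ys))         # [2, 1, 4, 3]
print(spearman(xs, ys))  # 1.0
```

Notice that the 𝑥- and 𝑦-values themselves are quite different; it’s only their ranks that need to match for the coefficient to equal one.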