The following table shows the number of units of a certain product, 𝑥, and the production cost per unit, 𝑦, in Egyptian pounds, for seven different factories. Calculate the value of Spearman’s rank correlation coefficient between 𝑥 and 𝑦. Determine the type of correlation.
Spearman’s rank correlation coefficient is a way of quantifying the degree of correlation between the ranks of two variables. It measures the tendency for one variable to increase as the other does, but not necessarily in a linear way. The formula for calculating Spearman’s rank correlation coefficient is one minus a fraction: six multiplied by the sum of 𝑑𝑖 squared, over 𝑛 multiplied by the quantity 𝑛 squared minus one. Here, 𝑛 represents the number of pairs of data.
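Written out symbolically, the spoken formula above can be read as follows. The symbol 𝑟 for the coefficient matches the name used later in this explanation; the summation limits are written out here as an assumption about the intended notation.

```latex
\[
  r = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
\]
```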
So in this question, 𝑛 will be seven, as there are seven pairs of data given in the table. 𝑑𝑖 means the difference in the ranks of the 𝑖th pair of data, that is, the pair of data 𝑥𝑖, 𝑦𝑖. For example, the first pair of data is 𝑥 one 𝑦 one. And the second is 𝑥 two 𝑦 two and so on. Before we can apply the Spearman’s rank correlation coefficient formula, we must first rank the data ourselves. It doesn’t matter whether we choose the rank one to be awarded to the smallest or the largest data value as long as we’re consistent about what we do for the two variables.
We can add two more rows to our table to fill in the 𝑥 rank and the 𝑦 rank. And let’s choose to assign rank one to the smallest piece of data in each case. For 𝑥 then, the smallest piece of data is 600, so this gets rank one; then 700, which gets rank two; then 1400, which gets rank three. There are then two equal pieces of data. The second and seventh values in the 𝑥 row are both 1500. Now, we have to decide how to assign the ranks in this instance. In an ordered list of the data values, these would take up the fourth and fifth places in the list. Therefore, we choose to assign both pieces of data the same rank of 4.5. That’s the average of four and five. We then continue.
The next piece of data is 2000 which would be the sixth value in our ordered list. So it gets rank six. And finally, 2500 is the greatest value so it gets the rank seven. We then assign the ranks for the 𝑦 variable in the same way. But we notice straight away that there are two pieces of data which are both equal to 20, the smallest value. These would be the first and second values in an ordered list of the 𝑦 data. So we assign them both the rank of 1.5. That’s the average of one and two.
The next smallest piece of data is 23 which gets the rank three because it would be the third value in an ordered list. And then we notice that there are two values both equal to 24 which would be the fourth and fifth values in an ordered list of the 𝑦 variable. So they both get rank 4.5. That’s the average of four and five. 25 then gets rank six. And finally, 30 gets rank seven. This method of dealing with the tied ranks by awarding an average rank to both pieces of data is appropriate in this case as there are only a small number of tied ranks. There are other approaches that we could consider, such as Kendall’s tau. But the method we’ve used is fine in this instance.
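The average-rank rule for ties can be sketched in a few lines of Python. This is only an illustration: the function name average_ranks is hypothetical, and because the original table is not reproduced here, the two lists hold the values in sorted order as they were read out above rather than in their table order.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their places."""
    # Indices of the values, arranged from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The run occupies places start+1 .. end+1 in the ordered list.
        mean_place = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_place
        start = end + 1
    return ranks


# Values as listed in the explanation, in sorted order (not table order).
x_sorted = [600, 700, 1400, 1500, 1500, 2000, 2500]
y_sorted = [20, 20, 23, 24, 24, 25, 30]

x_ranks = average_ranks(x_sorted)
y_ranks = average_ranks(y_sorted)
print(x_ranks)
print(y_ranks)
```

The function also works on unsorted input, since each rank is written back to the position of the original value.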
Next, we need to work out the difference in the ranks awarded to each pair of data. It doesn’t matter whether we subtract the rank awarded to 𝑥 from the rank awarded to 𝑦 or vice versa as long as we’re consistent about what we do for every pair. Let’s choose to subtract the ranks of 𝑦 from the ranks of 𝑥. First, we have one minus seven which is equal to negative six, then 4.5 minus 4.5 which gives zero. We now work out all the other differences in the same way, giving negative 1.5, negative four, 4.5, 5.5, and 1.5. At this point we can perform a quick check of our work so far because it should always be the case that the sum of these differences is equal to zero. This is because the ranks of 𝑥 and the ranks of 𝑦 each add up to the same total, which is 28 for seven pieces of data. If we add up negative six, zero, negative 1.5, negative four, 4.5, 5.5, and 1.5, we do indeed get zero. So this helps us be confident that the work we’ve done so far is correct.
Finally, we need to work out the squares of these differences. And this is why it doesn’t actually matter which way around we subtract the ranks because we’re going to end up squaring the differences anyway. Negative six squared gives 36. Zero squared gives zero. We square all the remaining differences in the same way, giving 2.25, 16, 20.25, 30.25, and 2.25.
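The zero-sum check and the squaring step can be reproduced with a short Python sketch. The differences are the seven values quoted above, in the order they were read out; the variable names are just illustrative.

```python
# Rank differences (rank of x minus rank of y) for the seven pairs.
differences = [-6, 0, -1.5, -4, 4.5, 5.5, 1.5]

# Quick check: the differences in ranks should always sum to zero.
total = sum(differences)
print(total)

# Squaring removes the sign, so the direction of subtraction does not matter.
squared = [d ** 2 for d in differences]
flipped_squared = [(-d) ** 2 for d in differences]
print(squared)
```

All the values here are whole numbers or halves, so the floating-point arithmetic is exact and the comparison with zero is safe.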
Now, we’re nearly ready to apply our Spearman’s rank correlation coefficient formula. But first, we need to work out the sum of the squared differences. That’s the sum of the seven values in the final row of our table. Adding all these values up gives 107. We can now substitute the relevant values into our formula for Spearman’s rank correlation coefficient. The sum of 𝑑𝑖 squared is 107. And the value of 𝑛 is seven. So we have one minus six multiplied by 107 over seven multiplied by seven squared minus one. Seven squared is 49. And subtracting one gives 48. In the numerator, six multiplied by 107 is 642. And in the denominator, seven multiplied by 48 is 336. So we have one minus 642 over 336.
Evaluating this on a calculator and converting our answer to a decimal gives negative 0.9107 and then the decimal continues. We haven’t been asked to give our answer to a particular degree of accuracy. So let’s use three significant figures. In this case, the fourth significant figure is the seven. And as this is greater than five, it tells us that we’re rounding up. So the zero in the third decimal place will round up to become a one. We’ve calculated the value of 𝑟 then to be negative 0.911 correct to three significant figures.
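The substitution and rounding can be checked with a small Python sketch. The function name spearman_from_sum is hypothetical, and exact fractions are used here only so that the intermediate value can be seen without rounding error.

```python
from fractions import Fraction


def spearman_from_sum(sum_d_squared, n):
    """Apply 1 - 6 * (sum of d squared) / (n * (n^2 - 1)) using exact fractions."""
    return 1 - Fraction(6 * sum_d_squared, n * (n ** 2 - 1))


r_exact = spearman_from_sum(107, 7)   # 1 - 642/336, reduced to lowest terms
r_rounded = round(float(r_exact), 3)  # three decimal places

print(r_exact)
print(r_rounded)
```

Rounding to three decimal places agrees with three significant figures here because the first significant figure of the value sits in the first decimal place.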
Now, the question also asked us to determine the type of correlation, which means we need to interpret what this value of 𝑟 tells us about 𝑥 and 𝑦. To do so, we recall that this correlation coefficient always takes a value between negative one and one inclusive. A value of positive one means that there is perfect positive rank correlation between 𝑥 and 𝑦, which means that the smallest value of 𝑥 is paired with the smallest value of 𝑦. The second smallest value of 𝑥 is paired with the second smallest value of 𝑦 and so on, all the way up to the largest value of 𝑥 being paired with the largest value of 𝑦. A value of negative one means there is perfect negative rank correlation between 𝑥 and 𝑦, which means the opposite. The smallest value of 𝑥 is paired with the largest value of 𝑦 and vice versa.
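To see the two extreme cases concretely, here is a minimal Python sketch that applies the same formula to seven untied ranks, once paired in the same order and once in reverse order. The function name spearman_simple is hypothetical, and the ranks are made up for illustration rather than taken from the table.

```python
def spearman_simple(ranks_x, ranks_y):
    """Spearman's coefficient from two lists of ranks via the sum of squared differences."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5, 6, 7]

# Smallest paired with smallest, and so on: every difference is zero.
same_order = spearman_simple(ranks, ranks)

# Smallest paired with largest, and so on: the differences are as large as possible.
reverse_order = spearman_simple(ranks, ranks[::-1])

print(same_order, reverse_order)
```

In the reversed case the squared differences sum to 112, so the fraction is 672 over 336, which is two, and one minus two gives negative one.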
In this case, our value of negative 0.911 is pretty close to negative one, which means that there is strong negative rank correlation between 𝑥 and 𝑦. This makes sense if we consider the context of this problem. The larger the number of units you produce, the more efficient production will be. And so the production cost per unit will be lower. We have our answer to the problem then. The value of Spearman’s rank correlation coefficient to three significant figures is negative 0.911. And we conclude that there is a strong negative rank correlation between 𝑥 and 𝑦.