Using the information given in the table, find the Spearman’s rank correlation between the variables 𝑥 and 𝑦. Give your answer to four decimal places.
We’re given a set of bivariate data, that is, paired data, for variables 𝑥 and 𝑦. The variables are qualitative, or categorical. That is, the values they take are nonnumerical. And we see that the possible values of both 𝑥 and 𝑦 are “Excellent,” “Very Good,” “Good,” and “Poor.” We can say then that our data follow a grading system with values in the definite categorical order ranging from “Excellent” to “Poor.” And since there is an order to our data, we can assign ranks to the variable values in our data set. We can then use these ranks to calculate Spearman’s correlation coefficient.
To do this, we first add two new rows to our table for the ranks. That’s 𝑅 subscript 𝑥 and 𝑅 subscript 𝑦. To see how the ranking works, we start with our 𝑥-data and list our six values in order. Since we have four instances of “Excellent,” these take up the first four places in our ranking. And our two instances of “Good” will take up places five and six. When we have repeated values like this, we need to calculate their tied ranks so that the repeated values all have the same rank. And we do this by calculating the average of their places or positions in the ordered list.
For the four instances of “Excellent” then in our 𝑥-data, each instance has a rank of one plus two plus three plus four divided by four. That is the sum of the positions taken by the four instances divided by the number of elements of equal value. And this evaluates to 10 over four, which is 2.5. This means that each of our instances of “Excellent” for the variable 𝑥 has a rank of 2.5, which we can put in our table in the row for the ranks of the 𝑥-values. And we know that assigning tied ranks in this way ensures that the sum of the ranks is the same for each of the two variables 𝑥 and 𝑦. And we’ll see that this is indeed the case when we finish ranking our data.
So now for the 𝑥-variable, we’re left with two values, which are both “Good.” These are in positions five and six in our list. And since they both have the same value, that is, “Good,” we’ll need to work out their tied ranks. This is given by the average of their positions five and six, which is five plus six over two. That is 11 over two, which is 5.5. These are both then assigned the rank of 5.5, which we put in our table underneath the instances of “Good” for the 𝑥-variable.
Now let’s assign ranks in the same way to our 𝑦-data. Listing our 𝑦-data in order, we see again that we have some repeated values. We label our positions again one to six. And it’s very important that the positioning is the same way round as it was for the 𝑥-values; that is, we have a low number attached to a high grade, so we start with one for “Excellent.” Since we have only one instance of “Excellent,” this element is ranked first, or one. And we can put this in our table under “Excellent” for the 𝑦-data. Similarly, we have only one instance of “Very Good” in the 𝑦-data, so we can rank this second. In our table then, the rank of two goes underneath the instance of “Very Good” for the 𝑦-data.
We have two instances of “Good.” So working out the tied ranks for these two, their positions are third and fourth. So their tied ranks are three plus four over two, that is, seven over two, which is 3.5. And this is their rank, which we put in our table underneath the two instances of “Good” within the 𝑦-data. And now we’re left with two instances of “Poor.” Their tied ranks are the average of their positions, that is, five plus six over two, which is 11 over two, which is 5.5. So these are both ranked 5.5. And we can put these in our table in the rankings for 𝑦. So now if we work out the sums of the ranks for each variable, we find, as expected, these are equal with a value of 21.
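The tied-ranking procedure described above can be sketched in Python. This is a minimal illustration, not part of the original solution: the helper name `tied_ranks` is hypothetical, and because the table itself is not reproduced here, the order of the six data pairs is an assumption reconstructed from the rank differences worked out later in the walkthrough.

```python
from collections import defaultdict

# Grade order from best to worst, so that a low rank goes with a high grade.
GRADE_ORDER = ["Excellent", "Very Good", "Good", "Poor"]

def tied_ranks(values, order=GRADE_ORDER):
    """Return the average (tied) rank of each value, keeping the input order."""
    # Indices of the values, sorted from best grade to worst.
    sorted_idx = sorted(range(len(values)), key=lambda i: order.index(values[i]))
    # Collect the positions (1..n) occupied by each grade in the ordered list.
    positions = defaultdict(list)
    for pos, i in enumerate(sorted_idx, start=1):
        positions[values[i]].append(pos)
    # Every instance of a grade gets the mean of the positions it occupies.
    return [sum(positions[v]) / len(positions[v]) for v in values]

# Assumed table order (reconstructed from the differences in the walkthrough).
x = ["Good", "Excellent", "Good", "Excellent", "Excellent", "Excellent"]
y = ["Poor", "Good", "Poor", "Excellent", "Very Good", "Good"]

rank_x = tied_ranks(x)
rank_y = tied_ranks(y)
print(rank_x)                    # [5.5, 2.5, 5.5, 2.5, 2.5, 2.5]
print(rank_y)                    # [5.5, 3.5, 5.5, 1.0, 2.0, 3.5]
print(sum(rank_x), sum(rank_y))  # 21.0 21.0
```

Both rank sums come out as 21, which is the sum of the positions one to six, as the check above expects.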
Now, to calculate Spearman’s rank correlation between the two variables, we use the following formula. The coefficient 𝑟 subscript 𝑠 is equal to one minus a fraction: six times the sum of the squared differences 𝑑 subscript 𝑖, all over 𝑛 times 𝑛 squared minus one. Here, 𝑛 is the number of data pairs, which in our case is six, and 𝑑 subscript 𝑖 is the difference in ranks of each data pair for 𝑖 is one to 𝑛. So we’re going to need to work out the differences in the ranks and the squares of those differences. So we add two more rows to our table.
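Written out symbolically, the spoken formula reads as follows, where the symbol names match those used in the walkthrough: 𝑛 is the number of data pairs and 𝑑 subscript 𝑖 is the difference between the two ranks of the 𝑖th pair.

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2} - 1\right)}
```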
So we begin by working out our differences. If we number our data pairs with 𝑖 equal to one, two, three, four, five, and six, we have that our first difference 𝑑 one is 5.5 minus 5.5, and that’s equal to zero. Our second difference 𝑑 two is 2.5 minus 3.5, which is negative one. 𝑑 three is 5.5 minus 5.5, which is zero. 𝑑 four is 2.5 minus one, which is 1.5. And similarly, 𝑑 five is 2.5 minus two, which is 0.5, and 𝑑 six is 2.5 minus 3.5, which is negative one.
A good check that we’re on the right track at this point is that the sum of the differences is equal to zero. And in fact, this is the case for our differences. So now we work out the differences squared since this is what we need for our formula. We have zero squared, which is equal to zero; negative one squared, which is equal to one; again zero squared, which is zero; 1.5 squared, which is 2.25; 0.5 squared, which is 0.25; and again negative one squared, which is one. And so if we now sum the squared differences, we have a sum of 4.5.
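As a quick check of this arithmetic, the differences and their squares can be computed directly from the two rows of ranks. In this short sketch the ranks are typed in as they appear in the walkthrough, and the variable names are illustrative only.

```python
# Ranks of the six data pairs, in table order, as assigned in the walkthrough.
rank_x = [5.5, 2.5, 5.5, 2.5, 2.5, 2.5]
rank_y = [5.5, 3.5, 5.5, 1.0, 2.0, 3.5]

# Difference in ranks for each pair, then the square of each difference.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
d_squared = [di ** 2 for di in d]

print(d)               # [0.0, -1.0, 0.0, 1.5, 0.5, -1.0]
print(sum(d))          # 0.0
print(d_squared)       # [0.0, 1.0, 0.0, 2.25, 0.25, 1.0]
print(sum(d_squared))  # 4.5
```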
So now we can use this and the fact that 𝑛 is equal to six in our formula so that our Spearman’s rank correlation 𝑟 subscript 𝑠 is one minus six times 4.5 over six times 35, where 35 is six squared minus one. Our fraction evaluates to 27 over 210 so that the Spearman’s correlation is approximately equal to one minus 0.128571, with the fraction given to six decimal places. And to four decimal places, this is 0.8714. Hence, the Spearman’s rank correlation coefficient between the variables 𝑥 and 𝑦 is 0.8714 to four decimal places.
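The final substitution can be reproduced with a few lines of Python. This is a sketch of the formula used in this walkthrough only, and the function name `spearman_from_d_squared` is a hypothetical helper, not a library call.

```python
def spearman_from_d_squared(sum_d_squared, n):
    """Spearman's coefficient from the sum of squared rank differences."""
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Values from the walkthrough: the squared differences sum to 4.5, with n = 6 pairs.
r_s = spearman_from_d_squared(4.5, 6)
print(round(r_s, 4))  # 0.8714
```

Note that statistical libraries such as SciPy usually compute Spearman's coefficient as the Pearson correlation of the ranks, which can give a somewhat different value from this formula when the data contain tied ranks, as they do here.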
We can note at this point that Spearman’s rank correlation coefficient can take values from negative one to positive one and that our coefficient is within this range. In fact, since our coefficient is close to positive one, we can say that the rankings for 𝑥 and 𝑦 are in strong agreement. And so we would associate better grades or ratings for 𝑥 with better grades or ratings for 𝑦 and vice versa. And this is how we interpret the Spearman’s correlation coefficient with a value of 0.8714.