Given the table, find Spearman’s rank correlation coefficient between the two variables 𝑥 and 𝑦. Determine the type of correlation.
Looking at this table, we see that 𝑥 and 𝑦 are given labels. They’re labelled weak, pass, good, very good, or excellent. Spearman’s rank correlation coefficient is a way to describe and quantify the agreement between two variables with regard to their ranks. We have a formula for finding this correlation: 𝑟 equals one minus six times the summation of 𝑑 sub 𝑖 squared, all over 𝑛 times 𝑛 squared minus one. Here 𝑑 sub 𝑖 is the difference in rank of the 𝑖th pair of data, 𝑥 sub 𝑖 and 𝑦 sub 𝑖 (for example, 𝑥 one and 𝑦 one), and 𝑛 represents the number of data pairs.
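Written out in symbols, the formula just described reads as follows, with the sum running over all 𝑛 data pairs:

```latex
r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)}
```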
Before we can calculate the correlation coefficient, we’ll need to assign ranks to our 𝑥 and 𝑦 values. We could rank the data from weak to excellent, where weak equals one and excellent equals five. We could also rank the data from excellent to weak, where excellent equals one and weak equals five. Either way will work, as long as we’re consistent with our labelling for the 𝑥 values and the 𝑦 values. Let’s go from weak to excellent when assigning values. Starting with our 𝑥 ranks, we have one weak, that’s rank one; one pass, that’s rank two; good, rank three; very good, rank four; and excellent, rank five.
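The rank assignment can be sketched in a few lines of Python. The table itself is not reproduced in this transcript, so the label columns below are an assumption reconstructed from the ranks read out in the walkthrough; the names `SCALE`, `RANK`, `x_labels`, and `y_labels` are illustrative only. Looking ranks up directly like this only works because each label appears once in each column.

```python
# Ordinal scale from the walkthrough: weak is lowest, excellent is highest.
SCALE = ["weak", "pass", "good", "very good", "excellent"]
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

# Assumed table columns, reconstructed from the ranks used in the walkthrough.
x_labels = ["excellent", "very good", "weak", "pass", "good"]
y_labels = ["very good", "excellent", "weak", "pass", "good"]

# Each label occurs once per column, so its scale position is its rank.
x_ranks = [RANK[label] for label in x_labels]
y_ranks = [RANK[label] for label in y_labels]

print(x_ranks)  # [5, 4, 1, 2, 3]
print(y_ranks)  # [4, 5, 1, 2, 3]
```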
Now, we’ll rank our 𝑦s. We have one weak; that gets rank one. Pass gets rank two, good gets rank three, very good gets rank four, and excellent gets rank five. Before we move on, I want to address something that sometimes happens with Spearman’s rank. Imagine this fifth data point for our 𝑦 also said pass. Notice how originally we ranked these two values two and then three. If they were both pass values, we would rank each of them two and a half. Together they take up ranks two and three, so we assign each of them the average of ranks two and three. Two plus three is five; divide that by two. And both of these values would get a rank of 2.5.
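The tie rule described here, where tied values share the average of the ranks they occupy, can be sketched as a small helper. The function name `average_ranks` and the numeric scores (1 for weak up to 5 for excellent) are assumptions made for illustration, and the data is the hypothetical column in which the fifth 𝑦 value is changed to pass.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions are 1-based, so the group occupies ranks start+1 .. end+1.
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical y column: very good, excellent, weak, pass, pass.
y_scores = [4, 5, 1, 2, 2]
print(average_ranks(y_scores))  # [4.0, 5.0, 1.0, 2.5, 2.5]
```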
Getting back to the problem at hand, we’ll need to find the difference in these ranks. We can subtract the 𝑦 from the 𝑥, here five minus four. Or we can subtract the 𝑥 from the 𝑦, which would be four minus five. Either works as long as I do the same thing for all five columns. If I subtract 𝑦 from 𝑥 in the first column, I must subtract 𝑦 from 𝑥 in all the subsequent columns. In this case, I’ll say five minus four, which is subtracting the 𝑦 from the 𝑥. Five minus four is one. Four minus five is negative one. One minus one equals zero. Two minus two equals zero. Three minus three equals zero. The sum of this row should be equal to zero.
We have one plus negative one plus three zeros. It does equal zero. Looking back at our formula, we see that we need the sum of the squares of these differences. At this point, we’ll square what we found in the previous row: one squared, negative one squared, and zero squared for the last three values, which is zero each time. One squared equals one. And negative one squared also equals one. The sum of 𝑑 sub 𝑖 squared is one plus one, so two. We can now plug this information into our formula, one minus six times the summation of 𝑑 sub 𝑖 squared over 𝑛 times 𝑛 squared minus one.
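The difference row and the squared row can be checked with a short sketch. The rank lists are the ones used in the walkthrough, and the variable names are illustrative.

```python
# Ranks from the walkthrough, in table order (weak = 1, excellent = 5).
x_ranks = [5, 4, 1, 2, 3]
y_ranks = [4, 5, 1, 2, 3]

# Subtract y from x in every column, keeping the same direction throughout.
d = [x - y for x, y in zip(x_ranks, y_ranks)]
print(d)  # [1, -1, 0, 0, 0]

# Sanity check from the walkthrough: the differences sum to zero.
assert sum(d) == 0

# Square each difference, then add the squares.
d_squared = [di ** 2 for di in d]
sum_d_squared = sum(d_squared)
print(d_squared)      # [1, 1, 0, 0, 0]
print(sum_d_squared)  # 2
```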
The numerator of our fraction should be six times two. Since 𝑛 is the number of data pairs and we have five data pairs, our 𝑛 equals five. And so the denominator will be five times five squared minus one. Six times two equals 12. And in our denominator, we have five times 25 minus one. 25 minus one equals 24, so the fraction is 12 over five times 24. 12 over 24 simplifies to one over two, which leaves one over five times two. Five times two is 10, so the fraction is one-tenth. Our new statement says one minus one-tenth. To simplify, we’ll rewrite one as a fraction out of 10. And we’ll have 10 out of 10 minus one-tenth, which equals nine-tenths.
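As a check on the arithmetic, here is a minimal sketch that evaluates the formula with exact fractions. The function name `spearman_from_ranks` is hypothetical, and the sketch assumes the ranks have already been assigned as in the walkthrough.

```python
from fractions import Fraction


def spearman_from_ranks(x_ranks, y_ranks):
    """Evaluate r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) as an exact fraction."""
    n = len(x_ranks)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - Fraction(6 * sum_d_squared, n * (n ** 2 - 1))


# Ranks from the walkthrough: 6 * 2 = 12 on top, 5 * 24 = 120 underneath.
r = spearman_from_ranks([5, 4, 1, 2, 3], [4, 5, 1, 2, 3])
print(r)         # 9/10
print(float(r))  # 0.9
```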
Our Spearman’s rank correlation coefficient is nine-tenths. But we still need to determine the type of correlation this represents. Spearman’s rank correlation coefficient always falls between negative one and positive one. If your correlation coefficient is exactly one, it means at every point the rank of your 𝑥 values and the rank of your 𝑦 values are the same. That means the smallest value of 𝑥 is paired with the smallest value of 𝑦. We see this kind of relationship in three out of the five pairs of our data. And this makes sense because our correlation coefficient is nine-tenths. It’s almost one. We can call this a strong positive rank correlation.
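To see the two ends of that range, the same formula can be evaluated for perfectly matching ranks and for perfectly reversed ranks. This is a sketch with a hypothetical helper name, using five untied ranks as in this problem.

```python
from fractions import Fraction


def rank_correlation(x_ranks, y_ranks):
    """Spearman's formula applied to ranks that contain no ties."""
    n = len(x_ranks)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - Fraction(6 * sum_d_squared, n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5]

# Every x rank equals its y rank: all differences are zero.
same_order = rank_correlation(ranks, ranks)
# Smallest x is paired with largest y, and so on.
reversed_order = rank_correlation(ranks, ranks[::-1])

print(same_order)      # 1
print(reversed_order)  # -1
```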