Video Transcript
Find Spearman’s correlation coefficient between 𝑥 and 𝑦. Round your answer to three decimal places.
Looking at this set of data, we see 𝑥-values are in the first row and 𝑦-values are in the second. These points then are bivariate, meaning they’re described by two variables. Although our question asks us to find Spearman’s correlation coefficient between 𝑥 and 𝑦, we don’t use these data values directly to do that. Instead, we use what are called the ranks of these values. That’s because Spearman’s correlation coefficient describes the level of agreement between the relative ranks of bivariate data.
To see what this means, let’s create two new rows in our data table. The first new row can represent the rank of the 𝑥-values in our data, and the second new row will represent the rank of the 𝑦-values. Considering first the 𝑥-values in our table, we can rank these values from smallest to largest using the numbering one, two, three, and so on. This means that our smallest 𝑥-value will get a rank of one.
We see that the smallest value is four. So that means 𝑅 sub 𝑥 for this value is one. The next lowest 𝑥-value is five, which means that this has a rank of two. Then comes seven, which must then have a rank of three. And next we see that we have two 𝑥-values of eight. We can say that these are the fourth and fifth lowest 𝑥-values. But since they’re the same number, we take these two rankings, four and five, find their average, which is 4.5, and assign that rank to each of these numbers. Lastly then, the highest 𝑥-value in our table is 12. So this has a ranking of six, as the sixth smallest value. So that’s what it means to rank our data.
And now we’ll do the same thing for our 𝑦-values. The smallest 𝑦-value in the table is four, so that gets a rank of one. And then comes six, which we have three of. These values then occupy the second, third, and fourth places among our 𝑦-values. And since they’re all the same, we assign each of them the same ranking: the average of two, three, and four, which is three. The next lowest 𝑦-value is seven. That is the fifth lowest 𝑦-value. So 𝑅 sub 𝑦 for this is five. And lastly, the highest 𝑦-value is 10, and so the ranking for this is six.
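The ranking procedure described above can be sketched in Python. This is a minimal illustration, not part of the original video: the function name `average_ranks` is hypothetical, and the lists `x` and `y` are reconstructed from the narration (the data table itself is not reproduced in this transcript), so their order is an assumption inferred from the rank differences read out later.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest.

    Tied values all receive the mean of the rank positions they occupy.
    """
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover every value tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # 0-based positions pos..end correspond to ranks pos+1..end+1.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Data reconstructed from the narration (assumed order).
x = [4, 7, 8, 5, 8, 12]
y = [7, 6, 6, 4, 6, 10]

rank_x = average_ranks(x)
rank_y = average_ranks(y)
print(rank_x)  # [1.0, 3.0, 4.5, 2.0, 4.5, 6.0]
print(rank_y)  # [5.0, 3.0, 3.0, 1.0, 3.0, 6.0]
```

The two 𝑥-values of eight share the rank 4.5, and the three 𝑦-values of six share the rank three, matching the ranks worked out by hand.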
The results in these two rows of our table are what Spearman’s correlation coefficient is actually going to describe. This coefficient gives a quantitative indication of the level of agreement between the relative ranks of these data. Essentially, the closer 𝑅 𝑥 and 𝑅 𝑦 are for each point in the data set, the closer this correlation coefficient comes to positive one.
As our next step then, let’s create a row in our table that indicates the difference between respective 𝑅 𝑥 and 𝑅 𝑦 values. We’ll say that this value 𝑑 sub 𝑖 equals 𝑅 𝑥 minus 𝑅 𝑦 for each data point. For our first data point then, we have one minus five. That’s negative four. Then we have three minus three, which is zero; 4.5 minus three, or 1.5; two minus one, or one; 4.5 minus three again, which is 1.5; and finally six minus six, which is zero.
To remove the signs from these results, let’s make a final row in our table where we square these difference values. That way, none of these values will be negative. Negative four times negative four is positive 16. Zero squared is zero. 1.5 squared is 2.25. One squared is one. 1.5 squared again is 2.25. And zero squared is zero.
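The two rows just described, the rank differences and their squares, can be reproduced with a short Python sketch. The variable names are illustrative, and the rank lists are simply the values read out in the narration.

```python
# Ranks as stated in the narration, one entry per data point.
rank_x = [1, 3, 4.5, 2, 4.5, 6]
rank_y = [5, 3, 3, 1, 3, 6]

# d_i = R_x - R_y for each data point.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]

# Squaring removes the sign of each difference.
d_squared = [di ** 2 for di in d]

print(d)          # [-4, 0, 1.5, 1, 1.5, 0]
print(d_squared)  # [16, 0, 2.25, 1, 2.25, 0]
```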
At this point, let’s recall the formula for Spearman’s correlation coefficient. We often represent this coefficient using an 𝑟 sub 𝑠. It’s equal to one minus a fraction. The numerator of that fraction is six times the sum of all the squared 𝑑 sub 𝑖 values. The denominator is the number of data points in our set, 𝑛, multiplied by 𝑛 squared minus one. To calculate Spearman’s correlation coefficient for a set of data then, the two things we need to know are the sum of all the 𝑑 sub 𝑖 squared values and also the total number of points in the set.
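Written out in symbols, the formula described here reads as follows, where 𝑑 sub 𝑖 is the rank difference for the 𝑖th data point and 𝑛 is the number of data points:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2} - 1\right)}
```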
Considering the sum of 𝑑 sub 𝑖 squared, we can solve for that by adding together all the results in the last row of our table. 16 plus zero plus 2.25 plus one plus 2.25 plus zero adds up to 21.5. And then, regarding the number of data points in our set, we see that we have one, two, three, four, five, six such points. This means that, in our case, 𝑛 equals six.
And now that we’ve gotten these two bits of information from the data in our set, we can clear away all the rows we created and move ahead to calculate 𝑟 sub 𝑠, Spearman’s correlation coefficient. We sub in our values for the sum of 𝑑 sub 𝑖 squared and 𝑛. And note that because 𝑛 equals six, one factor of six cancels from numerator and denominator. Moreover, six squared is 36. So we can express 𝑟 sub 𝑠 as one minus 21.5 over 35, since 36 minus one is 35. Calculating this out, we get 0.38571 and so on.
But note that we want to give our final answer rounded to three decimal places. Doing this, we get a result of 0.386. To three decimal places, this is the Spearman’s correlation coefficient between 𝑥 and 𝑦.
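As a check on the arithmetic, the final calculation can be sketched in Python. The function name `spearman_from_rank_differences` is hypothetical; it simply evaluates the rank-difference formula used in this video from the sum of squared differences and the number of data points.

```python
def spearman_from_rank_differences(sum_d_squared, n):
    """Evaluate r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Squared rank differences from the worked example.
d_squared = [16, 0, 2.25, 1, 2.25, 0]
n = len(d_squared)  # six data points

r_s = spearman_from_rank_differences(sum(d_squared), n)
print(sum(d_squared))  # 21.5
print(round(r_s, 3))   # 0.386
```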