In this video, we’ll learn how to find Spearman’s rank correlation coefficient. You’ll already be familiar with the concept of correlation. You’ll know that Pearson’s product moment correlation coefficient can give an indication of the existence, strength, and direction of a linear relationship between two quantitative, that is, numerical, variables. But Pearson’s coefficient can only be calculated for quantitative data. If our data is nonnumerical, that is, descriptive or qualitative, and has some order or ranking, we can’t use Pearson’s correlation coefficient, but we can use Spearman’s rank correlation coefficient.
In this video, we’ll see how to calculate Spearman’s correlation coefficient using the formula and how to determine whether there is an association between paired data sets, that is, bivariate data, and if so, what type. We can also calculate Spearman’s correlation coefficient for numerical data. And it’s useful in particular if we have, for example, some outliers in our data set.
Remember that when we plot paired numerical data on a scatter plot, we’re looking for a relationship between the two variables. If we have a linear relationship, we can use Pearson’s product moment correlation coefficient to determine the strength and direction of the relationship. We know that if the correlation coefficient is close to positive one, we have a strong direct or positive correlation between the variables. And if 𝑟 is close to negative one, we have a strong inverse or negative correlation. If 𝑟 is zero, we have no correlation. And if our relationship is nonlinear, then we can’t use Pearson’s correlation coefficient. And we recall that Pearson’s correlation coefficient takes values from negative one to plus one.
Now, with ranked or ordered data, again, the correlation coefficient lies between negative one and plus one. But now the interpretation is slightly different. If Spearman’s rank correlation coefficient, 𝑟 s, is exactly one, we have perfect agreement or association between the ranks, and if it is close to one, we have strong agreement. If 𝑟 s is zero, then there is no agreement or association between the ranks of our bivariate data. And if 𝑟 s is negative one, we have perfect opposing or inverse association between the ranks of our bivariate data.
Note also that sometimes Spearman’s rank correlation coefficient is referred to as Spearman’s 𝜌. That’s the Greek letter 𝜌. To calculate Spearman’s coefficient, our first step, if our data is not already ranked, is to rank it. We then find the differences between the ranks, that’s 𝑑 𝑖, for each data pair. We then square each of these and take the sum of the squares. And if 𝑛 is the number of bivariate data points, 𝑖 takes values from one to 𝑛, and so we have 𝑛 differences squared. The coefficient is then one minus six times the sum of the differences squared, all over 𝑛 times 𝑛 squared minus one.
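Written out symbolically, the formula described in this lesson looks like the following, where 𝑑 𝑖 is the difference in ranks for the 𝑖th data pair and 𝑛 is the number of pairs:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}, \qquad -1 \le r_s \le 1
```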
Now, considering this formula and what we know about possible values of Spearman’s rank coefficient, that is, negative one is less than or equal to 𝑟 s, which is less than or equal to one, is it true or false that when the ranks of each of two corresponding elements in two groups of data 𝑋 and 𝑌 are identical, Spearman’s rank correlation coefficient is equal to one? Well, we know that Spearman’s rank correlation coefficient is used to determine the relationship between the order or ranking of bivariate data and that if the correlation coefficient is equal to one, we have perfect correlation or perfect agreement. So let’s look at this from the perspective of an example.
Suppose we have two judges, 𝑋 and 𝑌, ranking five cakes from best, which is one, to worst, which is five. And suppose the rankings of the judges agree exactly. If we work out the differences in ranks, then because the judges agree, the differences are all equal to zero. And then, of course, all the differences squared are equal to zero.
And remember that Spearman’s rank correlation coefficient is one minus six times the sum of all the differences squared over 𝑛 times 𝑛 squared minus one. And in our example, all the differences squared are equal to zero. So the sum of the differences squared is also zero. So that in our formula, in the numerator, we have zero. We have five cakes, so 𝑛 is equal to five. And our correlation coefficient is one minus six times zero over five times five squared minus one. And since our second term is equal to zero, since anything multiplied by zero is zero, our correlation coefficient is equal to one.
So certainly, in our example, where the ranks of the two groups are identical, Spearman’s rank correlation coefficient is indeed equal to one. But if we think in more general terms, the difference for each data pair is its rank in 𝑌 subtracted from its rank in 𝑋. And if the two ranks agree, then their difference is zero. And if this is true for all 𝑖, then 𝑑 𝑖 squared is equal to zero so that the sum of the 𝑑 𝑖 squared is also zero. And if the sum of the 𝑑 𝑖 squared, the differences squared is equal to zero, then our second term will always be zero.
And if our second term is zero, then Spearman’s rank correlation coefficient must be equal to one. The statement then that if the ranks of two corresponding elements in two groups of data 𝑋 and 𝑌 are identical, Spearman’s rank correlation coefficient is equal to one is true. In this example, we used data that was already ranked. But more often than not, we start off with a bivariate data set, and we have to rank the data ourselves.
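As a quick check of this reasoning, here is a minimal Python sketch of the formula applied to ranks that agree exactly. The function name `spearman_from_ranks` and the particular cake rankings are illustrative assumptions, not part of the lesson.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two equal-length lists of ranks."""
    n = len(rank_x)
    # Sum of the squared differences in ranks, one difference per data pair.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of five cakes (1 = best, 5 = worst).
judge_x = [1, 2, 3, 4, 5]
judge_y = [1, 2, 3, 4, 5]

print(spearman_from_ranks(judge_x, judge_y))        # 1.0, perfect agreement
print(spearman_from_ranks(judge_x, judge_y[::-1]))  # -1.0, perfectly opposed
```

Reversing one judge's ranking gives negative one, which matches the interpretation of perfect inverse association given earlier.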
Is it true or false that when Spearman’s rank correlation coefficient for two groups of data equals one, it means that the data points perfectly lie on a straight line?
We know that when Spearman’s rank correlation coefficient is equal to one, we have perfect agreement between the ranks of the data. And if Spearman’s rank correlation coefficient is equal to one, then the term containing the sum of the differences squared must be equal to zero. So let’s look at this with an example. Suppose we have the time in minutes it took for five students to take a test and their marks as a percentage. And now suppose we rank both our time and our marks, taking the shortest time and the lowest mark as one and the highest as five. And now, if we calculate the difference in ranks, each of the differences is zero because the ranks are in perfect agreement.
Now, if we square all the differences, each of these is equal to zero because zero squared is zero. And so the sum of the differences squared is also zero. And if we put this into our formula, the sum of the 𝑑 𝑖 squared is equal to zero, so the second term is equal to zero as we would expect. But now suppose we plot our original data. We can see from our scatter plot that although Spearman’s rank is equal to one, the data points themselves do not lie perfectly on a straight line. And this means that our statement is false. In general, the fact that the ranks of the data are equal means that if we plotted the ranks, they would lie on a perfect straight line. But this is not necessarily the case for the original data.
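The same point can be sketched in Python. The times and marks below are invented for illustration, since the lesson's actual values are not given in the narration, and the helper names are assumptions. The ranks agree perfectly, yet the slopes between consecutive points differ, so the raw points are not on one straight line.

```python
def rank_ascending(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two equal-length lists of ranks."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Invented data: time taken in minutes and mark as a percentage.
times = [20, 25, 30, 40, 60]
marks = [35, 40, 70, 75, 98]

r_s = spearman_from_ranks(rank_ascending(times), rank_ascending(marks))
print(r_s)  # 1.0

# Slopes between consecutive points; a straight line would give one value.
slopes = [(marks[i + 1] - marks[i]) / (times[i + 1] - times[i])
          for i in range(len(times) - 1)]
print(slopes)  # [1.0, 6.0, 0.5, 1.15]
```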
Let’s look now at an example of how to calculate Spearman’s rank correlation coefficient for some quantitative bivariate data.
Find the Spearman’s rank correlation coefficient between the product price and its lifetime from the given data. Round your answer to four decimal places.
We’re given a table with lifetime in years and price in dollars. And we’re asked to find Spearman’s rank correlation coefficient between the paired data. We use the term paired because each pair of data refers uniquely to one product so that the product with a lifetime of one year has a price of 79 dollars, for example. Now to use the given formula to calculate Spearman’s rank correlation coefficient, we need to know the number of pairs of data 𝑛. And we need to know the difference in ranks for each pair of data, and we then work out the sum of the differences squared.
Now, since the lifetime data is actually ordered sequentially already, that is, it goes from one to six with no omissions, the lifetime data is already ranked. So we can simply use the data itself as the rank. However, for the sake of clarity, let’s write this down again in a new row. And next we need to rank our price data. Noticing that a low price corresponds to a low rank in lifetime, we can begin our price ranking at one also so that we rank the price 79 as one. Our next lowest price is 103 dollars, which can be ranked as second. Our third lowest is 105, which is ranked third, and so on so that 125 is ranked fourth, 160 dollars is ranked fifth, and 214 dollars is ranked sixth.
Our next step is to find the difference in ranks for each pair of data. We subtract the price rank from the lifetime rank so that, in the first column, we have one minus one is equal to zero. And for a lifetime of five years and a price of 160 dollars, we have five minus five is equal to zero. Next, four minus four is equal to zero, two minus three is negative one, six minus six is zero, and three minus two is equal to positive one. Our next calculation is the difference in ranks squared so that we have zero squared is zero and so on for the rest of our differences. And now to use our formula for Spearman’s rank correlation coefficient, we need the sum of the differences squared, that is, zero plus zero plus zero plus one plus zero plus one, which is equal to two.
It’s worth noting at this point that if we were to sum the differences in ranks, we get zero, and this should always be the case, because both sets of ranks add up to the same total. In our case, we have zero plus zero plus zero plus negative one plus zero plus positive one, and that’s equal to zero. In order to use the formula, we also need to know the number of data pairs, and we have six data pairs so 𝑛 is equal to six.
So now making some room, we have everything we need for our formula so that Spearman’s rank correlation coefficient for this data is one minus six times two all over six times six squared minus one. That is one minus 12 over six times 35. Six times 35 is 210, and 12 over 210 is approximately 0.05714. This gives us Spearman’s rank correlation coefficient approximately equal to one minus 0.05714, which is 0.94286. And so to four decimal places, Spearman’s rank correlation coefficient for this data is 0.9429. Since this value is very close to positive one, we can interpret this as a very strong direct relationship or association between a product’s lifetime in years and its price in dollars. That is, the higher the price, the longer the product lasts.
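The whole calculation can be reproduced with a short Python sketch. The table itself is not shown in the transcript, so the pairing of lifetimes and prices below is reconstructed from the ranks and differences read out in the narration. The helper name `rank_ascending` is an assumption.

```python
def rank_ascending(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Data pairs reconstructed from the narration (lifetime in years, price in dollars).
lifetime = [1, 5, 4, 2, 6, 3]
price = [79, 160, 125, 105, 214, 103]

lifetime_rank = rank_ascending(lifetime)  # same as the data itself
price_rank = rank_ascending(price)        # [1, 5, 4, 3, 6, 2]

# Difference in ranks for each pair: lifetime rank minus price rank.
d = [lr - pr for lr, pr in zip(lifetime_rank, price_rank)]
sum_d = sum(d)                            # should be 0
sum_d_squared = sum(di ** 2 for di in d)  # 2

n = len(lifetime)
r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
print(d)              # [0, 0, 0, -1, 0, 1]
print(round(r_s, 4))  # 0.9429
```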
It’s perhaps worth noting that had our coefficient been negative at negative 0.9429, our interpretation would be the exact opposite. In that case, we would interpret the value as the higher the price, the shorter the lifetime. The relationship would still be extremely strong since now negative 0.9429 is very close to negative one. But in this case, it would be an inverse association. Often when we have bivariate data that we wish to find Spearman’s rank correlation coefficient for, we find that we have tied ranks.
This occurs when two or more data points are identical. Their rank is then the average of the place numbers they take up in the ordered list. Suppose, for example, we have a data set for the variable 𝑋 with values 20, 30, 20, 10, and five. If we wish to rank our data from low to high, we note that five is the lowest value, so this comes with rank one. 10 is the next lowest value, so this has rank two.
But now we have two values of 20 so that the value of 20 takes up both third and fourth places in our ordered list. So we take the average of the place numbers that these two 20s take up. That’s three plus four divided by two and that’s equal to 3.5 so that both instances of 20 are ranked 3.5. And since third and fourth places are now taken up, we rank our final piece of data fifth.
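One way to sketch this averaging rule in Python is shown below. The function name `average_ranks` is an assumption for illustration. It ranks from low to high and gives tied values the mean of the places they occupy.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the mean of their places."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_place = ordered.index(v) + 1  # first place occupied by v
        count = ordered.count(v)            # number of times v appears
        # Mean of the places first_place, ..., first_place + count - 1.
        ranks.append(first_place + (count - 1) / 2)
    return ranks

print(average_ranks([20, 30, 20, 10, 5]))  # [3.5, 5.0, 3.5, 2.0, 1.0]
```

Both instances of 20 receive 3.5, and 30 is ranked fifth because third and fourth places are already taken.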
So let’s see how this works in an example.
The table represents the power output and rotor diameter of several helicopters. Find the Spearman’s rank correlation coefficient, and round your answer to four decimal places.
We’re given a set of paired data for the power output and rotor diameter of some helicopters. We use the term paired data because each pair of data is unique to one helicopter. So, for example, the helicopter with a power output of 1,218 kilowatts has a rotor diameter of 10.2 meters. And to calculate Spearman’s rank correlation coefficient, we’ll use the formula given. In this formula, 𝑛 corresponds to the number of data pairs. 𝑑 𝑖 corresponds to the difference in ranks for each pair, where 𝑖 takes value from one to 𝑛, and we calculate the sum of the differences squared.
The first thing we need to do then is to rank each of our two data sets. And to do this, let’s make some room. If we begin by ranking the power output, we could start at either the lowest or the highest power output. It should make no difference to the Spearman’s correlation coefficient, provided we stick to the same direction for the rotor diameter rankings. So let’s start with the lowest power output, which is 944, which we rank as one. And to avoid confusion later on, let’s strike this out. Our next lowest power output is 1,218, so we can strike this out and rank this two. And the next lowest is 1,864, which we can rank third. 3,324 can be ranked fourth, 3,552 is ranked fifth, 3,758 is ranked sixth, and our highest power output is 4,698, which is ranked seventh.
And now for our rotor diameters, our lowest value is 10.2 meters. But this occurs twice, so effectively we have tied ranks for the first place. To handle this, we take the average of the places that these data points would occupy. That is first and second places so that the ranks of the two data points with values 10.2 are one plus two over two. That is the first place plus the second place, over two, which is 1.5, so that both of our instances of a rotor diameter of 10.2 meters are ranked 1.5. And we can strike these two out.
Now, our third lowest value is 14, so we can strike this out. And since first and second places have already been taken by the 10.2 values, we must rank this third. Our next lowest value is 16.2, which we rank fourth, followed by 16.3, which is ranked fifth, followed by 17.7, which is ranked sixth, and finally 18.59, which is ranked seventh.
Now, to use our formula, we need the differences in ranks squared for each data pair. So let’s first take the differences in ranks. To do this, we subtract the diameter rank from the power rank for each pair. In our first data column, we have two minus 1.5, which is 0.5. For our next column, we have three minus three, which is zero, and then one minus 1.5, which is negative 0.5. We have seven minus seven is zero, five minus four is one, four minus five is negative one, and six minus six is zero.
Our next step is to work out the differences squared. In our first column, 0.5 squared is 0.25. In our second data column, zero squared is zero. In our third column, negative 0.5 squared is 0.25. In our fourth column, zero squared is zero. In our fifth column, one squared is one. In our sixth column, negative one squared is one. And in our final column, zero squared is zero.
Now for our formula, we want the sum of the differences squared. That is 0.25 plus zero plus 0.25 plus zero plus one plus one plus zero, which is 2.5. Now, before we use the formula, let’s just check that the sum of the differences is equal to zero as it should be. We have 0.5 plus zero plus negative 0.5 plus zero plus one plus negative one plus zero, and that is indeed equal to zero.
Now we have seven pairs of data so that our 𝑛 is equal to seven. And so Spearman’s rank correlation coefficient is one minus six times 2.5 over seven times seven squared minus one. That is one minus 15 over 336. If you do this on your calculator, it’s very important at this point to separate the one from the fraction. And to do this, we calculate 15 divided by 336; that’s approximately 0.04464. One minus 0.04464 is 0.95536. And so Spearman’s rank correlation coefficient for this data is 0.9554 to four decimal places.
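The helicopter calculation, including the tied ranks, can be checked with the Python sketch below. The table is not shown in the transcript, so the pairing of power outputs and rotor diameters is reconstructed from the ranks and differences read out in the narration. The helper name `average_ranks` is an assumption, and the code applies the same formula used in this lesson.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the mean of their places."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

# Data pairs reconstructed from the narration (power in kW, rotor diameter in m).
power = [1218, 1864, 944, 4698, 3552, 3324, 3758]
diameter = [10.2, 14, 10.2, 18.59, 16.2, 16.3, 17.7]

power_rank = average_ranks(power)        # [2.0, 3.0, 1.0, 7.0, 5.0, 4.0, 6.0]
diameter_rank = average_ranks(diameter)  # [1.5, 3.0, 1.5, 7.0, 4.0, 5.0, 6.0]

# Difference in ranks for each pair: power rank minus diameter rank.
d = [p - q for p, q in zip(power_rank, diameter_rank)]
sum_d = sum(d)                            # 0.0, as a check
sum_d_squared = sum(di ** 2 for di in d)  # 2.5

n = len(power)
r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
print(round(r_s, 4))  # 0.9554
```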
We complete this video by noting some key points. Spearman’s rank correlation coefficient applies to ordered bivariate data. It takes values from negative one to positive one. A value of 𝑟 s close to positive one corresponds to strong direct agreement between the ranks, and a value close to negative one corresponds to strong inverse agreement. And the sum of the differences in ranks is always equal to zero.