In this explainer, we will learn how to find Spearman’s rank correlation coefficient.
We will find its value for sets of both quantitative and qualitative bivariate data. The data described by Spearman’s rank correlation coefficient can be either discrete or continuous if it is quantitative.
Definition: Bivariate Data
Bivariate data is data on each of two variables, with each value of one of the variables paired with a value of the other variable.
Definition: Quantitative and Qualitative Data
Quantitative data is numerical. An example of quantitative bivariate data is a set of ordered pairs in which each pair gives a person’s age in years and the person’s height in centimetres. A person’s age is discrete if it can be given only as a whole number of years, while the person’s height is continuous if it can be given as any fraction of a centimetre.
Qualitative data (also referred to as descriptive or categorical data) is not numerical. An example of a set of qualitative bivariate data is {(large, large), (medium, large), (small, small), (medium, medium), (large, medium)}. This might describe a person’s shirt size in two different brands. To calculate Spearman’s rank correlation coefficient for qualitative data, the data must be able to be ordered (e.g., small, medium, large).
Spearman’s rank correlation coefficient, denoted by \(r_s\), is a measure of the tendency for one variable to increase or decrease as the other does within a monotonic (entirely increasing or entirely decreasing) relationship, such that
\[-1 \le r_s \le 1.\]
If one variable always increases as the other does, we can say that the value of \(r_s\) is positive and there is a direct association between the variables. On the other hand, if one variable always decreases as the other increases, then we can say that the value of \(r_s\) is negative, which indicates an inverse association. Rank correlation coefficient values of 1 or \(-1\) describe a perfectly associated monotonic relationship. This means that either the ranks agree entirely (\(r_s = 1\)) or they are direct opposites (\(r_s = -1\)). Unlike with Pearson’s correlation coefficient, a perfect value of \(-1\) or 1 can occur regardless of whether the quantitative data pairs in a set are linearly related or not.
Not only can \(r_s\) be 1 or \(-1\), but it can also take any value between \(-1\) and 1. A value of 0 for \(r_s\) indicates no association between the variables. The closer the value of \(r_s\) is to \(-1\) or 1, the stronger the association, and the closer it is to 0, the weaker the association.
Definition: Spearman’s Rank Correlation Coefficient
Spearman’s rank correlation coefficient, denoted by \(r_s\), is a numerical value such that \(-1 \le r_s \le 1\). It gives a measure of the likelihood of one variable increasing as the other increases (a direct association) or of one variable decreasing as the other increases (an inverse association). Direct associations are indicated by positive values, and inverse associations are indicated by negative values. No association is indicated by a value of 0. The stronger the association, the closer \(r_s\) is to \(-1\) or 1, and the weaker the association, the closer it is to 0. Rank correlation coefficient values of 1 or \(-1\) mean that either the ranks agree entirely (\(r_s = 1\)) or they are direct opposites (\(r_s = -1\)).
Our first step in determining the value of for a set of bivariate data pairs is to rank the values of each variable. In a quantitative data set, the smallest rank for a variable can be assigned to either the least or the greatest data value, but each variable must be ranked in the same way. That is, both must be ranked either from least to greatest or from greatest to least. Also, if two data values are the same, then their ranks must also be the same. Thus, the ranks of two or more identical data values are equal to the average of their places in an ordered list. The identical data values are said to have tied ranks.
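As a rough illustration of the ranking rule just described, the following Python sketch ranks a list of values from least to greatest and gives tied values the average of their places in the ordered list. The function name `average_ranks` and the sample values are our own choices for illustration only.

```python
def average_ranks(values):
    """Rank values from least to greatest; tied values share the average of their places."""
    places = {}
    # Record every place (1, 2, 3, ...) that each distinct value occupies in the ordered list.
    for place, value in enumerate(sorted(values), start=1):
        places.setdefault(value, []).append(place)
    # Each value receives the mean of the places it occupies.
    return [sum(places[v]) / len(places[v]) for v in values]


sample = [10, 30, 20, 30]
print(average_ranks(sample))  # [1.0, 3.5, 2.0, 3.5]
```

The two copies of 30 occupy the third and fourth places, so each receives the tied rank 3.5.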
Suppose we have a data set consisting of the three points
\[(2, y_1),\ (5, 4),\ (x_3, 1),\]
where \(x_3\) is less than 2 and \(y_1\) is less than 1.
The two variables are referred to as \(x\), with sample values \(x_i\), and \(y\), with sample values \(y_i\), such that a general bivariate item is denoted \((x_i, y_i)\). In this data set, \(n = 3\), since there are 3 data pairs. The values of \(x\) are 2, 5, and \(x_3\), while the values of \(y\) are \(y_1\), 4, and 1. Putting the values of \(x\) in order from least to greatest gives us
\[x_3,\ 2,\ 5.\]
Doing the same for the values of \(y\), we get
\[y_1,\ 1,\ 4.\]
This means that, for the values of \(x\), the rank of \(x_3\) is 1, the rank of 2 is 2, and the rank of 5 is 3. The values of \(y\) must be ranked in the same way, so the rank of \(y_1\) is 1, the rank of 1 is 2, and the rank of 4 is 3. For each point \((x_i, y_i)\), the difference in the coordinates’ ranks can be denoted as \(d_i\) and the squares of the differences as \(d_i^2\). This is shown in the table below, where the ranks of the values of \(x\) are represented by \(R_{x_i}\), and the ranks of the values of \(y\) are represented by \(R_{y_i}\).
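\(x_i\) | 2 | 5 | \(x_3\) |
---|---|---|---|
\(R_{x_i}\) | 2 | 3 | 1 |
\(y_i\) | \(y_1\) | 4 | 1 |
\(R_{y_i}\) | 1 | 3 | 2 |
\(d_i = R_{x_i} - R_{y_i}\) | 1 | 0 | \(-1\) |
\(d_i^2\) | 1 | 0 | 1 |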
Once we have the values of \(d_i^2\), we can use them, along with the value of \(n\), or the number of data pairs, in a general formula for Spearman’s rank correlation coefficient. In our first example, we will learn to recognize what that formula is.
Example 1: Recognizing the Formula for Spearman’s Rank Correlation Coefficient
Which of the following is the formula for Spearman’s rank correlation coefficient?
Answer
The formula for Spearman’s rank correlation coefficient (sometimes simply referred to as rank correlation) is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}.\]
In it, \(r_s\) represents the coefficient, and the number of points in the data set is represented by \(n\). The square of the difference in the ranks of the two coordinates for each point is represented by \(d_i^2\), and the expression \(\sum d_i^2\) indicates that we should find the sum of these squares. The formula was developed by Charles Spearman, an English psychologist known for his work in statistics. When no ranks are tied, calculating the rank correlation with this formula is equivalent to finding Pearson’s correlation on a new set of variables: the ranked values of the data.
Formula: Spearman’s Rank Correlation Coefficient
The formula for Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) is the coefficient and \(n\) is the number of points in the data set. For each point \((x_i, y_i)\), the square of the difference in the ranks of the two coordinates is represented by \(d_i^2\), and the sum of these squares is represented by the expression \(\sum d_i^2\).
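To see the formula at work, here is a minimal Python sketch that applies it to the ranks 2, 3, 1 and 1, 3, 2 from the three-point table earlier in this explainer. It also computes Pearson’s correlation directly on the same ranks as a cross-check, which should agree here because no ranks are tied. The function names `spearman_from_ranks` and `pearson` are our own and are only illustrative.

```python
from math import sqrt


def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) to two lists of ranks."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    covariation = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sum((a - mean_u) ** 2 for a in u)
    spread_v = sum((b - mean_v) ** 2 for b in v)
    return covariation / sqrt(spread_u * spread_v)


rank_x = [2, 3, 1]
rank_y = [1, 3, 2]
r_s = spearman_from_ranks(rank_x, rank_y)
r_pearson = pearson(rank_x, rank_y)
print(r_s, r_pearson)  # 0.5 0.5
```

The squared differences are 1, 0, and 1, so the formula gives 1 − 12/24 = 0.5.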
Now that we have a general formula, we can use it to solve problems. Let’s begin by considering what the value of Spearman’s rank correlation coefficient will be when the corresponding elements in two groups of data have the same ranks.
Example 2: Determining When Spearman’s Rank Correlation Coefficient is Equal to 1
True or False: When the ranks of each two corresponding elements in two groups of data \(x\) and \(y\) are identical, Spearman’s rank correlation coefficient is equal to 1.
Answer
To help answer the question, let’s look at a real-world example. Suppose that, in the table below, \(R_{x_i}\) and \(R_{y_i}\) represent the ranks given to five dogs at a dog show by judges \(x\) and \(y\), with 1 being the top-ranked dog and 5 being the bottom-ranked dog. The ranks of the two judges are identical, so we can see that the difference in the ranks for each dog, which is represented by \(d_i\), or \(R_{x_i} - R_{y_i}\), is equal to 0. Since \(0^2 = 0\), we can also see that \(d_i^2\) is equal to 0 for each dog.
Dog | Rank of Judge \(x\), \(R_{x_i}\) | Rank of Judge \(y\), \(R_{y_i}\) | \(d_i = R_{x_i} - R_{y_i}\) | \(d_i^2\) |
---|---|---|---|---|
Dachshund | 2 | 2 | 0 | 0 |
St. Bernard | 4 | 4 | 0 | 0 |
Beagle | 1 | 1 | 0 | 0 |
Irish Setter | 5 | 5 | 0 | 0 |
Poodle | 3 | 3 | 0 | 0 |
Remember that the formula for Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents the coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two variables for each data pair.
Here, we know that the value of \(n\) is 5, since there are 5 data pairs, and the value of \(\sum d_i^2\) is
\[\sum d_i^2 = 0 + 0 + 0 + 0 + 0 = 0.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(0)}{5\left(5^2 - 1\right)} = 1 - 0 = 1.\]
Not only is the value of Spearman’s rank correlation coefficient equal to 1 in this example, but it will also be equal to 1 in any example we look at in which the ranks of the two variables are identical. This is because, in the formula, the value of \(\sum d_i^2\), and subsequently the value of the fraction \(\frac{6\sum d_i^2}{n\left(n^2 - 1\right)}\), will always be equal to 0, and \(1 - 0 = 1\). Therefore, we can say that it is true that when the ranks of each two corresponding elements in two groups of data \(x\) and \(y\) are identical, Spearman’s rank correlation coefficient is equal to 1.
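The two extreme cases can be checked with a short Python sketch. It reuses the dog show ranks from the table above for one judge, then compares them first with an identical ranking and then with a fully reversed ranking. The reversed ranking is a hypothetical second judge that we introduce purely for illustration, and the function name is our own.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) to two lists of ranks."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


judge_ranks = [2, 4, 1, 5, 3]

# Identical ranks: every difference is 0.
identical = spearman_from_ranks(judge_ranks, judge_ranks)

# Hypothetical opposite ranks: 1 swaps with 5, 2 swaps with 4, 3 stays put.
opposite_ranks = [6 - rank for rank in judge_ranks]
opposite = spearman_from_ranks(judge_ranks, opposite_ranks)

print(identical, opposite)  # 1.0 -1.0
```

For the reversed ranking, the squared differences sum to 40, so the fraction equals 240/120 = 2 and the coefficient is 1 − 2 = −1.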
Next, let’s look at some more problems in which we must find Spearman’s rank correlation coefficient for a set of qualitative or quantitative bivariate data. In these problems, the ranks will not be given. We will look at quantitative data first.
Example 3: Calculating Spearman’s Rank Correlation Coefficient for Quantitative Data
Find the Spearman’s rank correlation coefficient between the product price and its lifetime from the given data. Round your answer to four decimal places.
Lifetime (yr) | 1 | 5 | 4 | 2 | 6 | 3 |
---|---|---|---|---|---|---|
Price ($) | 79 | 160 | 125 | 105 | 214 | 103 |
Answer
Recall that the formula for Spearman’s correlation coefficient is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents the coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two coordinates for each data pair.
We see that the lifetimes of the products and their prices make up a set of quantitative, bivariate data. First, let’s assign ranks to the products’ lifetimes. Putting the lifetimes in order from shortest to longest gives us
\[1,\ 2,\ 3,\ 4,\ 5,\ 6.\]
The shortest lifetime is 1 year, so we can assign it the lowest rank (1) or the highest rank (6). We should arrive at the same value for Spearman’s rank correlation coefficient either way, as long as we rank the products’ lifetimes and prices in a similar fashion. Here, we will use a rank of 1 for a lifetime of 1 year, so a lifetime of 2 years will get a rank of 2, a lifetime of 3 years will get a rank of 3, and so on, with a lifetime of 6 years getting a rank of 6.
Now let’s repeat this process with the prices. Putting them in order from lowest to highest gives us
\[79,\ 103,\ 105,\ 125,\ 160,\ 214.\]
Here, $79 gets a rank of 1, so $103 gets a rank of 2, $105 gets a rank of 3, and so on, with a price of $214 getting a rank of 6. The lifetimes and prices and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the lifetimes are represented by \(R_{x_i}\), while the ranks of the prices are represented by \(R_{y_i}\). Notice that the differences sum to
\[0 + 0 + 0 + (-1) + 0 + 1 = 0.\]
In fact, it will always be the case that the sum of the differences in ranks is equal to zero. So finding the sum of the differences is a good way to check our work.
Lifetime (yr) | 1 | 5 | 4 | 2 | 6 | 3 |
---|---|---|---|---|---|---|
\(R_{x_i}\) | 1 | 5 | 4 | 2 | 6 | 3 |
Price ($) | 79 | 160 | 125 | 105 | 214 | 103 |
\(R_{y_i}\) | 1 | 5 | 4 | 3 | 6 | 2 |
\(d_i = R_{x_i} - R_{y_i}\) | 0 | 0 | 0 | \(-1\) | 0 | 1 |
\(d_i^2\) | 0 | 0 | 0 | 1 | 0 | 1 |
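We must substitute values for \(n\) and \(\sum d_i^2\) into the formula to find Spearman’s rank correlation coefficient. Here, we know that the value of \(n\) is 6, since there are 6 data pairs, and the value of \(\sum d_i^2\) is
\[\sum d_i^2 = 0 + 0 + 0 + 1 + 0 + 1 = 2.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(2)}{6\left(6^2 - 1\right)} = 1 - \frac{12}{210} \approx 0.94286.\]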
Correct to four decimal places, the value of the coefficient is 0.9429. This is quite close to 1, so we can say that the ranks are in strong agreement. Thus, we can conclude that longer lifetimes tend to be associated with higher prices and vice versa.
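The whole calculation in this example can be reproduced with a short Python sketch that ranks both variables from least to greatest and then applies the formula. The helper names `average_ranks` and `spearman` are our own, chosen only for illustration.

```python
def average_ranks(values):
    """Rank values from least to greatest; tied values share the average of their places."""
    places = {}
    for place, value in enumerate(sorted(values), start=1):
        places.setdefault(value, []).append(place)
    return [sum(places[v]) / len(places[v]) for v in values]


def spearman(x, y):
    """Rank both variables the same way, then apply the sum-of-squared-differences formula."""
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    n = len(x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


lifetimes = [1, 5, 4, 2, 6, 3]            # years
prices = [79, 160, 125, 105, 214, 103]    # dollars

r_s = spearman(lifetimes, prices)
print(round(r_s, 4))  # 0.9429
```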
We will again determine Spearman’s rank correlation coefficient for a set of quantitative data in the example that follows. This time, we will calculate tied ranks, that is, the ranks associated with data values that are equal.
Example 4: Calculating Spearman’s Rank Correlation Coefficient for Quantitative Data
Find the Spearman’s correlation coefficient between \(x\) and \(y\). Round your answer to three decimal places.
\(x\) | 4 | 7 | 8 | 5 | 8 | 12 |
---|---|---|---|---|---|---|
\(y\) | 7 | 6 | 6 | 4 | 6 | 10 |
Answer
To determine Spearman’s correlation coefficient, we will use the formula
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents the coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two coordinates for each data pair.
We see that the table shows a set of quantitative bivariate data. First, let’s assign ranks to the \(x\)-values. Putting them in order from least to greatest gives us
\[4,\ 5,\ 7,\ 8,\ 8,\ 12.\]
The smallest data value is 4, so we can choose to assign it either the lowest rank (1) or the highest rank (6). As long as we are consistent in the method we choose to assign ranks for both the \(x\)- and the \(y\)-values, we will arrive at the same value for Spearman’s rank correlation. Here, we will assign the number 4 the lowest rank: 1. This means 5 gets a rank of 2 and 7 a rank of 3.
Since 8 appears in both the fourth and fifth positions in our ordered list, we will assign each instance of 8 a rank equivalent to the average of their positions, or a rank of
\[\frac{4 + 5}{2} = 4.5.\]
We also know that a rank of 6 should be used for 12, since 12 is in the sixth position in the list.
Now, let’s repeat this process with the \(y\)-values. Putting them in order from least to greatest gives us
\[4,\ 6,\ 6,\ 6,\ 7,\ 10.\]
Here, 4 gets a rank of 1. A 6 appears in the second, third, and fourth positions on our ordered list. Once again, we assign each one a rank equivalent to the average of their positions, or a rank of
\[\frac{2 + 3 + 4}{3} = 3.\]
We also know that a rank of 5 should be used for 7, since 7 is in the fifth position in the list, and a rank of 6 should be used for 10, since it is in the sixth position. The \(x\)- and \(y\)-values and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the \(x\)-values are represented by \(R_{x_i}\), while the ranks of the \(y\)-values are represented by \(R_{y_i}\).
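\(x_i\) | 4 | 7 | 8 | 5 | 8 | 12 |
---|---|---|---|---|---|---|
\(R_{x_i}\) | 1 | 3 | 4.5 | 2 | 4.5 | 6 |
\(y_i\) | 7 | 6 | 6 | 4 | 6 | 10 |
\(R_{y_i}\) | 5 | 3 | 3 | 1 | 3 | 6 |
\(d_i = R_{x_i} - R_{y_i}\) | \(-4\) | 0 | 1.5 | 1 | 1.5 | 0 |
\(d_i^2\) | 16 | 0 | 2.25 | 1 | 2.25 | 0 |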
We must substitute values for \(n\) and \(\sum d_i^2\) into the formula to find Spearman’s rank correlation coefficient. Here, we know that the value of \(n\) is 6, since there are 6 data pairs, and the value of \(\sum d_i^2\) is
\[\sum d_i^2 = 16 + 0 + 2.25 + 1 + 2.25 + 0 = 21.5.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(21.5)}{6\left(6^2 - 1\right)} = 1 - \frac{129}{210} \approx 0.3857.\]
Correct to three decimal places, the value of the coefficient is 0.386. This is closer to 0 than to 1, so we can say that the ranks are not in strong agreement. Thus, we can conclude that there is only a weak tendency for higher values of \(x\) to be associated with higher values of \(y\).
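One caveat is worth checking numerically. The sketch below recomputes this example in Python and, alongside the formula used in this explainer, also computes Pearson’s correlation directly on the tied ranks. When ranks are tied, the two calculations no longer agree exactly, so the formula is best read as the method this explainer follows rather than as an exact match for Pearson’s correlation on ranks. All function names here are our own and purely illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from least to greatest; tied values share the average of their places."""
    places = {}
    for place, value in enumerate(sorted(values), start=1):
        places.setdefault(value, []).append(place)
    return [sum(places[v]) / len(places[v]) for v in values]


def spearman_formula(rank_x, rank_y):
    """The explainer's formula: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    covariation = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sum((a - mean_u) ** 2 for a in u)
    spread_v = sum((b - mean_v) ** 2 for b in v)
    return covariation / sqrt(spread_u * spread_v)


x = [4, 7, 8, 5, 8, 12]
y = [7, 6, 6, 4, 6, 10]
rank_x = average_ranks(x)   # [1.0, 3.0, 4.5, 2.0, 4.5, 6.0]
rank_y = average_ranks(y)   # [5.0, 3.0, 3.0, 1.0, 3.0, 6.0]

by_formula = spearman_formula(rank_x, rank_y)
by_pearson = pearson(rank_x, rank_y)
print(round(by_formula, 3))  # 0.386
print(round(by_pearson, 3))  # 0.339
```

Both values point to a weak direct association, so the conclusion of the example is unchanged.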
In our previous two examples, we learned how to calculate Spearman’s rank correlation coefficient for a set of quantitative data. We can apply the same techniques to qualitative data by first assigning ranks to our data values. Just as before, in this example, we will calculate tied ranks.
Example 5: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data
In a study of the relation between students’ grades in mathematics and science, the following results were found for six students.
Mathematics | D | B | A | B | D | D |
---|---|---|---|---|---|---|
Science | C | C | B | A | C | F |
Find the Spearman’s correlation coefficient. Round your answer to three decimal places.
Answer
Recall that the formula for Spearman’s correlation coefficient is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents the coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two variables for each data pair.
We see that the six students’ grades make up a set of qualitative, bivariate data, which can be ordered. Even though the data values are not numerical, we can still assign a rank to each of them, which will allow us to find the differences in their ranks. First, let’s assign ranks to the mathematics grades. Putting the grades in order from highest to lowest gives us
\[\text{A},\ \text{B},\ \text{B},\ \text{D},\ \text{D},\ \text{D}.\]
The highest grade is A, so we can assign an A the lowest rank or the highest rank. We should arrive at the same value for Spearman’s rank correlation coefficient either way, as long as we rank the grades for mathematics and science in a similar fashion. Here, we’ll use a rank of 1 for an A.
Since there are Bs in both the second and third positions in the ordered list of grades, we know that each of the Bs should have a rank that is equal to the average of 2 and 3, or a rank of
\[\frac{2 + 3}{2} = 2.5.\]
Since ranks 2 and 3 are now taken, the next rank is 4, and since there are Ds in the fourth, fifth, and sixth positions in the list, we can assign each one a rank equivalent to the average of their positions, or a rank of
\[\frac{4 + 5 + 6}{3} = 5.\]
Now let’s assign ranks to the science grades. Putting the grades in order from highest to lowest gives us
\[\text{A},\ \text{B},\ \text{C},\ \text{C},\ \text{C},\ \text{F}.\]
Here, an A gets a rank of 1 and a B gets a rank of 2. Since there are Cs in the third, fourth, and fifth positions in the ordered list of grades, we know that each of the Cs should have a rank that is equal to the average of 3, 4, and 5, or a rank of
\[\frac{3 + 4 + 5}{3} = 4.\]
A grade of F is last in the list, so we can assign it a rank of 6.
The grades and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the mathematics grades are represented by \(R_{x_i}\), while the ranks of the science grades are represented by \(R_{y_i}\).
Mathematics | D | B | A | B | D | D |
---|---|---|---|---|---|---|
\(R_{x_i}\) | 5 | 2.5 | 1 | 2.5 | 5 | 5 |
Science | C | C | B | A | C | F |
\(R_{y_i}\) | 4 | 4 | 2 | 1 | 4 | 6 |
\(d_i = R_{x_i} - R_{y_i}\) | 1 | \(-1.5\) | \(-1\) | 1.5 | 1 | \(-1\) |
\(d_i^2\) | 1 | 2.25 | 1 | 2.25 | 1 | 1 |
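To find Spearman’s rank correlation coefficient, we must substitute values for \(n\) and \(\sum d_i^2\) into the formula \(r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}\). Here, we know that the value of \(n\) is 6, since there are 6 data pairs, and the value of \(\sum d_i^2\) is
\[\sum d_i^2 = 1 + 2.25 + 1 + 2.25 + 1 + 1 = 8.5.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(8.5)}{6\left(6^2 - 1\right)} = 1 - \frac{51}{210} \approx 0.7571.\]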
Correct to three decimal places, the value of the coefficient is 0.757. This is close to 1, so we can say that the ranks are in fairly strong agreement. Thus, we can conclude that students with high grades in math also tend to have high grades in science and vice versa.
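Qualitative data can be handled in code by first stating the order of the categories explicitly. The Python sketch below does this for the grades in this example, listing them from highest to lowest so that an A receives rank 1. The constant `GRADE_ORDER` and the function names are our own, and the list assumes the usual letter-grade scale A, B, C, D, F.

```python
GRADE_ORDER = ["A", "B", "C", "D", "F"]  # assumed scale, highest grade first


def ranks_by_order(labels, order):
    """Rank ordered categories; tied labels share the average of their places."""
    keys = [order.index(label) for label in labels]  # position of each label in the scale
    places = {}
    for place, key in enumerate(sorted(keys), start=1):
        places.setdefault(key, []).append(place)
    return [sum(places[k]) / len(places[k]) for k in keys]


def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) to two lists of ranks."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


mathematics = ["D", "B", "A", "B", "D", "D"]
science = ["C", "C", "B", "A", "C", "F"]

rank_math = ranks_by_order(mathematics, GRADE_ORDER)   # [5.0, 2.5, 1.0, 2.5, 5.0, 5.0]
rank_science = ranks_by_order(science, GRADE_ORDER)    # [4.0, 4.0, 2.0, 1.0, 4.0, 6.0]

r_s = spearman_from_ranks(rank_math, rank_science)
print(round(r_s, 3))  # 0.757
```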
Now let’s look at another example involving qualitative data with tied ranks.
Example 6: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data
Using the information given in the table, find the Spearman’s rank correlation between the variables \(x\) and \(y\). Give your answer to four decimal places.
\(x\) | Good | Excellent | Good | Excellent | Excellent | Excellent |
---|---|---|---|---|---|---|
\(y\) | Poor | Good | Poor | Excellent | Very Good | Good |
Answer
To find Spearman’s rank correlation between the variables, we will use the formula
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents Spearman’s rank correlation coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two variables for each data pair.
We see that the ratings make up a set of qualitative, bivariate data, which can be ordered. We will begin by assigning a rank to each of the pieces of data, starting with the \(x\)-values. Putting the values in order from best to worst gives us
\[\text{Excellent},\ \text{Excellent},\ \text{Excellent},\ \text{Excellent},\ \text{Good},\ \text{Good}.\]
Since “Excellent” is in positions 1, 2, 3, and 4, we assign each a rank equivalent to the average of their positions, or a rank of
\[\frac{1 + 2 + 3 + 4}{4} = 2.5.\]
Also, since “Good” is in the fifth and sixth positions, we can assign each one a rank of
\[\frac{5 + 6}{2} = 5.5.\]
Now, let’s assign ranks to the \(y\)-values. Putting the values in order from best to worst gives us
\[\text{Excellent},\ \text{Very Good},\ \text{Good},\ \text{Good},\ \text{Poor},\ \text{Poor}.\]
Here, “Excellent” gets a rank of 1, and “Very Good” gets a rank of 2. Since “Good” is in the third and fourth positions in the ordered list, we know that each one gets a rank of
\[\frac{3 + 4}{2} = 3.5.\]
Also, since “Poor” is in the fifth and sixth positions, we can assign each one a rank of
\[\frac{5 + 6}{2} = 5.5.\]
The ratings and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the ratings for the variable \(x\) are represented by \(R_{x_i}\), while the ranks of the ratings for the variable \(y\) are represented by \(R_{y_i}\).
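\(x_i\) | Good | Excellent | Good | Excellent | Excellent | Excellent |
---|---|---|---|---|---|---|
\(R_{x_i}\) | 5.5 | 2.5 | 5.5 | 2.5 | 2.5 | 2.5 |
\(y_i\) | Poor | Good | Poor | Excellent | Very Good | Good |
\(R_{y_i}\) | 5.5 | 3.5 | 5.5 | 1 | 2 | 3.5 |
\(d_i = R_{x_i} - R_{y_i}\) | 0 | \(-1\) | 0 | 1.5 | 0.5 | \(-1\) |
\(d_i^2\) | 0 | 1 | 0 | 2.25 | 0.25 | 1 |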
Now we must substitute values for \(n\) and \(\sum d_i^2\) into the formula to find Spearman’s rank correlation coefficient. Since there are 6 data pairs, we will let \(n = 6\). Also, since \(\sum d_i^2\) is the sum of the squares of the differences in our table, then
\[\sum d_i^2 = 0 + 1 + 0 + 2.25 + 0.25 + 1 = 4.5.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(4.5)}{6\left(6^2 - 1\right)} = 1 - \frac{27}{210} \approx 0.87143.\]
Correct to four decimal places, the value of the coefficient is 0.8714. This is quite close to 1, so we can say that the ranks are in strong agreement. Thus, we can conclude that better ratings for the variable \(x\) tend to be associated with better ratings for the variable \(y\) and vice versa.
The example that follows also involves qualitative data with tied ranks. We will again calculate Spearman’s rank correlation coefficient to determine the level of association between the variables.
Example 7: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data
The following table represents the relation between the results of employees’ appraisals this year and last year.
Last Year | Meets expectations | Needs improvement | Exceptional | Meets expectations | Exceeds expectations |
---|---|---|---|---|---|
This Year | Exceeds expectations | Meets expectations | Exceptional | Needs improvement | Exceeds expectations |
Find the Spearman’s correlation coefficient between last year’s results and this year’s results.
Answer
Remember that the formula for Spearman’s correlation coefficient is
\[r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)},\]
where \(r_s\) represents the coefficient, \(n\) is the number of data pairs, and \(d_i^2\) is the square of the difference in the ranks of the two variables for each data pair.
We see that the results make up a set of qualitative, bivariate data, which can be ordered. We can assign a rank to each of the data values, starting with last year’s results. Putting the results in order from worst to best gives us
\[\text{Needs improvement},\ \text{Meets expectations},\ \text{Meets expectations},\ \text{Exceeds expectations},\ \text{Exceptional}.\]
We can assign “Needs improvement” a rank of 1. Since “Meets expectations” is in the second and third positions in the ordered list, we know that each one is assigned a rank equivalent to the average of their positions, or a rank of
\[\frac{2 + 3}{2} = 2.5.\]
“Exceeds expectations” will then get a rank of 4 and “Exceptional” a rank of 5.
Now, let’s assign ranks to this year’s results. Putting the results in order from worst to best gives us
\[\text{Needs improvement},\ \text{Meets expectations},\ \text{Exceeds expectations},\ \text{Exceeds expectations},\ \text{Exceptional}.\]
We can assign “Needs improvement” a rank of 1 and “Meets expectations” a rank of 2. Since “Exceeds expectations” is in the third and fourth positions in the ordered list, we can assign each one a rank equivalent to the average of their positions, or a rank of
\[\frac{3 + 4}{2} = 3.5.\]
“Exceptional” will then get a rank of 5. The results and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of last year’s appraisals are represented by \(R_{x_i}\), while the ranks of this year’s appraisals are represented by \(R_{y_i}\).
Last Year | Meets expectations | Needs improvement | Exceptional | Meets expectations | Exceeds expectations |
---|---|---|---|---|---|
\(R_{x_i}\) | 2.5 | 1 | 5 | 2.5 | 4 |
This Year | Exceeds expectations | Meets expectations | Exceptional | Needs improvement | Exceeds expectations |
\(R_{y_i}\) | 3.5 | 2 | 5 | 1 | 3.5 |
\(d_i = R_{x_i} - R_{y_i}\) | \(-1\) | \(-1\) | 0 | 1.5 | 0.5 |
\(d_i^2\) | 1 | 1 | 0 | 2.25 | 0.25 |
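Next, let’s substitute values for \(n\) and \(\sum d_i^2\) into the formula to find Spearman’s rank correlation coefficient. Here, we know that the value of \(n\) is 5, since there are 5 data pairs, and the value of \(\sum d_i^2\) is the sum of the values in the final row of our table, or
\[\sum d_i^2 = 1 + 1 + 0 + 2.25 + 0.25 = 4.5.\]
Thus, the value of Spearman’s rank correlation coefficient is
\[r_s = 1 - \frac{6(4.5)}{5\left(5^2 - 1\right)} = 1 - \frac{27}{120} = 0.775.\]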
A Spearman’s coefficient of 0.775 is close to 1, so we can say that the ranks are in fairly strong agreement. Thus, we can conclude that employees with better appraisals last year tend to have better appraisals this year and vice versa.
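Earlier, we noted that the direction of ranking does not matter as long as both variables are ranked in the same way. The Python sketch below checks this for the appraisal data, ranking first from worst to best (as in the solution above) and then from best to worst. The constant and function names are our own and are only illustrative.

```python
WORST_TO_BEST = [
    "Needs improvement",
    "Meets expectations",
    "Exceeds expectations",
    "Exceptional",
]


def ranks_by_order(labels, order):
    """Rank ordered categories; tied labels share the average of their places."""
    keys = [order.index(label) for label in labels]
    places = {}
    for place, key in enumerate(sorted(keys), start=1):
        places.setdefault(key, []).append(place)
    return [sum(places[k]) / len(places[k]) for k in keys]


def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) to two lists of ranks."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


last_year = ["Meets expectations", "Needs improvement", "Exceptional",
             "Meets expectations", "Exceeds expectations"]
this_year = ["Exceeds expectations", "Meets expectations", "Exceptional",
             "Needs improvement", "Exceeds expectations"]


def coefficient(order):
    """Rank both years with the same category order, then apply the formula."""
    return spearman_from_ranks(ranks_by_order(last_year, order),
                               ranks_by_order(this_year, order))


r_worst_first = coefficient(WORST_TO_BEST)
r_best_first = coefficient(list(reversed(WORST_TO_BEST)))
print(round(r_worst_first, 3), round(r_best_first, 3))  # 0.775 0.775
```

Reversing the direction changes the sign of every difference in ranks, but the squares of the differences, and therefore the coefficient, stay the same.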
Now, let’s finish by recapping some key points.
Key Points
- Bivariate data is data on each of two variables, with each value of one of the variables paired with a value of the other variable.
- Spearman’s rank correlation coefficient is a measure of association for bivariate data. A positive Spearman’s rank correlation coefficient indicates a direct association, and a negative coefficient indicates an inverse association.
- The formula for Spearman’s rank correlation coefficient is \(r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}\), where \(r_s\) is the coefficient, \(n\) is the number of data points, and \(d_i^2\) is the square of the difference in the ranks of the two coordinates for each point \((x_i, y_i)\).
- The ranks of two or more identical data values for a variable are equal to the average of their places in an ordered list. The data values are said to have tied ranks.
- When calculating Spearman’s rank correlation coefficient, the differences will always sum to 0.
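The last key point can be checked numerically. In the Python sketch below, we rank the data from Examples 3 and 4 and confirm that the differences in ranks sum to 0 in each case. The reason is that, with or without ties, each list of ranks adds up to \(\frac{n(n + 1)}{2}\), so the two totals cancel. The helper name `average_ranks` and the dictionary layout are our own choices.

```python
def average_ranks(values):
    """Rank values from least to greatest; tied values share the average of their places."""
    places = {}
    for place, value in enumerate(sorted(values), start=1):
        places.setdefault(value, []).append(place)
    return [sum(places[v]) / len(places[v]) for v in values]


datasets = {
    "Example 3": ([1, 5, 4, 2, 6, 3], [79, 160, 125, 105, 214, 103]),
    "Example 4": ([4, 7, 8, 5, 8, 12], [7, 6, 6, 4, 6, 10]),
}

rank_totals = {}
difference_sums = {}
for name, (x, y) in datasets.items():
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    # Both totals should equal n(n + 1)/2, which is 21 when n = 6.
    rank_totals[name] = (sum(rank_x), sum(rank_y))
    difference_sums[name] = sum(rx - ry for rx, ry in zip(rank_x, rank_y))

print(rank_totals)
print(difference_sums)
```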