Lesson Explainer: Spearman’s Rank Correlation Coefficient | Nagwa


In this explainer, we will learn how to find Spearman’s rank correlation coefficient.

We will find its value for sets of both quantitative and qualitative bivariate data. The data described by Spearman’s rank correlation coefficient can be either discrete or continuous if it is quantitative.

Definition: Bivariate Data

Bivariate data is data on each of two variables, with each value of one of the variables paired with a value of the other variable.

Definition: Quantitative and Qualitative Data

Quantitative data is numerical. An example of a set of quantitative bivariate data is {(11,130),(13,129),(9,124),(15,138),(7,121)}. This might describe a person’s age in years and the person’s height in centimetres. A person’s age is discrete if it can be given only as a whole number of years, while the person’s height is continuous if it can be given as any fraction of a centimetre.

Qualitative data (also referred to as descriptive or categorical data) is not numerical. An example of a set of qualitative bivariate data is {(large, large), (medium, large), (small, small), (medium, medium), (large, medium)}. This might describe a person’s shirt size in two different brands. To calculate Spearman’s rank correlation coefficient for qualitative data, the data must be able to be ordered (e.g., small, medium, large).

Spearman’s rank correlation coefficient, denoted by $r$, is a measure of the tendency for one variable to increase or decrease as the other does within a monotonic (entirely increasing or entirely decreasing) relationship, such that $-1 \le r \le 1$.

If one variable tends to increase as the other does, the value of $r$ is positive and there is a direct association between the variables. On the other hand, if one variable tends to decrease as the other increases, the value of $r$ is negative, which indicates an inverse association. Rank correlation coefficient values of 1 or $-1$ describe a perfectly associated monotonic relationship. This means that either the ranks agree entirely ($r = 1$) or they are direct opposites ($r = -1$). Unlike with Pearson’s correlation coefficient, a perfect $r$ value of 1 or $-1$ can occur regardless of whether the quantitative data pairs in a set are linearly related or not.
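As a rough illustration of that last point, the sketch below compares the two coefficients on a made-up data set in which $y$ always increases with $x$ but not along a straight line. The function names and the data are assumptions chosen for illustration, the ranking helper assumes there are no tied values, and the rank coefficient is computed from the squared rank differences as $1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$.

```python
import math

def simple_ranks(values):
    # Rank from least to greatest; assumes all values are distinct (no ties).
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    # Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(xs)
    d_squared = [(rx - ry) ** 2
                 for rx, ry in zip(simple_ranks(xs), simple_ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

def pearson(xs, ys):
    # Product-moment correlation from deviations about the means.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]          # increasing, but not linear

print(spearman(xs, ys))            # 1.0
print(round(pearson(xs, ys), 3))   # 0.943
```

The ranks agree exactly, so the rank coefficient is 1 even though the points do not lie on a line.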

Not only can $r$ be 1 or $-1$, but it can also take any value between $-1$ and 1. A value of 0 for $r$ indicates no association between the variables. The closer the value of $r$ is to 1 or $-1$, the stronger the association, and the closer it is to 0, the weaker the association.

Definition: Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient, denoted by $r$, is a numerical value such that $-1 \le r \le 1$. It measures the tendency of one variable to increase as the other increases (a direct association) or to decrease as the other increases (an inverse association). Direct associations are indicated by positive values, and inverse associations are indicated by negative values. No association is indicated by a value of 0. The stronger the association, the closer $r$ is to 1 or $-1$, and the weaker the association, the closer it is to 0. Rank correlation coefficient values of 1 or $-1$ mean that either the ranks agree entirely ($r = 1$) or they are direct opposites ($r = -1$).

Our first step in determining the value of 𝑟 for a set of 𝑛 bivariate data pairs is to rank the values of each variable. In a quantitative data set, the smallest rank for a variable can be assigned to either the least or the greatest data value, but each variable must be ranked in the same way. That is, both must be ranked either from least to greatest or from greatest to least. Also, if two data values are the same, then their ranks must also be the same. Thus, the ranks of two or more identical data values are equal to the average of their places in an ordered list. The identical data values are said to have tied ranks.
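The tied-rank rule can be sketched in a few lines of Python. The function name `average_ranks` and the sample values are assumptions made for illustration; the helper ranks from least to greatest and gives each group of identical values the average of the positions they occupy in the ordered list.

```python
def average_ranks(values):
    """Rank values from least to greatest; tied values share their average position."""
    ordered = sorted(values)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1         # first 1-based position of this value
        count = ordered.count(value)             # how many times the value occurs
        ranks[value] = first + (count - 1) / 2   # mean of positions first..first+count-1
    return [ranks[v] for v in values]

print(average_ranks([10, 30, 20, 30]))   # [1.0, 3.5, 2.0, 3.5]
```

Here the two values of 30 occupy the third and fourth positions, so each receives the rank 3.5.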

Suppose we have a data set consisting of the points $(2, -3)$, $(5, 4)$, and $(-6, 1)$.

The two variables are referred to as $X$, with sample values $x_1, x_2, \ldots, x_n$, and $Y$, with sample values $y_1, y_2, \ldots, y_n$, such that a general bivariate item is denoted $(x_i, y_i)$. In this data set, $i = 1, 2, 3$, since there are 3 data pairs. The values of $x$ are 2, 5, and $-6$, while the values of $y$ are $-3$, 4, and 1. Putting the values of $x$ in order from least to greatest gives us $-6, 2, 5$.

Doing the same for the values of $y$, we get $-3, 1, 4$.

This means that, for the values of $x$, if the rank of $-6$ is 1, then the rank of 2 is 2, and the rank of 5 is 3. The values of $y$ must be ranked in the same way, so the rank of $-3$ is 1, the rank of 1 is 2, and the rank of 4 is 3. For each point $(x_i, y_i)$, the difference in the coordinates’ ranks can be denoted as $d_i$ and the squares of the differences as $d_i^2$. This is shown in the table below, where the ranks of the values of $x$ are represented by $R_x$, and the ranks of the values of $y$ are represented by $R_y$.

| $x$ | 2 | 5 | −6 |
|---|---|---|---|
| $R_x$ | 2 | 3 | 1 |
| $y$ | −3 | 4 | 1 |
| $R_y$ | 1 | 3 | 2 |
| $R_x - R_y$ ($d_i$) | 1 | 0 | −1 |
| $d_i^2$ | 1 | 0 | 1 |

Once we have the values of $d_i^2$, we can use them, along with the value of $n$, or the number of data pairs, in a general formula for Spearman’s rank correlation coefficient. In our first example, we will learn to recognize what that formula is.

Example 1: Recognizing the Formula for Spearman’s Rank Correlation Coefficient

Which of the following is the formula for Spearman’s rank correlation coefficient?

  1. 𝑟=16𝑑(𝑛1)
  2. 𝑟=16𝑑𝑛(𝑛1)
  3. 𝑟=6𝑑𝑛(𝑛1)
  4. 𝑟=16𝑑𝑛(𝑛1)
  5. 𝑟=1𝑑𝑛(𝑛1)

Answer

The formula for Spearman’s rank correlation coefficient (sometimes simply referred to as rank correlation) is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$. In it, $r$ represents the coefficient, and the number of points in the data set is represented by $n$. The square of the difference in the ranks of the two coordinates for each point $(x_i, y_i)$ is represented by $d_i^2$, and the expression $\sum d_i^2$ indicates that we should find the sum of these squares. The formula was developed by Charles Spearman, an English psychologist known for his work in statistics. Calculating the rank correlation is equivalent to finding Pearson’s correlation on a new set of variables: the ranked values of the data.

Formula: Spearman’s Rank Correlation Coefficient

The formula for Spearman’s rank correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ is the coefficient and $n$ is the number of points in the data set. For each point $(x_i, y_i)$, the square of the difference in the ranks of the two coordinates is represented by $d_i^2$, and the sum of these squares is represented by the expression $\sum d_i^2$.
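A minimal sketch of this formula as a Python function is given below. The name `spearman_from_ranks` is an assumption for illustration, and the function expects ranks that have already been assigned. It is applied to the ranks 2, 3, 1 and 1, 3, 2 from the three-point table earlier in the explainer.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Apply 1 - 6 * sum(d^2) / (n * (n^2 - 1)) to two lists of ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Rank differences are 1, 0, -1, so the squares sum to 2 and n = 3.
print(spearman_from_ranks([2, 3, 1], [1, 3, 2]))   # 0.5
```

With the squared differences summing to 2 and $n = 3$, the fraction is $\frac{12}{24} = 0.5$, so the coefficient is $1 - 0.5 = 0.5$.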

Now that we have a general formula, we can use it to solve problems. Let’s begin by considering what the value of Spearman’s rank correlation coefficient will be when the corresponding elements in two groups of data have the same ranks.

Example 2: Determining When Spearman’s Rank Correlation Coefficient is Equal to 1

True or False: When the ranks of each two corresponding elements in two groups of data 𝑋 and 𝑌 are identical, Spearman’s rank correlation coefficient is equal to 1.

Answer

To help answer the question, let’s look at a real-world example. Suppose that, in the table below, $R_X$ and $R_Y$ represent the ranks given to five dogs at a dog show by judges $X$ and $Y$, with 1 being the top-ranked dog and 5 being the bottom-ranked dog. The ranks of the two judges are identical, so we can see that the difference in the ranks for each dog, which is represented by $R_X - R_Y$, or $d_i$, is equal to 0. Since $0^2 = 0$, we can also see that $d_i^2$ is equal to 0 for each dog.

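| Dog | Rank of Judge $X$ ($R_X$) | Rank of Judge $Y$ ($R_Y$) | $R_X - R_Y$ ($d_i$) | $d_i^2$ |
|---|---|---|---|---|
| Dachshund | 2 | 2 | 2 − 2 = 0 | 0 |
| St. Bernard | 4 | 4 | 4 − 4 = 0 | 0 |
| Beagle | 1 | 1 | 1 − 1 = 0 | 0 |
| Irish Setter | 5 | 5 | 5 − 5 = 0 | 0 |
| Poodle | 3 | 3 | 3 − 3 = 0 | 0 |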

Remember that the formula for Spearman’s rank correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents the coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two variables for each data pair.

Here, we know that the value of $n$ is 5, since there are 5 data pairs, and the value of $\sum d_i^2$ is $0 + 0 + 0 + 0 + 0 = 0$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(0)}{5(5^2 - 1)} = 1 - \frac{6(0)}{5(25 - 1)} = 1 - \frac{6(0)}{5(24)} = 1 - \frac{0}{120} = 1 - 0 = 1$.

Not only is the value of Spearman’s rank correlation coefficient equal to 1 in this example, but it will also be equal to 1 in any example we look at in which the ranks of the two variables are identical. This is because, in the formula, the value of $\sum d_i^2$, and subsequently the value of the fraction $\frac{6\sum d_i^2}{n(n^2 - 1)}$, will always be equal to 0, and $1 - 0 = 1$. Therefore, we can say that it is true that when the ranks of each two corresponding elements in two groups of data $X$ and $Y$ are identical, Spearman’s rank correlation coefficient is equal to 1.
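The same check can be sketched in Python, together with the opposite extreme. The function name and the second judge’s reversed ranking are hypothetical, added only to show that ranks in direct opposition give a coefficient of −1.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    # 1 - 6 * sum(d^2) / (n * (n^2 - 1)), applied to ranks that are already assigned.
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

judge_x = [2, 4, 1, 5, 3]                    # ranks from the dog show table
judge_y = [2, 4, 1, 5, 3]                    # identical ranks
opposite = [6 - rank for rank in judge_x]    # hypothetical judge with reversed ranks

print(spearman_from_ranks(judge_x, judge_y))    # 1.0
print(spearman_from_ranks(judge_x, opposite))   # -1.0
```

For the reversed ranks, the squared differences sum to 40, so the fraction is $\frac{240}{120} = 2$ and the coefficient is $1 - 2 = -1$.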

Next, let’s look at some more problems in which we must find Spearman’s rank correlation coefficient for a set of qualitative or quantitative bivariate data. In these problems, the ranks will not be given. We will look at quantitative data first.

Example 3: Calculating Spearman’s Rank Correlation Coefficient for Quantitative Data

Find the Spearman’s rank correlation coefficient between the product price and its lifetime from the given data. Round your answer to four decimal places.

| Lifetime (yr) | 1 | 5 | 4 | 2 | 6 | 3 |
|---|---|---|---|---|---|---|
| Price ($) | 79 | 160 | 125 | 105 | 214 | 103 |

Answer

Recall that the formula for Spearman’s correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents the coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two coordinates for each data pair.

We see that the lifetimes of the products and their prices make up a set of quantitative, bivariate data. First, let’s assign ranks to the products’ lifetimes. Putting the lifetimes in order from shortest to longest gives us 1,2,3,4,5,6.

The shortest lifetime is 1 year, so we can assign it the lowest rank (1) or the highest rank (6). We should arrive at the same value for Spearman’s rank correlation coefficient either way, as long as we rank the products’ lifetimes and prices in a similar fashion. Here, we will use a rank of 1 for a lifetime of 1 year, so a lifetime of 2 years will get a rank of 2, a lifetime of 3 years will get a rank of 3, and so on, with a lifetime of 6 years getting a rank of 6.

Now let’s repeat this process with the prices. Putting them in order from lowest to highest gives us 79,103,105,125,160,214.

Here, $79 gets a rank of 1, so $103 gets a rank of 2, $105 gets a rank of 3, and so on, with a price of $214 getting a rank of 6. The lifetimes and prices and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the lifetimes are represented by $R_L$, while the ranks of the prices are represented by $R_P$. Notice that the differences ($d_i$) sum to $0 + 0 + 0 + (-1) + 0 + 1 = 0$.

In fact, it will always be the case that the sum of the differences in ranks is equal to zero. So finding the sum of the differences is a good way to check our work.
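One way to see why this happens is sketched below, using generic rank symbols $R_{x_i}$ and $R_{y_i}$ (notation assumed here for illustration). The ranks of each variable, including averaged tied ranks, add up to $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$, so the two sums cancel:

```latex
\sum_{i=1}^{n} d_i
  = \sum_{i=1}^{n} \left( R_{x_i} - R_{y_i} \right)
  = \sum_{i=1}^{n} R_{x_i} - \sum_{i=1}^{n} R_{y_i}
  = \frac{n(n+1)}{2} - \frac{n(n+1)}{2}
  = 0
```

Tied ranks do not change this, because replacing several positions by their average leaves their total unchanged.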

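| Lifetime (yr) | 1 | 5 | 4 | 2 | 6 | 3 |
|---|---|---|---|---|---|---|
| $R_L$ | 1 | 5 | 4 | 2 | 6 | 3 |
| Price ($) | 79 | 160 | 125 | 105 | 214 | 103 |
| $R_P$ | 1 | 5 | 4 | 3 | 6 | 2 |
| $R_L - R_P$ ($d_i$) | 0 | 0 | 0 | −1 | 0 | 1 |
| $d_i^2$ | 0 | 0 | 0 | 1 | 0 | 1 |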

We must substitute values for $n$ and $\sum d_i^2$ into the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$ to find Spearman’s rank correlation coefficient. Here, we know that the value of $n$ is 6, since there are 6 data pairs, and the value of $\sum d_i^2$ is $0 + 0 + 0 + 1 + 0 + 1 = 2$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(2)}{6(6^2 - 1)} = 1 - \frac{6(2)}{6(35)} = 1 - \frac{12}{210} = 1 - 0.057142\ldots = 0.942857\ldots$

Correct to four decimal places, the value of the coefficient is 0.9429. This is quite close to 1, so we can say that the ranks are in strong agreement. Thus, we can conclude that longer lifetimes tend to be associated with higher prices and vice versa.
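The calculation can be checked with a short Python sketch, which also ranks both variables from greatest to least to confirm that the direction does not matter as long as it is the same for both. The function names and the `descending` parameter are assumptions for illustration, and the ranking helper assumes there are no tied values, as is the case here.

```python
def ranks(values, descending=False):
    # Assumes no tied values, which holds for this data set.
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys, descending=False):
    # Both variables are ranked in the same direction before applying the formula.
    n = len(xs)
    rx, ry = ranks(xs, descending), ranks(ys, descending)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

lifetimes = [1, 5, 4, 2, 6, 3]
prices = [79, 160, 125, 105, 214, 103]

print(round(spearman(lifetimes, prices), 4))                    # 0.9429
print(round(spearman(lifetimes, prices, descending=True), 4))   # 0.9429
```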

We will again determine Spearman’s rank correlation coefficient for a set of quantitative data in the example that follows. This time, we will calculate tied ranks, that is, the ranks associated with data points that have the same value.

Example 4: Calculating Spearman’s Rank Correlation Coefficient for Quantitative Data

Find the Spearman’s correlation coefficient between 𝑥 and 𝑦. Round your answer to three decimal places.

| $x$ | 4 | 7 | 8 | 5 | 8 | 12 |
|---|---|---|---|---|---|---|
| $y$ | 7 | 6 | 6 | 4 | 6 | 10 |

Answer

To determine Spearman’s correlation coefficient, we will use the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents the coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two coordinates for each data pair.

We see that the table shows a set of quantitative bivariate data. First, let’s assign ranks to the 𝑥-values. Putting them in order from least to greatest gives us 4,5,7,8,8,12.

The smallest data value is 4, so we can choose to assign it either the lowest rank (1) or the highest rank (6). As long as we are consistent in the method we choose to assign ranks for both the 𝑥- and the 𝑦-values, we will arrive at the same value for Spearman’s rank correlation. Here, we will assign the number 4 the lowest rank: 1. This means 5 gets a rank of 2 and 7 a rank of 3.

Since 8 appears in both the fourth and fifth positions in our ordered list, we will assign each instance of 8 a rank equivalent to the average of their positions, or a rank of $\frac{4 + 5}{2} = \frac{9}{2} = 4.5$.

We also know that a rank of 6 should be used for 12, since 12 is in the sixth position in the list.

Now, let’s repeat this process with the 𝑦-values. Putting them in order from least to greatest gives us 4,6,6,6,7,10.

Here, 4 gets a rank of 1. A 6 appears in the second, third, and fourth positions on our ordered list. Once again, we assign each one a rank equivalent to the average of their positions, or a rank of $\frac{2 + 3 + 4}{3} = \frac{9}{3} = 3$.

We also know that a rank of 5 should be used for 7, since 7 is in the fifth position in the list, and a rank of 6 should be used for 10, since it is in the sixth position. The $x$- and $y$-values and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the $x$-values are represented by $R_x$, while the ranks of the $y$-values are represented by $R_y$.

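| $x$ | 4 | 7 | 8 | 5 | 8 | 12 |
|---|---|---|---|---|---|---|
| $R_x$ | 1 | 3 | 4.5 | 2 | 4.5 | 6 |
| $y$ | 7 | 6 | 6 | 4 | 6 | 10 |
| $R_y$ | 5 | 3 | 3 | 1 | 3 | 6 |
| $R_x - R_y$ ($d_i$) | −4 | 0 | 1.5 | 1 | 1.5 | 0 |
| $d_i^2$ | 16 | 0 | 2.25 | 1 | 2.25 | 0 |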

We must substitute values for $n$ and $\sum d_i^2$ into the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$ to find Spearman’s rank correlation coefficient. Here, we know that the value of $n$ is 6, since there are 6 data pairs, and the value of $\sum d_i^2$ is $16 + 0 + 2.25 + 1 + 2.25 + 0 = 21.5$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(21.5)}{6(6^2 - 1)} = 1 - \frac{6(21.5)}{6(35)} = 1 - \frac{129}{210} = 1 - 0.614285\ldots = 0.385714\ldots$

Correct to three decimal places, the value of the coefficient is 0.386. This is positive but far from 1, so we can say that the ranks are not in strong agreement. Thus, we can conclude that higher values of the variable $x$ are only weakly associated with higher values of the variable $y$ and vice versa.
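The ranking with ties and the final substitution can be reproduced with the following sketch. The function names are assumptions for illustration; `average_ranks` ranks from least to greatest and gives tied values the average of their positions, and `spearman` applies the rank-difference formula used in this explainer.

```python
def average_ranks(values):
    # Tied values share the mean of the 1-based positions they occupy when sorted.
    ordered = sorted(values)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1
        ranks[value] = first + (ordered.count(value) - 1) / 2
    return [ranks[v] for v in values]

def spearman(xs, ys):
    # 1 - 6 * sum(d^2) / (n * (n^2 - 1)) with averaged ranks for ties.
    n = len(xs)
    d_squared = [(a - b) ** 2
                 for a, b in zip(average_ranks(xs), average_ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

x = [4, 7, 8, 5, 8, 12]
y = [7, 6, 6, 4, 6, 10]

print(average_ranks(x))            # [1.0, 3.0, 4.5, 2.0, 4.5, 6.0]
print(average_ranks(y))            # [5.0, 3.0, 3.0, 1.0, 3.0, 6.0]
print(round(spearman(x, y), 3))    # 0.386
```

Some statistical software instead computes Pearson’s correlation directly on the ranks, which can give a different value from this formula when ranks are tied.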

In our previous two examples, we learned how to calculate Spearman’s rank correlation coefficient for a set of quantitative data. We can apply the same techniques to qualitative data by first assigning ranks to our data values. Just as before, in this example, we will calculate tied ranks.

Example 5: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data

In a study of the relation between students’ grades in mathematics and science, the following results were found for six students.

| Mathematics | D | B | A | B | D | D |
|---|---|---|---|---|---|---|
| Science | C | C | B | A | C | F |

Find the Spearman’s correlation coefficient. Round your answer to three decimal places.

Answer

Recall that the formula for Spearman’s correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents the coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two variables for each data pair.

We see that the six students’ grades make up a set of qualitative, bivariate data, which can be ordered. Even though the data values are not numerical, we can still assign a rank to each of them, which will allow us to find the differences in their ranks. First, let’s assign ranks to the mathematics grades. Putting the grades in order from highest to lowest gives us A, B, B, D, D, D.

The highest grade is A, so we can assign an A the lowest rank or the highest rank. We should arrive at the same value for Spearman’s rank correlation coefficient either way, as long as we rank the grades for mathematics and science in a similar fashion. Here, we’ll use a rank of 1 for an A.

Since there are Bs in both the second and third positions in the ordered list of grades, we know that each of the Bs should have a rank that is equal to the average of 2 and 3, or a rank of $\frac{2 + 3}{2} = \frac{5}{2} = 2.5$.

Since ranks 2 and 3 are now taken, the next rank is 4, and since there are Ds in the fourth, fifth, and sixth positions in the list, we can assign each one a rank equivalent to the average of their positions, or a rank of $\frac{4 + 5 + 6}{3} = \frac{15}{3} = 5$.

Now let’s assign ranks to the science grades. Putting the grades in order from highest to lowest gives us A, B, C, C, C, F.

Here, an A gets a rank of 1 and a B gets a rank of 2. Since there are Cs in the third, fourth, and fifth positions in the ordered list of grades, we know that each of the Cs should have a rank of $\frac{3 + 4 + 5}{3} = \frac{12}{3} = 4$, or a rank that is equal to the average of 3, 4, and 5. A grade of F is last in the list, so we can assign it a rank of 6.

The grades and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the mathematics grades are represented by $R_M$, while the ranks of the science grades are represented by $R_S$.

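| Mathematics | D | B | A | B | D | D |
|---|---|---|---|---|---|---|
| $R_M$ | 5 | 2.5 | 1 | 2.5 | 5 | 5 |
| Science | C | C | B | A | C | F |
| $R_S$ | 4 | 4 | 2 | 1 | 4 | 6 |
| $R_M - R_S$ ($d_i$) | 1 | −1.5 | −1 | 1.5 | 1 | −1 |
| $d_i^2$ | 1 | 2.25 | 1 | 2.25 | 1 | 1 |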

To find Spearman’s rank correlation coefficient, we must substitute values for $n$ and $\sum d_i^2$ into the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$. Here, we know that the value of $n$ is 6, since there are 6 data pairs, and the value of $\sum d_i^2$ is $1 + 2.25 + 1 + 2.25 + 1 + 1 = 8.5$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(8.5)}{6(6^2 - 1)} = 1 - \frac{6(8.5)}{6(35)} = 1 - \frac{51}{210} = 1 - 0.242857\ldots = 0.757142\ldots$

Correct to three decimal places, the value of the coefficient is 0.757. This is close to 1, so we can say that the ranks are in fairly strong agreement. Thus, we can conclude that students with high grades in math also tend to have high grades in science and vice versa.
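For ordered qualitative data, one possible approach in code is to list the categories in order and rank each label by its place in that list. The sketch below assumes the grade scale A, B, C, D, F from best to worst; the constant `GRADE_ORDER` and the function names are hypothetical, introduced only for illustration.

```python
GRADE_ORDER = ["A", "B", "C", "D", "F"]   # assumed scale, best to worst

def average_ranks(labels, order):
    # Convert each label to its place in the ordered scale, then rank with averaged ties.
    positions = [order.index(label) for label in labels]
    ordered = sorted(positions)
    ranks = {}
    for p in set(positions):
        first = ordered.index(p) + 1
        ranks[p] = first + (ordered.count(p) - 1) / 2
    return [ranks[p] for p in positions]

def spearman(labels_x, labels_y, order):
    n = len(labels_x)
    rx, ry = average_ranks(labels_x, order), average_ranks(labels_y, order)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

mathematics = list("DBABDD")
science = list("CCBACF")

print(average_ranks(mathematics, GRADE_ORDER))   # [5.0, 2.5, 1.0, 2.5, 5.0, 5.0]
print(average_ranks(science, GRADE_ORDER))       # [4.0, 4.0, 2.0, 1.0, 4.0, 6.0]
print(round(spearman(mathematics, science, GRADE_ORDER), 3))   # 0.757
```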

Now let’s look at another example involving qualitative data with tied ranks.

Example 6: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data

Using the information given in the table, find the Spearman’s rank correlation between the variables 𝑥 and 𝑦. Give your answer to four decimal places.

| $x$ | Good | Excellent | Good | Excellent | Excellent | Excellent |
|---|---|---|---|---|---|---|
| $y$ | Poor | Good | Poor | Excellent | Very Good | Good |

Answer

To find Spearman’s rank correlation between the variables, we will use the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents Spearman’s rank correlation coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two variables for each data pair.

We see that the ratings make up a set of qualitative, bivariate data, which can be ordered. We will begin by assigning a rank to each of the pieces of data, starting with the $x$-values. Putting the values in order from best to worst gives us Excellent, Excellent, Excellent, Excellent, Good, Good.

Since “Excellent” is in positions 1, 2, 3, and 4, we assign each a rank equivalent to the average of their positions, or a rank of $\frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5$.

Also, since “Good” is in the fifth and sixth positions, we can assign each one a rank of $\frac{5 + 6}{2} = \frac{11}{2} = 5.5$.

Now, let’s assign ranks to the $y$-values. Putting the values in order from best to worst gives us Excellent, Very Good, Good, Good, Poor, Poor.

Here, “Excellent” gets a rank of 1, and “Very Good” gets a rank of 2. Since “Good” is in the third and fourth positions in the ordered list, we know that each one gets a rank of $\frac{3 + 4}{2} = \frac{7}{2} = 3.5$.

Also, since “Poor” is in the fifth and sixth positions, we can assign each one a rank of $\frac{5 + 6}{2} = \frac{11}{2} = 5.5$.

The ratings and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of the ratings for the variable $x$ are represented by $R_x$, while the ranks of the ratings for the variable $y$ are represented by $R_y$.

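| $x$ | Good | Excellent | Good | Excellent | Excellent | Excellent |
|---|---|---|---|---|---|---|
| $R_x$ | 5.5 | 2.5 | 5.5 | 2.5 | 2.5 | 2.5 |
| $y$ | Poor | Good | Poor | Excellent | Very Good | Good |
| $R_y$ | 5.5 | 3.5 | 5.5 | 1 | 2 | 3.5 |
| $R_x - R_y$ ($d_i$) | 0 | −1 | 0 | 1.5 | 0.5 | −1 |
| $d_i^2$ | 0 | 1 | 0 | 2.25 | 0.25 | 1 |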

Now we must substitute values for $n$ and $\sum d_i^2$ into the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$ to find Spearman’s rank correlation coefficient. Since there are 6 data pairs, we will let $n = 6$. Also, since $\sum d_i^2$ is the sum of the squares of the differences in our table, $\sum d_i^2 = 0 + 1 + 0 + 2.25 + 0.25 + 1 = 4.5$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(4.5)}{6(6^2 - 1)} = 1 - \frac{6(4.5)}{6(35)} = 1 - \frac{27}{210} = 1 - 0.128571\ldots = 0.871428\ldots$

Correct to four decimal places, the value of the coefficient is 0.8714. This is quite close to 1, so we can say that the ranks are in strong agreement. Thus, we can conclude that better ratings for the variable 𝑥 tend to be associated with better ratings for the variable 𝑦 and vice versa.

The example that follows also involves qualitative data with tied ranks. We will again calculate Spearman’s rank correlation coefficient to determine the level of association between the variables.

Example 7: Calculating Spearman’s Rank Correlation Coefficient for Qualitative Data

The following table represents the relation between the results of employees’ appraisals this year and last year.

| Last Year | Meets expectations | Needs improvement | Exceptional | Meets expectations | Exceeds expectations |
|---|---|---|---|---|---|
| This Year | Exceeds expectations | Meets expectations | Exceptional | Needs improvement | Exceeds expectations |

Find the Spearman’s correlation coefficient between the results of the last year and current year.

Answer

Remember that the formula for Spearman’s correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ represents the coefficient, $n$ is the number of data pairs, and $d_i^2$ is the square of the difference in the ranks of the two variables for each data pair.

We see that the results make up a set of qualitative, bivariate data, which can be ordered. We can assign a rank to each of the data values, starting with last year’s results. Putting the results in order from worst to best gives us Needs improvement, Meets expectations, Meets expectations, Exceeds expectations, Exceptional.

We can assign “Needs improvement” a rank of 1. Since “Meets expectations” is in the second and third positions in the ordered list, we know that each one is assigned a rank equivalent to the average of their positions, or a rank of $\frac{2 + 3}{2} = \frac{5}{2} = 2.5$. “Exceeds expectations” will then get a rank of 4 and “Exceptional” a rank of 5.

Now, let’s assign ranks to this year’s results. Putting the results in order from worst to best gives us Needs improvement, Meets expectations, Exceeds expectations, Exceeds expectations, Exceptional.

We can assign “Needs improvement” a rank of 1 and “Meets expectations” a rank of 2. Since “Exceeds expectations” is in the third and fourth positions in the ordered list, we can assign each one a rank equivalent to the average of their positions, or a rank of $\frac{3 + 4}{2} = \frac{7}{2} = 3.5$. “Exceptional” will then get a rank of 5. The results and their ranks are shown below, along with the differences in the ranks and the squares of the differences. The ranks of last year’s appraisals are represented by $R_L$, while the ranks of this year’s appraisals are represented by $R_T$.

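| Last Year | Meets expectations | Needs improvement | Exceptional | Meets expectations | Exceeds expectations |
|---|---|---|---|---|---|
| $R_L$ | 2.5 | 1 | 5 | 2.5 | 4 |
| This Year | Exceeds expectations | Meets expectations | Exceptional | Needs improvement | Exceeds expectations |
| $R_T$ | 3.5 | 2 | 5 | 1 | 3.5 |
| $R_L - R_T$ ($d_i$) | −1 | −1 | 0 | 1.5 | 0.5 |
| $d_i^2$ | 1 | 1 | 0 | 2.25 | 0.25 |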

Next, let’s substitute values for $n$ and $\sum d_i^2$ into the formula $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$ to find Spearman’s rank correlation coefficient. Here, we know that the value of $n$ is 5, since there are 5 data pairs, and the value of $\sum d_i^2$ is the sum of the values in the final row of our table, or $1 + 1 + 0 + 2.25 + 0.25 = 4.5$.

Thus, the value of Spearman’s rank correlation coefficient is $r = 1 - \frac{6(4.5)}{5(5^2 - 1)} = 1 - \frac{6(4.5)}{5(24)} = 1 - \frac{27}{120} = 1 - 0.225 = 0.775$.

A Spearman’s coefficient of 0.775 is close to 1, so we can say that the ranks are in fairly strong agreement. Thus, we can conclude that employees with better appraisals last year tend to have better appraisals this year and vice versa.
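As a check on this example, the sketch below ranks the appraisal labels from worst to best and applies the same formula. The `SCALE` list and the function names are assumptions for illustration; the four-level ordering is taken from the ordered lists in the answer above.

```python
SCALE = ["Needs improvement", "Meets expectations",
         "Exceeds expectations", "Exceptional"]   # worst to best

def average_ranks(labels, scale):
    # Rank by place in the scale; tied labels share the average of their positions.
    positions = [scale.index(label) for label in labels]
    ordered = sorted(positions)
    ranks = {}
    for p in set(positions):
        first = ordered.index(p) + 1
        ranks[p] = first + (ordered.count(p) - 1) / 2
    return [ranks[p] for p in positions]

def spearman(labels_x, labels_y, scale):
    n = len(labels_x)
    rx, ry = average_ranks(labels_x, scale), average_ranks(labels_y, scale)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))

last_year = ["Meets expectations", "Needs improvement", "Exceptional",
             "Meets expectations", "Exceeds expectations"]
this_year = ["Exceeds expectations", "Meets expectations", "Exceptional",
             "Needs improvement", "Exceeds expectations"]

print(average_ranks(last_year, SCALE))   # [2.5, 1.0, 5.0, 2.5, 4.0]
print(average_ranks(this_year, SCALE))   # [3.5, 2.0, 5.0, 1.0, 3.5]
print(round(spearman(last_year, this_year, SCALE), 3))   # 0.775
```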

Now, let’s finish by recapping some key points.

Key Points

  • Bivariate data is data on each of two variables, with each value of one of the variables paired with a value of the other variable.
  • Spearman’s rank correlation coefficient is a measure of association for bivariate data. A positive Spearman’s rank correlation coefficient indicates a direct association, and a negative coefficient indicates an inverse association.
  • The formula for Spearman’s rank correlation coefficient is $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $r$ is the coefficient, $n$ is the number of data points, and $d_i^2$ is the square of the difference in the ranks of the two coordinates for each point $(x_i, y_i)$.
  • The ranks of two or more identical data values for a variable are equal to the average of their places in an ordered list. The data values are said to have tied ranks.
  • When calculating Spearman’s rank correlation coefficient, the differences (𝑑) will always sum to 0.
