# Question Video: Calculating Spearman’s Correlation Coefficient for a Bivariate Dataset

In a study to discover the relationship between the age of a mother and the number of her children, the following data were found. Find Spearman’s correlation coefficient. Round your answer to three decimal places.


### Video Transcript

In a study to discover the relationship between the age of a mother and the number of her children, the following data were found. Find Spearman’s correlation coefficient. Round your answer to three decimal places.

In our table, we see two rows of data. In the first row, there’s the age of the mother in years and in the second row the number of her children. Each row represents a different variable in the data. And since there are two rows, we can say that this is a bivariate data set. For just such a data set, Spearman’s correlation coefficient can help us understand the set better. The way it does this is not by directly analyzing the data itself, but rather the relative rank of the respective data points.

We can rank these data by choosing to let lower numbers correspond to lower ranks. Just as an aside, we could choose the opposite convention, where higher numbers receive lower ranks. And that would work perfectly well too for calculating Spearman’s correlation coefficient. Whichever way we choose, the important thing is that we apply the same method to both rows of our data.

To simplify our process, we’ll let lower numbers in our data table correspond with lower ranks. To discover these ranks, we’ll add two more rows to our table. We’ll let 𝑅 sub 𝐴 be the rank of the mother’s age and 𝑅 sub 𝐶 be the rank of the number of children. When it comes to the rank of the mother’s age, we see that this process is fairly straightforward because the mother’s age increases from left to right. To the age of 19 then, we give a rank of one. To the age of 22, we give a rank of two, and so on. Since each mother’s age in this table is unique, there are no ties to deal with. The rank of the mother’s age simply increases from one to eight, left to right.

Now, let’s rank the number of children. We can see right away that this ranking won’t be quite so simple. For example, we have two mothers with one child. Since these are the lowest numbers of children, we’d like to give them both a rank of one, or since there are two of them, a rank of one and two. But actually neither of those two methods makes complete sense.

A better method is to take the average of the ranks that these two equal values would occupy. They occupy the first and second rankings. And we know the average of one and two is 1.5. So that’s the rank we’ll fill in for both of them.

We then look for the next lowest number of children. And we see that once again there’s a duplicate. Two mothers have two children. These would have rankings of three and four. But since they’re the same number of children, we’ll use the average of their rankings. The average of three and four is 3.5.

Going to the next lowest number of children, we see that there are two mothers with three children. Separately, these would have rankings of five and six. And if we take the average of these two ranks, we get 5.5.

The next lowest number of children in our table is four. And we see that there is only one instance of four children. So this has the seventh rank. And lastly, five children gets a rank of eight.
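The averaging of tied ranks described above can be sketched in a few lines of Python. This is only an illustration: the helper name `average_ranks` is hypothetical, and since the table itself is not reproduced in this transcript, the column order of `children` is an assumption reconstructed from the ranks, differences, and total quoted in the transcript.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean of the ranks they span."""
    # Indices of the values, ordered from the smallest value to the largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so they span ranks start+1..end+1.
        shared = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Assumed column order (not shown in the transcript), consistent with the quoted values.
children = [2, 1, 1, 2, 3, 4, 3, 5]
rank_children = average_ranks(children)
print(rank_children)  # [3.5, 1.5, 1.5, 3.5, 5.5, 7.0, 5.5, 8.0]
```

The two mothers with one child both receive 1.5, the two with two children both receive 3.5, and so on, matching the ranks worked out above.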

It’s the numbers in these two rows, the rankings of the mother’s age and of the number of children, that are compared using Spearman’s correlation coefficient. Effectively, the coefficient measures how closely matched these rankings are across the two different variables.

We can create a measure of how far apart the two rankings are, 𝑑 sub 𝑖. It equals the difference between the rank of the mother’s age and the rank of the number of children. To calculate 𝑑 sub 𝑖 for a given data point, we subtract 𝑅 sub 𝐶 from 𝑅 sub 𝐴. One minus 3.5 is negative 2.5. Then, two minus 1.5 is 0.5. Three minus 1.5 is 1.5, and so on down the row.

As an aside, notice that there’s just one data point for which the ranking of the mother’s age agrees perfectly with the ranking of the number of children. In general, the more closely the rankings of the two variables agree across a data set, the closer Spearman’s correlation coefficient is to positive one.
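As a rough check of the differences and of the single exact match, here is a short Python sketch. The first three entries of `rank_children` are the ones read out in the transcript. The remaining five are an assumption, reconstructed so that they agree with the tied ranks, the single match, and the total quoted in the transcript.

```python
# Age ranks run 1..8 left to right, as stated in the transcript.
rank_age = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# First three values are quoted in the transcript; the rest are an assumed reconstruction.
rank_children = [3.5, 1.5, 1.5, 3.5, 5.5, 7.0, 5.5, 8.0]

# d_i = R_A - R_C for each data point.
d = [ra - rc for ra, rc in zip(rank_age, rank_children)]
print(d)  # [-2.5, 0.5, 1.5, 0.5, -0.5, -1.0, 1.5, 0.0]

# Number of data points whose two ranks agree exactly.
exact_matches = sum(1 for x in d if x == 0)
print(exact_matches)  # 1

# Both rows use the ranks 1..8 in total, so the differences sum to zero.
print(sum(d))  # 0.0
```

The zero sum is a handy sanity check on any table of rank differences: if it fails, a rank has been entered incorrectly.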

Regarding that coefficient, let’s clear some space at the top of our screen and write out the mathematical equation for it. Often symbolized 𝑟 sub 𝑠, it equals one minus a fraction. The numerator is six times the sum, that’s what this Greek letter Σ means, of the squared differences between the rankings of our two variables. The denominator is 𝑛, where 𝑛 is the number of data points, times 𝑛 squared minus one.
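Written out symbolically, the equation just described looks like this. The index 𝑖 running from 1 to 𝑛 and the subscripted rank notation are assumptions added for clarity; the transcript only names 𝑟 sub 𝑠, 𝑑 sub 𝑖, and 𝑛.

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}, \qquad d_i = R_{A,i} - R_{C,i}
$$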

We see then that to calculate 𝑟 sub 𝑠, we need to know the sum of 𝑑 sub 𝑖 squared and 𝑛. 𝑛 is the number of data points. So in our case, we have one, two, three, four, five, six, seven, eight of those. So 𝑛 equals eight. And to calculate the sum of 𝑑 sub 𝑖 squared, let’s add one last row to our table. In this row, we’re going to calculate 𝑑 sub 𝑖 squared for each 𝑑 sub 𝑖 value. By way of example, for our first data point, negative 2.5 squared gives us 6.25. Then 0.5 squared gives us 0.25, and so on down the line.

Note that since we’re squaring our 𝑑 sub 𝑖 values to get this row, all of these values are nonnegative. To calculate the sum of 𝑑 sub 𝑖 squared, we take all the values in this row and we add them together. Doing that gives us 12.5. Knowing this, we can write an expression for our correlation coefficient using our sum of 𝑑 sub 𝑖 squared and 𝑛-values. Entering this expression on our calculator, we find a result of 0.85119 and so on.
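The same arithmetic can be reproduced with a small Python sketch rather than a calculator. The first three differences are the ones read out in the transcript. The remaining five are an assumption, chosen to be consistent with the quoted total of 12.5.

```python
# Rank differences d_i; only the first three are quoted directly in the transcript.
d = [-2.5, 0.5, 1.5, 0.5, -0.5, -1.0, 1.5, 0.0]
n = len(d)  # 8 data points

# Square each difference, then add the squares together.
d_squared = [x * x for x in d]
sum_d_squared = sum(d_squared)
print(sum_d_squared)  # 12.5

# r_s = 1 - 6 * (sum of d_i^2) / (n * (n^2 - 1))
r_s = 1 - 6 * sum_d_squared / (n * (n**2 - 1))
print(round(r_s, 5))  # 0.85119
```

Here the fraction is 75 over 504, so the coefficient is one minus roughly 0.14881. Note that this shortcut formula is exact only when there are no tied ranks. Statistical software that instead computes the correlation of the ranks directly can give a slightly different value for data with ties like these.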

We recall though that we want to give our answer to three decimal places. If we look at the digit in the fourth decimal place, we see that it’s one, which is less than five. That means we round down, and the digit in the third decimal place stays the same. Spearman’s correlation coefficient then, rounded to three decimal places, is 0.851.