Video Transcript
In a study to discover
the relationship between the age of a mother and the number of her children, the
following data were found. Find Spearman’s correlation
coefficient. Round your answer to three decimal
places.
In our table, we see two rows of
data. In the first row, there’s the age
of the mother in years and in the second row the number of her children. Each row represents a different
variable in the data. And since there are two rows, we
can say that this is a bivariate data set. For just such a data set,
Spearman’s correlation coefficient can help us understand the set better. The way it does this is not by
directly analyzing the data itself, but rather the relative rank of the respective
data points.
We can rank these data by choosing
to let lower numbers correspond to lower ranks. Just as an aside, we could choose
the opposite convention, where higher numbers receive lower ranks. And that would work perfectly well
too for calculating Spearman’s correlation coefficient. Whichever way we choose, the
important thing is that we apply the same method to both rows of our data.
To simplify our process, we’ll let
lower numbers in our data table correspond with lower ranks. To discover these ranks, we’ll add
two more rows to our table. We’ll let 𝑅 sub 𝐴 be the rank of
the mother’s age and 𝑅 sub 𝐶 be the rank of the number of children. When it comes to the rank of the
mother’s age, we see that this process is fairly straightforward because the
mother’s age increases from left to right. To the age of 19 then, we give a
rank of one. To the age of 22, we give a rank of
two, and so on. Since each mother’s age in this
table is unique, there are no ties to deal with. The rank of the mother’s age therefore increases
from one to eight, left to right.
Now, let’s rank the number of
children. We can see right away that this
ranking won’t be quite so simple. For example, we have two mothers
with one child. Since these are the lowest numbers
of children, we’d like to give them both a rank of one, or since there are two of
them, a rank of one and two. But actually neither of those two
methods makes complete sense.
A better method is to take the
average of the ranks that these two equal values would otherwise occupy. They occupy the first and second
rankings. And we know the average of one and
two is 1.5. So that’s the result we’ll fill in
for their ranking.
We then look for the next lowest
number of children. And we see that once again there’s
a duplicate. Two mothers have two children. These would have rankings of three
and four. But since they’re the same number
of children, we’ll use the average of their ranking. The average of three and four is
three and a half.
Going to the next lowest number of
children, we see that there are two mothers with three children. Separately, these would have
rankings of five and six. And if we take the average of these
two ranks, we get 5.5.
The next lowest number of children
in our table is four. And we see that there is only one
instance of four children. So this has the seventh rank. And lastly, five children gets a
rank of eight.
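The tie-handling rule just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used in the video: the function name `average_ranks` is hypothetical, and the numbers of children are listed in sorted order rather than in the table's column order, which the transcript does not fully reproduce.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share the mean of the ranks they span."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) correspond to ranks pos+1..end+1.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Numbers of children mentioned in the transcript, in sorted order (assumed order).
children_sorted = [1, 1, 2, 2, 3, 3, 4, 5]
print(average_ranks(children_sorted))
# [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.0, 8.0]
```

The two mothers with one child share rank 1.5, the pairs with two and three children share 3.5 and 5.5, and the untied values keep ranks seven and eight.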
It’s the numbers in these two rows,
the rankings of the mother’s age and the number of children, that are compared using
Spearman’s correlation coefficient. Effectively, the coefficient
measures how closely matched these rankings are across the two different
variables.
We can create a measure of how far apart
the two rankings are, 𝑑 sub 𝑖. It equals the difference between
the rank of the mother’s age and the rank of the number of children. To calculate 𝑑 sub 𝑖 for a given
data point, we subtract 𝑅 sub 𝐶 from 𝑅 sub 𝐴. One minus 3.5 is negative 2.5. Then, two minus 1.5 is 0.5. Three minus 1.5 is 1.5, and so on
down the row.
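As a small sketch of this subtraction, the Python below uses only the first three rank pairs read out in the transcript. The variable names are hypothetical, and the remaining five columns are omitted because the transcript does not state them.

```python
# Ranks for the first three data points, as quoted in the transcript.
rank_age = [1, 2, 3]
rank_children = [3.5, 1.5, 1.5]

# d_i is the age rank minus the number-of-children rank, point by point.
d = [ra - rc for ra, rc in zip(rank_age, rank_children)]
print(d)  # [-2.5, 0.5, 1.5]
```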
As an aside, notice that there’s
just one data point for which the ranking of the mother’s age agrees perfectly with
the ranking of the number of children. In general, the more often the
rankings of the two variables agree across a data set, the closer Spearman’s correlation coefficient is to
positive one.
Regarding that coefficient, let’s
clear some space at the top of our screen and write out the mathematical equation
for this coefficient. Often symbolized 𝑟 sub 𝑠, it
equals one minus six times the sum, that’s what this Greek letter Σ means, of the
squared differences between the rankings of our two variables, all divided by 𝑛 times 𝑛 squared minus one,
where 𝑛 is the number of data points.
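Written symbolically, the spoken formula can be set out as follows. The explicit index limits on the sum are an assumption, since the transcript only says "the sum".

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2} - 1\right)}
```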
We see then that to calculate 𝑟 sub
𝑠, we need to know the sum of 𝑑 sub 𝑖 squared and 𝑛. 𝑛 is the number of data
points. So in our case, we have one, two,
three, four, five, six, seven, eight of those. So 𝑛 equals eight. And to calculate the sum of 𝑑 sub
𝑖 squared, let’s add one last row to our table. In this row, we’re going to
calculate 𝑑 sub 𝑖 squared for each 𝑑 sub 𝑖 value. By way of example, for our first
data point, negative 2.5 squared gives us 6.25. Then 0.5 squared gives us 0.25, and
so on down the line.
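The squaring step can be checked with a short Python sketch. It uses only the first three differences quoted earlier in the transcript, and the variable names are hypothetical.

```python
# First three rank differences quoted in the transcript.
d_first_three = [-2.5, 0.5, 1.5]

# Square each difference; the results can never be negative.
d_squared = [d_i * d_i for d_i in d_first_three]
print(d_squared)  # [6.25, 0.25, 2.25]
```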
Note that since we’re squaring our
𝑑 sub 𝑖 values to get this row, all of these values are nonnegative. To calculate the sum of 𝑑 sub 𝑖
squared, we take all the values in this row and we add them together. Doing that gives us 12.5. Knowing this, we can write an
expression for our correlation coefficient using our values for the sum of 𝑑 sub 𝑖 squared and
𝑛: one minus six times 12.5, all divided by eight times eight squared minus one. Entering this expression on our
calculator, we find a result of 0.85119 and so on.
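The calculator step can be reproduced with a few lines of Python. This is a sketch that takes the sum of squared differences (12.5) and the number of data points (8) from the transcript. The function name is hypothetical.

```python
def spearman_from_squared_differences(sum_d_squared, n):
    """Evaluate 1 - 6 * (sum of d_i squared) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n**2 - 1))


# Values stated in the transcript: sum of d_i squared = 12.5, n = 8.
r_s = spearman_from_squared_differences(12.5, 8)
print(f"{r_s:.5f}")  # 0.85119
```

Here the denominator is 8 × 63 = 504 and the numerator is 6 × 12.5 = 75, so the coefficient is 1 − 75/504.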
We recall though that we want to
give our answer to three decimal places. If we look at the digit in the
fourth decimal place, we see that it’s one, which is less than five, which means
that rounding to three decimal places leaves the third decimal digit unchanged. Spearman’s correlation coefficient
then rounded to three decimal places is 0.851.