# Video: EG17S1-STATISTICS-Q07

The following table shows the number of units of a certain product, 𝑥, and the production cost per unit, 𝑦, in Egyptian pounds, as produced in seven different factories. Calculate the value of Spearman’s rank correlation coefficient between 𝑥 and 𝑦. Determine the type of correlation.

07:57

### Video Transcript

The following table shows the number of units of a certain product, 𝑥, and the production cost per unit, 𝑦, in Egyptian pounds, as produced in seven different factories. Calculate the value of Spearman’s rank correlation coefficient between 𝑥 and 𝑦. Determine the type of correlation.

Spearman’s rank correlation coefficient is a way of quantifying the degree of correlation between the ranks of two variables. It measures the tendency for one variable to increase (or decrease) as the other increases, but not necessarily in a linear way. The formula for calculating Spearman’s rank correlation coefficient is this: one minus six multiplied by the sum of 𝑑𝑖 squared, all over 𝑛 multiplied by 𝑛 squared minus one. Here, 𝑛 represents the number of pairs of data.

So in this question, 𝑛 will be seven, as there are seven pairs of data given in the table. 𝑑𝑖 means the difference in the ranks of the 𝑖th pair of data, that is, the pair of data 𝑥𝑖, 𝑦𝑖. For example, the first pair of data is 𝑥 one, 𝑦 one, the second is 𝑥 two, 𝑦 two, and so on. Before we can apply the formula for Spearman’s rank correlation coefficient, we must first rank the data ourselves. It doesn’t matter whether we award rank one to the smallest or the largest data value, as long as we’re consistent about what we do for the two variables.
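Written symbolically, with 𝑟 standing for the coefficient (the letter used for it later in this transcript), the formula described above is:

```latex
r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)}
```

For this question 𝑛 = 7, so the denominator is 7(7² − 1) = 7 × 48 = 336.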

We can add two more rows to our table to fill in the 𝑥 rank and the 𝑦 rank. Let’s choose to assign rank one to the smallest piece of data in each case. For 𝑥, then, the smallest piece of data is 600, so this gets rank one; then 700, which gets rank two; then 1400, which gets rank three. There are then two equal pieces of data: the second and seventh values in the 𝑥 row are both 1500. Now, we have to decide how to assign the ranks in this instance. In an ordered list of the data values, these would take up the fourth and fifth places. Therefore, we assign both pieces of data the same rank of 4.5. That’s the average of four and five. We then continue.

The next piece of data is 2000, which would be the sixth value in our ordered list, so it gets rank six. And finally, 2500 is the greatest value, so it gets rank seven. We then assign the ranks for the 𝑦 variable in the same way. But we notice straight away that there are two pieces of data which are both equal to 20, the smallest value. These would be the first and second values in an ordered list of the 𝑦 data, so we assign them both the rank of 1.5. That’s the average of one and two.

The next smallest piece of data is 23, which gets rank three because it would be the third value in an ordered list. Then we notice that there are two values both equal to 24, which would be the fourth and fifth values in an ordered list of the 𝑦 variable. So they both get rank 4.5. That’s the average of four and five. 25 then gets rank six. And finally, 30 gets rank seven. Dealing with tied values by awarding each of them the average of the ranks they occupy is appropriate in this case, as there are only a small number of ties. There are other methods we could consider, such as using Kendall’s tau instead, but the method we’ve used is fine in this instance.
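The tie-handling rule used above (each tied value gets the average of the positions it would occupy in an ordered list) can be sketched in Python. This is a minimal illustration, not part of the original solution: the helper name `average_ranks` is hypothetical, and since the table itself is not reproduced in this transcript, the 𝑥 and 𝑦 values below are an assumption, reconstructed from the ranks and rank differences read out in the video.

```python
def average_ranks(values):
    """Hypothetical helper: rank values with 1 = smallest, averaging tied ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # first 1-based position of v in the sorted list
        count = ordered.count(v)          # how many values are tied with v
        last = first + count - 1          # last position occupied by the tied group
        ranks.append((first + last) / 2)  # average of the tied positions
    return ranks

# Assumed table values in table order, reconstructed from the transcript
x = [600, 1500, 1400, 700, 2000, 2500, 1500]
y = [30, 24, 24, 25, 20, 20, 23]

x_ranks = average_ranks(x)
y_ranks = average_ranks(y)
print(x_ranks)  # [1.0, 4.5, 3.0, 2.0, 6.0, 7.0, 4.5]
print(y_ranks)  # [7.0, 4.5, 4.5, 6.0, 1.5, 1.5, 3.0]
```

Calling `index` and `count` for every value rescans the sorted list each time, which is fine for seven pairs; a larger data set would be better served by a single pass over the sorted values.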

Next, we need to work out the difference in the ranks awarded to each pair of data. It doesn’t matter whether we subtract the rank awarded to 𝑥 from the rank awarded to 𝑦 or vice versa, as long as we’re consistent about what we do for every pair. Let’s choose to subtract the ranks of 𝑦 from the ranks of 𝑥. First, we have one minus seven, which is equal to negative six, then 4.5 minus 4.5, which gives zero. We work out all the other differences in the same way, giving negative 1.5, negative four, 4.5, 5.5, and 1.5. At this point, we can perform a quick check on our work so far, because the sum of these differences should always be equal to zero: both rows of ranks add up to the same total, 28, the sum of the whole numbers from one to seven (averaging tied ranks does not change this total). If we add up negative six, zero, negative 1.5, negative four, 4.5, 5.5, and 1.5, we do indeed get zero. This helps us be confident that the work we’ve done so far is correct.
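A short Python sketch of the subtraction and the sum-to-zero check, using the ranks read out above in table order and subtracting the 𝑦 rank from the 𝑥 rank, as in the transcript. The variable names are illustrative.

```python
# Ranks in table order (rank 1 = smallest), as read out in the transcript
x_ranks = [1.0, 4.5, 3.0, 2.0, 6.0, 7.0, 4.5]
y_ranks = [7.0, 4.5, 4.5, 6.0, 1.5, 1.5, 3.0]

# d_i = rank of x_i minus rank of y_i, subtracted the same way for every pair
d = [rx - ry for rx, ry in zip(x_ranks, y_ranks)]
print(d)  # [-6.0, 0.0, -1.5, -4.0, 4.5, 5.5, 1.5]

# Both rank rows sum to 1 + 2 + ... + 7 = 28 (averaging ties keeps that total),
# so the differences must sum to zero
assert sum(x_ranks) == sum(y_ranks) == 28
assert sum(d) == 0
```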

Finally, we need to work out the squares of these differences. This is why it doesn’t actually matter which way around we subtract the ranks: we’re going to end up squaring the differences anyway. Negative six squared gives 36. Zero squared gives zero. We square all the remaining differences in the same way, giving 2.25, 16, 20.25, 30.25, and 2.25.
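Squaring can be sketched the same way. The final assertion illustrates the point just made: reversing the direction of subtraction changes the sign of every difference but none of the squares. The differences are copied from the transcript.

```python
# Rank differences (x rank minus y rank) from the transcript
d = [-6.0, 0.0, -1.5, -4.0, 4.5, 5.5, 1.5]

d_squared = [di ** 2 for di in d]
print(d_squared)  # [36.0, 0.0, 2.25, 16.0, 20.25, 30.25, 2.25]

# Subtracting the other way round (y rank minus x rank) negates each d_i,
# which leaves the squares unchanged
assert d_squared == [(-di) ** 2 for di in d]
```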

Now, we’re nearly ready to apply our formula for Spearman’s rank correlation coefficient. But first, we need to work out the sum of the squared differences. That’s the sum of the seven values in the final row of our table. Adding all these values up gives 107. Substituting the relevant values into our formula, the sum of 𝑑𝑖 squared is 107 and the value of 𝑛 is seven. So we have one minus six multiplied by 107, all over seven multiplied by seven squared minus one. Seven squared is 49, and subtracting one gives 48. In the numerator, six multiplied by 107 is 642. And in the denominator, seven multiplied by 48 is 336. So we have one minus 642 over 336.
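The substitution can be checked exactly with Python's standard `fractions` module, which avoids any rounding before the final step. This sketch is an illustration rather than part of the original solution; it shows that one minus 642 over 336 simplifies to negative 51 over 56.

```python
from fractions import Fraction

# Squared rank differences from the table, entered as exact fractions
d_squared = [Fraction(v) for v in ("36", "0", "2.25", "16", "20.25", "30.25", "2.25")]
n = 7  # number of (x, y) pairs

sum_d2 = sum(d_squared)             # 107
numerator = 6 * sum_d2              # 642
denominator = n * (n ** 2 - 1)      # 7 * 48 = 336
r = 1 - numerator / denominator     # 1 - 642/336
print(r)                            # -51/56
print(float(r))                     # approximately -0.910714
```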

Evaluating this on a calculator and converting our answer to a decimal gives negative 0.9107, and then the decimal continues. We haven’t been asked to give our answer to a particular degree of accuracy, so let’s use three significant figures. In this case, the fourth significant figure is the seven. As this is five or greater, we round up, so the zero in the third decimal place becomes a one. We’ve calculated the value of 𝑟, then, to be negative 0.911, correct to three significant figures.
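Rounding to a number of significant figures can be sketched in Python too. The helper `round_sig` is hypothetical, not a standard-library function; it finds the position of the leading digit and then keeps the requested number of digits from there. One caveat: Python's built-in `round` sends exact halves to the nearest even digit rather than always up, but that does not matter here because the discarded digits of 𝑟 are not an exact half.

```python
from math import floor, log10

def round_sig(value, sig_figs):
    """Hypothetical helper: round a nonzero value to sig_figs significant figures."""
    # Exponent of the leading digit: for -0.9107..., floor(log10(0.9107...)) == -1
    exponent = floor(log10(abs(value)))
    # Keep sig_figs digits counted from the leading digit
    return round(value, sig_figs - 1 - exponent)

r = 1 - 642 / 336        # -0.91071428...
print(round_sig(r, 4))   # -0.9107
print(round_sig(r, 3))   # -0.911
```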

Now, the question also asked us to determine the type of correlation, which means we need to interpret what this value of 𝑟 tells us about 𝑥 and 𝑦. To do so, we recall that this correlation coefficient always takes a value between negative one and one inclusive. A value of positive one means that there is perfect positive rank correlation between 𝑥 and 𝑦, which means that the smallest value of 𝑥 is paired with the smallest value of 𝑦, the second smallest value of 𝑥 is paired with the second smallest value of 𝑦, and so on, all the way up to the largest value of 𝑥 being paired with the largest value of 𝑦. A value of negative one means there is perfect negative rank correlation between 𝑥 and 𝑦, which means the opposite: the smallest value of 𝑥 is paired with the largest value of 𝑦, and vice versa.

In this case, our value of negative 0.911 is close to negative one, which means that there is a strong negative rank correlation between 𝑥 and 𝑦. This makes sense in the context of this problem: the more units a factory produces, the more efficiently it can produce them, and so the production cost per unit will be lower. We have our answer to the problem, then. The value of Spearman’s rank correlation coefficient, to three significant figures, is negative 0.911, and we conclude that there is a strong negative rank correlation between 𝑥 and 𝑦.