Suppose that the sum of 𝑥 equals six, the sum of 𝑦 equals 21, sum of 𝑥 squared equals 76, the sum of 𝑦 squared equals 91, the sum of 𝑥𝑦 equals 56, and 𝑛 equals six. Part a, find the correlation coefficient between the values of 𝑥 and 𝑦.
A correlation coefficient is a way of quantifying the strength of any linear relationship that exists between two variables. The product moment correlation coefficient or PMCC, which is often denoted using the letter 𝑟, is given by 𝑆𝑥𝑦 over the square root of 𝑆𝑥𝑥 multiplied by 𝑆𝑦𝑦, where 𝑆𝑥𝑥, 𝑆𝑦𝑦, and 𝑆𝑥𝑦 are all as given below. Don’t worry about attempting to learn this formula off by heart. You will most likely be given it in a formula booklet. We’ll now go through the calculation of each of these quantities, 𝑆𝑥𝑥, 𝑆𝑦𝑦, and 𝑆𝑥𝑦, step by step, using the summary that we were given at the start of the question.
For 𝑆𝑥𝑥, first of all then, we’re told that the sum of 𝑥 squared is 76. The sum of 𝑥 is six. And 𝑛 is also six. So we have 76 minus six squared over six. There is an important distinction to be made between the sum of 𝑥 squared and the sum of 𝑥 all squared. For the sum of 𝑥 squared, we square the individual 𝑥-values first and then add them up. For the sum of 𝑥 all squared, we sum the 𝑥-values first and then square the result. Make sure that you’re clear on the difference between these two pieces of notation. Now, six squared is 36. And 36 divided by six is six. So our calculation of 𝑆𝑥𝑥 simplifies to 76 minus six, which is equal to 70.
For 𝑆𝑦𝑦 next, we have that the sum of 𝑦 squared is 91. And the sum of 𝑦 is 21. So our calculation is 91 minus 21 squared over six. This simplifies to 91 minus 73.5, which is equal to 17.5. Finally, for 𝑆𝑥𝑦, we have that the sum of 𝑥𝑦, which means the product of all of the individual 𝑥-values multiplied by their corresponding 𝑦-values and then summed, is 56. We have that the sum of 𝑥 is six. And the sum of 𝑦 is 21 and again is six. So we have 56 minus six multiplied by 21 over six. The sixes in the numerator and denominator of the fraction cancel. So we’re left with 56 minus 21, which is 35.
Now, we just need to substitute the values that we’ve calculated for 𝑆𝑥𝑥, 𝑆𝑦𝑦, and 𝑆𝑥𝑦 into the formula for our correlation coefficient. So we have 35 over the square root of 70 multiplied by 17.5. 70 multiplied by 17.5 is 1225. And the square root of 1225 is in fact 35. So we have 35 over 35, which is equal to one. A correlation coefficient of one means that the values 𝑥 and 𝑦 are in perfect positive linear correlation. If we were to plot a graph of 𝑥 against 𝑦, then the points would lie exactly on a straight line with positive gradient.
Part b), find the equation of the regression line of 𝑦 on 𝑥.
The equation of the regression line of 𝑦 on 𝑥 is the equation of the straight line 𝑦 equals 𝑎 plus 𝑏𝑥, which best fits the data values. Usually, the values of 𝑎 and 𝑏 would be chosen to minimise the sum of squares of the residuals, which means the differences between the actual 𝑦-values and their values if we were to predict them using the straight line. However, as the correlation coefficient between our values of 𝑥 and 𝑦 is one, this means that the points all lie exactly on a straight line. So we’re just looking to find the equation of this line.
In either case, there are standard formulae that we can apply to calculate the values of 𝑎 and 𝑏. 𝑏 is equal to 𝑆𝑥𝑦 over 𝑆𝑥𝑥, where both these quantities are defined as they were in part a). 𝑎 is equal to 𝑦 bar, that’s the mean of the 𝑦-values, minus 𝑏𝑥 bar, the mean of the 𝑥-values. So because we use 𝑏 in the calculation of 𝑎, we have to calculate 𝑏 first. Looking back at our working for part a), we can see that 𝑆𝑥𝑦 was equal to 35. And 𝑆𝑥𝑥 was equal to 70. So 𝑏 is equal to 35 over 70. Now, 35 is actually half of 70. If we divide 35 by 35, we get one. And if we divide 70 by 35, we get two. So our value of 𝑏 just simplifies to one-half.
To calculate 𝑎, we first need to calculate 𝑥 bar and 𝑦 bar. Remember, 𝑥 bar is the mean of the 𝑥-values. So we need to divide their sum by how many there are. The sum of the 𝑥-values is six. And there are six of them. So 𝑥 bar is equal to one. For 𝑦 bar, the sum of the 𝑦-values is 21. And again, there are six of them. So 𝑦 bar is equal to 21 over six, which is 3.5. So substituting the values of 𝑦 bar, 𝑥 bar, and 𝑏 into our formula for 𝑎, we have that 𝑎 is equal to 3.5 minus one-half multiplied by one. Now, a half multiplied by one is just a half, which as a decimal is 0.5. So we have 3.5 minus 0.5, which is three.
Finally, we substitute the values of 𝑎 and 𝑏 that we’ve just calculated into the equation of our line. And we have that the equation of the regression line of 𝑦 on 𝑥 is 𝑦 equals three plus a half 𝑥. We could use this regression line to make predictions about the value of 𝑦 for given values of 𝑥.