If the sum of 𝑥 equals 21, the sum of 𝑦 equals negative three, the sum of 𝑥 squared equals 91, the sum of 𝑦 squared equals 19, the sum of 𝑥𝑦 equals negative 28, and 𝑛 equals six, find the equation of the regression line of 𝑦 on 𝑥.
The regression line of 𝑦 on 𝑥 has an equation of the form 𝑦 equals 𝑎 plus 𝑏𝑥, where 𝑎 and 𝑏 are constants. It is the straight line that most closely fits the given data. The values of 𝑎 and 𝑏 are chosen so that the sum of the squares of the residuals, which are the differences between each actual 𝑦-value and the value predicted by the line, is as small as possible; that is, it is minimised. There are standard formulae that we can apply in order to calculate the values of 𝑎 and 𝑏. 𝑏 is equal to 𝑠𝑥𝑦 over 𝑠𝑥𝑥, where 𝑠𝑥𝑦 and 𝑠𝑥𝑥 are defined as follows. 𝑠𝑥𝑦 is equal to the sum of 𝑥𝑦 minus the product of the sum of 𝑥 and the sum of 𝑦, divided by 𝑛. And 𝑠𝑥𝑥 is equal to the sum of 𝑥 squared minus the sum of 𝑥 all squared, divided by 𝑛.
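The formulae for 𝑠𝑥𝑦, 𝑠𝑥𝑥, and 𝑏 translate directly into a few lines of code. Here is a minimal Python sketch that works only from the summary statistics; the function names `s_xy`, `s_xx`, and `slope_b` are our own hypothetical choices, not taken from any library, and the example data are made up for illustration.

```python
def s_xy(sum_x: float, sum_y: float, sum_xy: float, n: int) -> float:
    """Sum of xy minus (sum of x)(sum of y), divided by n."""
    return sum_xy - sum_x * sum_y / n


def s_xx(sum_x: float, sum_x2: float, n: int) -> float:
    """Sum of x squared minus (sum of x) all squared, divided by n."""
    return sum_x2 - sum_x ** 2 / n


def slope_b(sum_x: float, sum_y: float, sum_x2: float, sum_xy: float, n: int) -> float:
    """Gradient b of the regression line of y on x, b = s_xy / s_xx."""
    return s_xy(sum_x, sum_y, sum_xy, n) / s_xx(sum_x, sum_x2, n)


# Hypothetical data: x = 1, 2, 3 and y = 2, 4, 6 lie exactly on y = 2x.
xs, ys = [1, 2, 3], [2, 4, 6]
b = slope_b(sum(xs), sum(ys), sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)), len(xs))
print(b)  # 2.0
```

Because the example points lie exactly on 𝑦 equals 2𝑥, the sketch recovers a gradient of two, which makes a quick sanity check.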
Notice the key difference between the two pieces of notation used in the definition of 𝑠𝑥𝑥. The sum of 𝑥 squared means that we square each individual 𝑥-value first and then find the sum, whereas the sum of 𝑥 all squared means that we find the sum of the 𝑥-values first and then square the result. We must be clear on the distinction between these two things. Once we have found the value of 𝑏, we can work out the value of 𝑎 by recalling that the regression line always passes through the point 𝑥 bar, 𝑦 bar. That’s the point whose 𝑥-coordinate is the mean of the 𝑥-values and whose 𝑦-coordinate is the mean of the 𝑦-values.
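The distinction between the sum of 𝑥 squared and the sum of 𝑥 all squared is easy to see in code. The short Python sketch below uses a small, made-up list of 𝑥-values purely for illustration.

```python
xs = [1, 2, 3]  # hypothetical x-values, for illustration only

# Sum of x squared: square each value first, then add (1 + 4 + 9).
sum_of_squares = sum(x ** 2 for x in xs)

# Sum of x all squared: add the values first, then square the total (6 ** 2).
square_of_sum = sum(xs) ** 2

print(sum_of_squares, square_of_sum)  # 14 36
```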
If this point lies on the line, then its coordinates satisfy the equation of the line. So 𝑦 bar is equal to 𝑎 plus 𝑏𝑥 bar. Subtracting 𝑏𝑥 bar from each side and swapping the two sides around gives an equation that we can use to calculate 𝑎: 𝑎 is equal to 𝑦 bar minus 𝑏𝑥 bar. The question gives us all of the summary statistics that we need to work out each of these quantities, so let’s begin.
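We can check numerically that choosing 𝑎 equals 𝑦 bar minus 𝑏𝑥 bar puts the point 𝑥 bar, 𝑦 bar on the line. The Python sketch below fits a line to a small, hypothetical data set using the 𝑠𝑥𝑦 and 𝑠𝑥𝑥 formulae and then confirms that 𝑎 plus 𝑏𝑥 bar gives back 𝑦 bar; the data and variable names are illustrative assumptions.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3]
ys = [1, 3, 4]
n = len(xs)

# Gradient b = s_xy / s_xx, computed from the summary statistics.
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
b = s_xy / s_xx

# Intercept from a = y_bar - b * x_bar.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
a = y_bar - b * x_bar

# The mean point (x_bar, y_bar) satisfies y = a + b x, up to floating-point rounding.
print(math.isclose(a + b * x_bar, y_bar))  # True
```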
For 𝑠𝑥𝑦 first of all, the sum of 𝑥𝑦 is equal to negative 28. The sum of 𝑥 is equal to 21, the sum of 𝑦 is equal to negative three, and 𝑛 is equal to six. So we have negative 28 minus 21 times negative three over six. This simplifies to negative 28 plus 63 over six, that is, negative 28 plus 10.5, which is equal to negative 17.5. Next, for 𝑠𝑥𝑥, the sum of 𝑥 squared is equal to 91. We then subtract the sum of 𝑥 all squared, that’s 21 squared, over 𝑛, which is six, giving 91 minus 21 squared over six. That simplifies to 91 minus 441 over six, that is, 91 minus 73.5, which is equal to 17.5.
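The arithmetic for 𝑠𝑥𝑦 and 𝑠𝑥𝑥 can be checked with a few lines of Python, plugging in the summary statistics given in the question. The variable names are our own.

```python
# Summary statistics from the question.
sum_x, sum_y, sum_x2, sum_xy, n = 21, -3, 91, -28, 6

s_xy = sum_xy - sum_x * sum_y / n   # -28 - (21)(-3)/6 = -28 + 10.5
s_xx = sum_x2 - sum_x ** 2 / n      # 91 - 441/6 = 91 - 73.5

print(s_xy, s_xx)  # -17.5 17.5
```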
So now we have the values of 𝑠𝑥𝑦 and 𝑠𝑥𝑥. We can substitute into the formula for calculating 𝑏. This gives negative 17.5 over 17.5, which simplifies to negative one.
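Since 𝑠𝑥𝑦 and 𝑠𝑥𝑥 here are exact halves, we can confirm that 𝑏 is exactly negative one, with no rounding, using Python’s standard `fractions` module. Writing negative 17.5 as negative 35 over two and 17.5 as 35 over two is simply our own choice of representation.

```python
from fractions import Fraction

s_xy = Fraction(-35, 2)  # -17.5 as an exact fraction
s_xx = Fraction(35, 2)   # 17.5 as an exact fraction

b = s_xy / s_xx          # gradient of the regression line of y on x
print(b)  # -1
```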
Next, we need to calculate the value of 𝑎, and to do this we first need to work out the values of 𝑥 bar and 𝑦 bar. Recall that to find the mean of a set of data, we add up all of the values and divide by the number of values. So 𝑥 bar is equal to the sum of 𝑥 over 𝑛, and 𝑦 bar is equal to the sum of 𝑦 over 𝑛. For 𝑥 bar, we have 21 over six, which is equal to 3.5. And for 𝑦 bar, we have negative three over six, which is equal to negative 0.5. We can now substitute the values of 𝑦 bar, 𝑏, and 𝑥 bar into the formula for 𝑎. This gives 𝑎 equals negative 0.5 minus negative one multiplied by 3.5. That’s negative 0.5 plus 3.5, which is equal to three.
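The calculation of 𝑥 bar, 𝑦 bar, and 𝑎 can be sketched the same way in Python, taking 𝑏 equals negative one as found earlier. The variable names are our own.

```python
# Summary statistics from the question, and the gradient b found earlier.
sum_x, sum_y, n = 21, -3, 6
b = -1.0

x_bar = sum_x / n          # 21 / 6
y_bar = sum_y / n          # -3 / 6
a = y_bar - b * x_bar      # -0.5 - (-1)(3.5)

print(x_bar, y_bar, a)  # 3.5 -0.5 3.0
```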
Finally, we substitute the values that we found for 𝑎 and 𝑏 into the equation of our straight line. 𝑎 is equal to three, and 𝑏 is equal to negative one, so our equation is 𝑦 equals three minus 𝑥. We can, of course, write the equation the other way round if we wish: 𝑦 equals negative 𝑥 plus three. So the equation of the regression line of 𝑦 on 𝑥 is 𝑦 equals three minus 𝑥. As this line has a negative gradient, negative one, there is negative correlation between 𝑥 and 𝑦: as 𝑥 increases, 𝑦 tends to decrease.
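Putting every step together, here is a minimal Python sketch that goes straight from the summary statistics to the coefficients 𝑎 and 𝑏. The function name `regression_y_on_x` is a hypothetical choice of ours; it simply strings together the formulae for 𝑠𝑥𝑦, 𝑠𝑥𝑥, 𝑏, and 𝑎 used in this solution.

```python
def regression_y_on_x(sum_x, sum_y, sum_x2, sum_xy, n):
    """Return (a, b) for the least-squares line y = a + b x, from summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    b = s_xy / s_xx
    a = sum_y / n - b * (sum_x / n)   # a = y_bar - b * x_bar
    return a, b


a, b = regression_y_on_x(sum_x=21, sum_y=-3, sum_x2=91, sum_xy=-28, n=6)
print(f"y = {a} + ({b})x")  # y = 3.0 + (-1.0)x

# Predicted y when x = 4, using the fitted line.
print(a + b * 4)  # -1.0
```

Because 𝑠𝑥𝑥 is always positive when the 𝑥-values are not all equal, the sign of 𝑏 matches the sign of 𝑠𝑥𝑦, which is why a negative gradient goes hand in hand with negative correlation.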