If the sum of 𝑥 is equal to 15, the sum of 𝑦 is equal to negative six, the sum of 𝑥 squared is equal to 55, the sum of 𝑦 squared is equal to 76, the sum of 𝑥𝑦 is equal to negative 50, and 𝑛 equals six, find the regression line equation of 𝑦 on 𝑥.
The regression line equation of 𝑦 on 𝑥 is a straight line of the form 𝑦 equals 𝑎 plus 𝑏𝑥. This gives the equation of the straight line which fits a given data set most closely. The values of 𝑎 and 𝑏 are constants and are chosen to minimise the sum of the squares of the residuals. Those are the differences between the actual values and the values as predicted by the regression line. There are standard formulae that we can apply to calculate the values of 𝑎 and 𝑏.
Firstly, 𝑏 is equal to 𝑠𝑥𝑦 over 𝑠𝑥𝑥, where 𝑠𝑥𝑦 and 𝑠𝑥𝑥 are as defined on the screen. 𝑠𝑥𝑦 is equal to the sum of 𝑥𝑦 minus the sum of 𝑥 multiplied by the sum of 𝑦 over 𝑛. And 𝑠𝑥𝑥 is equal to the sum of 𝑥 squared minus the sum of 𝑥 all squared over 𝑛.
We need to be clear on the distinction between two pieces of notation, the sum of 𝑥 squared and the sum of 𝑥 all squared. In the first case, the sum of 𝑥 squared means we need to square each individual 𝑥-value and then find the sum, whereas the sum of 𝑥 all squared means we need to sum the 𝑥-values first and then square the result. This distinction is particularly important if we’re having to calculate these summaries ourselves from a raw data set.
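To make that distinction concrete, here is a small Python sketch. The three 𝑥-values are made up purely for illustration and are not the data behind this question.

```python
# Hypothetical x-values, chosen only to illustrate the notation.
xs = [1, 2, 3]

# Sum of x squared: square each value first, then add (1 + 4 + 9).
sum_of_squares = sum(x * x for x in xs)

# Sum of x, all squared: add the values first, then square (6 squared).
square_of_sum = sum(xs) ** 2

print(sum_of_squares)  # 14
print(square_of_sum)   # 36
```

The two results are different, which is why mixing up the two pieces of notation gives the wrong value of 𝑠𝑥𝑥.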
Once we’ve found the value of 𝑏, we can work out the value of 𝑎 by recalling that one point which always lies on the regression line is the point with coordinates 𝑥 bar, 𝑦 bar. That’s the point whose 𝑥-coordinate is the mean of the 𝑥-values and whose 𝑦-coordinate is the mean of the 𝑦-values. Since this point lies on the line, its coordinates satisfy the equation of the line. So, we have 𝑦 bar is equal to 𝑎 plus 𝑏𝑥 bar. By subtracting 𝑏𝑥 bar from each side, we see that 𝑎 is equal to 𝑦 bar minus 𝑏𝑥 bar. So, if we know 𝑏, 𝑥 bar, and 𝑦 bar, we can calculate 𝑎.
Let’s begin our calculation of 𝑏 then. From the summary that’s given at the start of the question, the sum of 𝑥𝑦 is equal to negative 50. The sum of 𝑥 is equal to 15. And the sum of 𝑦 is equal to negative six. 𝑛 is equal to six. So, we have negative 50 minus 15 multiplied by negative six over six. We can cancel a factor of six in the numerator and denominator, giving 𝑠𝑥𝑦 equals negative 50 minus 15 multiplied by negative one. That simplifies to negative 50 plus 15, which is equal to negative 35.
Next, in our calculation of 𝑠𝑥𝑥, the sum of 𝑥 squared is equal to 55. The sum of 𝑥 is equal to 15. And 𝑛 is equal to six. So, we have 55 minus 15 squared over six. 15 squared is 225, and 225 over six is 37.5. So, 𝑠𝑥𝑥 is equal to 55 minus 37.5, which is 17.5. Substituting these values into our formula for 𝑏 then, we have that 𝑏 is equal to negative 35 over 17.5. Now recall that 17.5 is half of 35. So, 35 divided by 17.5 is two. This means that negative 35 divided by 17.5 is negative two. So, our value of 𝑏 is negative two.
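As a check on the arithmetic so far, the calculation of 𝑠𝑥𝑦, 𝑠𝑥𝑥, and 𝑏 can be sketched in Python. The variable names are our own choice, and exact fractions are used here only so that no rounding creeps in.

```python
from fractions import Fraction

# Summaries given in the question.
n = 6
sum_x, sum_y = 15, -6
sum_x2, sum_xy = 55, -50

# s_xy = (sum of xy) - (sum of x)(sum of y) / n
s_xy = Fraction(sum_xy) - Fraction(sum_x * sum_y, n)

# s_xx = (sum of x squared) - (sum of x, all squared) / n
s_xx = Fraction(sum_x2) - Fraction(sum_x ** 2, n)

# Slope of the regression line of y on x.
b = s_xy / s_xx

print(s_xy, s_xx, b)  # -35 35/2 -2
```

Here 35/2 is the same value as 17.5, so this agrees with the working above.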
To calculate 𝑎, we need to first work out 𝑥 bar and 𝑦 bar. And we will recall that to find the mean of a set of values, we find the sum of those values and then divide it by how many values there are. So, 𝑥 bar is equal to the sum of 𝑥 over 𝑛 and 𝑦 bar is equal to the sum of 𝑦 over 𝑛.
For 𝑥 bar, we have 15 over six, which is equal to 2.5, and for 𝑦 bar, negative six over six, which is equal to negative one. Substituting the values of 𝑦 bar, 𝑏, and 𝑥 bar into our formula for 𝑎, we have that 𝑎 is equal to negative one minus negative two multiplied by 2.5. That’s negative one minus negative five, which is the same as negative one plus five, which is equal to four.
We’ve found the values of 𝑎 and 𝑏 then. 𝑎 is equal to four and 𝑏 is equal to negative two. All that remains is just to substitute these values into the equation of the regression line, which gives 𝑦 is equal to four minus two 𝑥. So, using the given summaries, we found the regression line equation of 𝑦 on 𝑥. 𝑦 is equal to four minus two 𝑥.
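The whole method can be collected into one short function. This is only a sketch under our own assumptions: the function name `regression_line` and its parameter names are hypothetical, and it expects whole-number summaries like the ones in this question. Notice that the sum of 𝑦 squared is not needed for the regression line of 𝑦 on 𝑥.

```python
from fractions import Fraction


def regression_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the line y = a + b*x from integer summaries.

    Hypothetical helper for illustration; assumes s_xx is not zero.
    """
    s_xy = Fraction(sum_xy) - Fraction(sum_x * sum_y, n)
    s_xx = Fraction(sum_x2) - Fraction(sum_x ** 2, n)
    b = s_xy / s_xx

    # The line always passes through (x bar, y bar), so a = y bar - b * x bar.
    x_bar = Fraction(sum_x, n)
    y_bar = Fraction(sum_y, n)
    a = y_bar - b * x_bar
    return a, b


a, b = regression_line(n=6, sum_x=15, sum_y=-6, sum_x2=55, sum_xy=-50)
print(a, b)  # 4 -2
```

With the summaries from this question, it returns 𝑎 equals four and 𝑏 equals negative two, matching the line 𝑦 equals four minus two 𝑥.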