Video Transcript
For a given data set, the sum of 𝑥
equals 47, the sum of 𝑦 equals 45.75, the sum of 𝑥 squared equals 329, the sum of
𝑦 squared equals 389.3125, the sum of 𝑥𝑦 equals 310.25, and 𝑛 equals eight. Calculate the value of the
regression coefficient 𝑏 in the least squares regression model 𝑦 equals 𝑎 plus
𝑏𝑥. Give your answer correct to three
decimal places.
Let’s begin by reminding ourselves
about this least squares regression model 𝑦 equals 𝑎 plus 𝑏𝑥. This gives the equation of the
straight line which best fits a scatter plot of an 𝑥𝑦 data set. The values of 𝑎 and 𝑏 are chosen
to minimize the sum of the squares of the residuals. Those are the vertical differences
between the 𝑦-value of each point and the 𝑦-value the model predicts for the same
𝑥-value.
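As a rough illustration of what minimizing the sum of the squares of the residuals means, the sketch below uses a small made-up data set (the five points are hypothetical and are not the data behind this question). It checks that nudging the least squares values of 𝑎 and 𝑏 in any direction only makes that sum larger.

```python
# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]


def sum_squared_residuals(a, b):
    """Sum of squared vertical gaps between each point and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept, computed from deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

best = sum_squared_residuals(a_hat, b_hat)

# Try every small nudge of a and b (excluding "no nudge at all").
nudged = [
    sum_squared_residuals(a_hat + da, b_hat + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(best, min(nudged))
```

Every value in `nudged` comes out larger than `best`, which is what it means for the fitted line to be the least squares line.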
There are standard formulae that we
can apply for calculating the values of 𝑎 and 𝑏. First of all, 𝑏 is equal to 𝑆𝑥𝑦
over 𝑆𝑥𝑥, where 𝑆𝑥𝑦 is the sum of 𝑥𝑦 minus the sum of 𝑥 multiplied by the sum of 𝑦 over 𝑛,
and 𝑆𝑥𝑥 is the sum of 𝑥 squared minus the sum of 𝑥 all squared over 𝑛. And 𝑎, although we’re not asked
for it here, is equal to 𝑦 bar (the mean of the 𝑦-values) minus 𝑏
multiplied by 𝑥 bar (the mean of the 𝑥-values). We can see that if we compare our
least squares regression model with the general equation of a straight line, then
the value 𝑏 represents the slope of this line and the value 𝑎 represents its
𝑦-intercept if it’s appropriate to extend the line that far.
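Written out in symbols, the standard formulae referred to here take the following form. This is a reconstruction from the calculations carried out later in the video, since the on-screen working is not part of the transcript.

```latex
\begin{align*}
b &= \frac{S_{xy}}{S_{xx}}, &
a &= \bar{y} - b\,\bar{x}, \\
S_{xy} &= \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}, &
S_{xx} &= \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}.
\end{align*}
```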
Now, we haven’t been given the raw
data, but we have been given the summary statistics for this data set. So, that’s enough for us to
calculate the values of 𝑆𝑥𝑦 and 𝑆𝑥𝑥 and therefore calculate the value of
𝑏. For 𝑆𝑥𝑦, first of all, we
use the sum of 𝑥𝑦, which is 310.25. We use the sum of 𝑥, which is 47,
the sum of 𝑦, which is 45.75, and the value of 𝑛, the number of pairs of data,
which is eight. We have that 𝑆𝑥𝑦 is equal to
310.25 minus 47 multiplied by 45.75 over eight. That gives 41.46875 exactly.
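The same arithmetic can be checked in a few lines of Python. This is only a sketch of the calculation just described; the variable names are chosen here for illustration.

```python
# Summary statistics quoted in the question.
n = 8
sum_x = 47
sum_y = 45.75
sum_xy = 310.25

# S_xy = sum(xy) - (sum(x) * sum(y)) / n
s_xy = sum_xy - (sum_x * sum_y) / n
print(s_xy)  # 41.46875
```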
Now, before we calculate 𝑆𝑥𝑥, we
just need to be clear on the distinction between the two pieces of notation
here. The sum of 𝑥 squared means that we
square each of the individual 𝑥-values and then we find their sum. Whereas the sum of 𝑥 all squared
means we find the sum of the 𝑥-values first and then square this sum. That would be particularly important if
we were calculating these summaries ourselves from the raw data. So, to calculate 𝑆𝑥𝑥, we need the
sum of 𝑥 squared, which is 329. We then take the sum of 𝑥, which is
47, square it, divide it by 𝑛, which is equal to eight, and subtract the result from 329. Evaluating this on a calculator
gives 52.875 exactly.
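A short Python sketch of this step is given below. The first part uses the summary statistics from the question. The three-value list at the end is hypothetical and is only there to show that the sum of the squares and the square of the sum are different quantities.

```python
# Summary statistics quoted in the question.
n = 8
sum_x = 47
sum_x_squared = 329  # each x-value squared, then added up

# S_xx = sum(x^2) - (sum(x))^2 / n
s_xx = sum_x_squared - sum_x ** 2 / n
print(s_xx)  # 52.875

# Hypothetical three-value list showing that the two notations differ.
values = [1, 2, 3]
sum_of_squares = sum(v ** 2 for v in values)  # 1 + 4 + 9 = 14
square_of_sum = sum(values) ** 2              # 6 squared = 36
print(sum_of_squares, square_of_sum)
```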
To find the value of the regression
coefficient 𝑏 then, we take our value of 𝑆𝑥𝑦 and we divide it by our value for
𝑆𝑥𝑥. That gives a decimal of 0.78427
continuing. We were asked to give our
answer correct to three decimal places, so rounding this value, we have 0.784. Now, just in terms of the
interpretation of this value, remember, 𝑏 gives the slope of the least squares
regression line. So, a value of 0.784 means that the
line has a positive slope. And for every increase of one unit
in the 𝑥 variable, the model predicts an increase of 0.784 units in the 𝑦
variable. We weren’t asked to find the value
of 𝑎 in this question. But if we did need to calculate it,
we could use the value of 𝑏 we’ve just found together with the values of 𝑥 bar and
𝑦 bar, which can be found by dividing the sum of 𝑥 and the sum of 𝑦 by 𝑛.
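Putting the steps together, one possible way to get both coefficients from the summary statistics is sketched below. The function name `regression_coefficients` and its parameter names are made up for this illustration. The value it prints for 𝑎 is an extra calculation and is not part of the answer given in the video.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xx, sum_xy):
    """Return (a, b) for the least squares line y = a + b*x from summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_xx - sum_x ** 2 / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n  # y bar minus b times x bar
    return a, b


a, b = regression_coefficients(n=8, sum_x=47, sum_y=45.75, sum_xx=329, sum_xy=310.25)
print(round(b, 3))  # 0.784
print(round(a, 3))
```

Note that the sum of 𝑦 squared, 389.3125, does not appear anywhere in the function: it is given in the question but is not needed to find 𝑎 or 𝑏.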
We’ve calculated the regression
coefficient 𝑏 in the least squares regression model 𝑦 equals 𝑎 plus 𝑏𝑥 to be
0.784 correct to three decimal places.