Video Transcript
For a given data set, the sum of 𝑥
equals 47, the sum of 𝑦 equals 45.75, the sum of 𝑥 squared equals 329, the sum of
𝑦 squared equals 389.3125, the sum of 𝑥𝑦 equals 310.25, and 𝑛 equals eight. Calculate the value of the
regression coefficient 𝑏 in the least squares regression model 𝑦 equals 𝑎 plus
𝑏𝑥. Give your answer correct to three
decimal places.
Let’s begin by reminding ourselves
about this least squares regression model 𝑦 equals 𝑎 plus 𝑏𝑥. This gives the equation of the
straight line which best fits a scatter plot of an 𝑥𝑦 data set. The values of 𝑎 and 𝑏 are chosen
to minimize the sum of the squares of the residuals. Those are the vertical differences
between the 𝑦-value of each point and the 𝑦-value we would get if we were using the
model for prediction.
There are standard formulae that we
can apply for calculating the values of 𝑎 and 𝑏. 𝑏 first of all is equal to 𝑆𝑥𝑦
over 𝑆𝑥𝑥, where 𝑆𝑥𝑦 is the sum of 𝑥𝑦 minus the sum of 𝑥 multiplied by the sum of 𝑦 over 𝑛,
and 𝑆𝑥𝑥 is the sum of 𝑥 squared minus the sum of 𝑥 all squared over 𝑛. And 𝑎, although we’re not asked
for it here, is equal to 𝑦 bar, that’s the mean of the 𝑦-values, minus 𝑏
multiplied by 𝑥 bar, the mean of the 𝑥-values. We can see that if we compare our
least squares regression model with the general equation of a straight line, then
the value 𝑏 represents the slope of this line and the value 𝑎 represents its
𝑦-intercept if it’s appropriate to extend the line that far.
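In symbols, the standard formulae for 𝑆𝑥𝑦, 𝑆𝑥𝑥, 𝑏, and 𝑎 can be written as follows. This is a reconstruction that matches the working later in the transcript, with each sum taken over all 𝑛 pairs of data.

```latex
\[
S_{xy} = \sum xy - \frac{\sum x \, \sum y}{n},
\qquad
S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}
\]
\[
b = \frac{S_{xy}}{S_{xx}},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```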
Now, we haven’t been given the raw
data, but we have been given the summary statistics for this data set. So, that’s enough for us to
calculate the values of 𝑆𝑥𝑦 and 𝑆𝑥𝑥 and therefore calculate the value of
𝑏. For 𝑆𝑥𝑦 first of all then, we
use the sum of 𝑥𝑦, which is 310.25. We use the sum of 𝑥, which is 47,
the sum of 𝑦, which is 45.75, and the value of 𝑛, the number of pairs of data,
which is eight. We have that 𝑆𝑥𝑦 is equal to
310.25 minus 47 multiplied by 45.75 over eight. That gives 41.46875 exactly.
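As a quick check of this arithmetic, the calculation of 𝑆𝑥𝑦 can be sketched in Python. The variable names here are illustrative choices and not part of the original question; the numbers are the summary statistics quoted above.

```python
# Summary statistics quoted in the question.
sum_x = 47
sum_y = 45.75
sum_xy = 310.25
n = 8  # number of pairs of data

# S_xy = (sum of xy) - (sum of x)(sum of y) / n
s_xy = sum_xy - (sum_x * sum_y) / n

print(s_xy)  # 41.46875
```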
Now, before we calculate 𝑆𝑥𝑥, we
just need to be clear on the distinction between the two pieces of notation
here. The sum of 𝑥 squared means that we
square each of the individual 𝑥-values and then we find their sum. Whereas the sum of 𝑥 all squared
means we find the sum of the 𝑥-values first and then square this sum. That is particularly important if
we were calculating these summaries ourselves from the raw data. So to calculate 𝑆𝑥𝑥, we need the
sum of 𝑥 squared, which is 329. We then take the sum of 𝑥, which is
47, square it, and divide it by 𝑛, which is equal to eight. Evaluating this on a calculator
gives 52.875 exactly.
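The same check can be sketched for 𝑆𝑥𝑥, along with a small illustration of the two pieces of notation. The list `values` is a made-up example used only to show the distinction; it is not the data set from the question, and the variable names are illustrative.

```python
# Summary statistics quoted in the question.
sum_x = 47
sum_x_squared = 329  # each x-value squared first, then summed
n = 8

# S_xx = (sum of x squared) - (sum of x, all squared) / n
s_xx = sum_x_squared - (sum_x ** 2) / n
print(s_xx)  # 52.875

# Hypothetical raw values, only to illustrate the notation.
values = [1, 2, 3]
sum_of_squares = sum(v ** 2 for v in values)  # 1 + 4 + 9 = 14
square_of_sum = sum(values) ** 2              # (1 + 2 + 3)^2 = 36
print(sum_of_squares, square_of_sum)  # 14 36
```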
To find the value of the regression
coefficient 𝑏 then, we take our value of 𝑆𝑥𝑦 and we divide it by our value for
𝑆𝑥𝑥. That gives a decimal of 0.78427
continuing. We were asked to give our
answer correct to three decimal places, so rounding this value, we have 0.784. Now, just in terms of the
interpretation of this value, remember, 𝑏 gives the slope of the least squares
regression line. So, a value of 0.784 means that the
line has a positive slope. And for every increase of one unit
in the 𝑥 variable, the model predicts an increase of 0.784 units in the 𝑦
variable. We weren’t asked to find the value
of 𝑎 in this question. But if we did need to calculate it,
we could use the value of 𝑏 we’ve just found together with the values of 𝑥 bar and
𝑦 bar, which can be found by dividing the sum of 𝑥 and the sum of 𝑦 by 𝑛.
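A minimal sketch of these last steps is given below, using the values of 𝑆𝑥𝑦 and 𝑆𝑥𝑥 found earlier. The value of 𝑎 is not stated in the transcript; the figure shown simply follows from the quoted summary statistics. The helper `predict` is a hypothetical name added for illustration.

```python
# Values taken from the question and the working above.
sum_x, sum_y, n = 47, 45.75, 8
s_xy, s_xx = 41.46875, 52.875

# Slope of the least squares regression line.
b = s_xy / s_xx

# Means of the x-values and y-values.
x_bar = sum_x / n
y_bar = sum_y / n

# Intercept: a = y_bar - b * x_bar
a = y_bar - b * x_bar

print(round(b, 3))  # 0.784
print(round(a, 3))  # 1.111


def predict(x):
    """Predicted y-value from the model y = a + b*x (illustrative helper)."""
    return a + b * x
```

Increasing 𝑥 by one unit changes `predict(x)` by exactly `b`, which is the interpretation of the slope described above.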
We’ve calculated the regression
coefficient 𝑏 in the least squares regression model 𝑦 equals 𝑎 plus 𝑏𝑥 to be
0.784 correct to three decimal places.