Video Transcript
Using the information in the table,
find the regression line 𝑦 hat is equal to 𝑎 plus 𝑏𝑥. Round 𝑎 and 𝑏 to three decimal
places.
Since we want to find the
regression line, we begin by determining which of our variables is the dependent and
which is the independent variable. We might expect that the amount of
summer crop produced in kilograms is dependent on the amount of land it’s produced
on. And so we specify that the production in
kilograms is the dependent variable 𝑦, whereas cultivated land measured in feddan
is the independent variable 𝑥. And note that a feddan is a unit of
area measuring just over one acre.
To find the regression line, we
must find the slope 𝑏 and the 𝑦-intercept 𝑎. And to find these values, we use
the two formulae shown. We first calculate the slope 𝑏
since we’ll need this to calculate the 𝑦-intercept 𝑎. And we see from our formula for 𝑏
that we’re going to need to find various sums, that is, the sum of the products
𝑥𝑦, the sum of the 𝑥-values, the sum of the 𝑦-values, the sum of the squared
𝑥-values, and the square of the sum of the 𝑥-values. And to find the value for 𝑎, we’re
going to need the mean of the 𝑦-values, that is, the sum of the 𝑦-values divided
by 𝑛, which is the number of data pairs, and similarly for the mean of the
𝑥-values.
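
The two formulae referred to here appear on screen in the video and are not reproduced in the transcript. In standard notation, and consistent with the sums and means just described, they can be written as below. The summation notation is an assumption about how the video displays them.

```latex
\[
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
a = \bar{y} - b\,\bar{x},
\qquad
\bar{x} = \frac{\sum x}{n},
\quad
\bar{y} = \frac{\sum y}{n}.
\]
```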
In our data set, we have 10 pairs
of data so that 𝑛 is equal to 10. And we make a note of this before
we start making our calculations. Our next step is to find the
sums. And to find the sum of our products
𝑥𝑦 and our 𝑥 squared values, we introduce two new rows to our table. To calculate the products 𝑥𝑦,
taking our first 𝑥 and our first 𝑦, we have 126 multiplied by 160. That is 20160. And this goes into the first cell
of our first new row. Our second product is our second
𝑥-value multiplied by our second 𝑦-value. That is 13 multiplied by 40, which
is 520. And this goes into our second cell
in the first new row. We can then complete this row with
the products as shown.
The first element in our second new
row is the first 𝑥-value squared, that is, 126 squared, which is 15876. And this goes into the first cell of our second new
row. Our second 𝑥-value squared is 13
squared, which is 169. And this goes into the second cell
of our second new row. And we continue in this way to
complete the row. Our next step is to find the sum
for each of the rows. So we introduce a new column. The sum of the 𝑥-values is
967. The sum of the 𝑦-values is
1880. The sum of the products 𝑥𝑦 is
189320. And the sum of the squares of the
𝑥’s is 130977. So now with all our sums, we’re in
a position to calculate 𝑏.
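
The row-building step above can be sketched in Python. This is a minimal illustration, not the method used in the video: the helper name `extend_table` is hypothetical, and since the transcript quotes only the first two data pairs, only those two are used here rather than the full table.

```python
def extend_table(xs, ys):
    """Return the two extra rows for paired data: products xy and squares x^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    products = [x * y for x, y in zip(xs, ys)]
    squares = [x * x for x in xs]
    return products, squares


# Only the first two data pairs are quoted in the transcript.
xs_head = [126, 13]
ys_head = [160, 40]

products_head, squares_head = extend_table(xs_head, ys_head)
print(products_head)  # [20160, 520]
print(squares_head)   # [15876, 169]
```

With all 10 pairs in `xs` and `ys`, the four totals would be `sum(xs)`, `sum(ys)`, `sum(products)`, and `sum(squares)`.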
Substituting our sums into the
formula for 𝑏 with 𝑛 equal to 10, we have 10 times 189320, that’s the sum of
the products 𝑥𝑦, minus 967, which is the sum of the 𝑥’s, multiplied by 1880,
which is the sum of the 𝑦’s, all divided by 10, which is 𝑛, multiplied by the sum
of the squared 𝑥-values, which is 130977, minus 967 squared. That’s the sum of the 𝑥’s all
squared. And evaluating our numerator and
denominator, we have 75240 divided by 374681. And this evaluates to approximately
0.20081. To three decimal places then, we
have 𝑏 is equal to 0.201.
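
As a check on the arithmetic, the slope calculation can be reproduced from the sums stated above. The variable names are illustrative only; the numbers are the ones given in the transcript.

```python
# Sums taken from the transcript (n = 10 data pairs).
n = 10
sum_x = 967
sum_y = 1880
sum_xy = 189320
sum_x2 = 130977

numerator = n * sum_xy - sum_x * sum_y   # n*sum(xy) - sum(x)*sum(y)
denominator = n * sum_x2 - sum_x ** 2    # n*sum(x^2) - (sum(x))^2
b = numerator / denominator

print(numerator, denominator)  # 75240 374681
print(round(b, 5))             # 0.20081
print(round(b, 3))             # 0.201
```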
Now to find the 𝑦-intercept 𝑎, we
need to find the means of the 𝑦-values and the 𝑥-values. The mean of the 𝑦-values is the sum of
all the 𝑦-values divided by 𝑛. That’s 1880 divided by 10, and
that’s 188. Similarly, the mean of the
𝑥-values is the sum of the 𝑥’s divided by 𝑛. And that’s 967 divided by 10, which
is 96.7. So now we can use these values
together with our slope 𝑏, where we’ll use the value of 𝑏 to five decimal places
for accuracy, to calculate the 𝑦-intercept 𝑎. Substituting into 𝑎 is equal to 𝑦 bar minus 𝑏 times 𝑥 bar, we have 188 minus 0.20081 multiplied by 96.7, which gives us 𝑎 is
equal to 168.58167 and so on. That is 168.582 to three decimal
places. The least squares
regression line for this data, with 𝑎 and 𝑏 rounded to three decimal places, is 𝑦 hat is equal to 168.582
plus 0.201𝑥.
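
The intercept can be checked the same way. This sketch assumes the standard relation 𝑎 = 𝑦 bar minus 𝑏 times 𝑥 bar. It computes 𝑎 once with the unrounded slope and once with the slope rounded to five decimal places, as in the video; the two differ slightly in the fifth decimal place but agree to three.

```python
# Sums taken from the transcript (n = 10 data pairs).
n = 10
sum_x = 967
sum_y = 1880

mean_x = sum_x / n  # 96.7
mean_y = sum_y / n  # 188.0

b_exact = 75240 / 374681   # unrounded slope
b_5dp = 0.20081            # slope rounded to five decimal places

a_exact = mean_y - b_exact * mean_x
a_5dp = mean_y - b_5dp * mean_x

print(round(a_5dp, 5))    # 168.58167
print(round(a_exact, 3))  # 168.582
print(round(a_5dp, 3))    # 168.582
```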
We can interpret the slope as follows: for every
additional feddan of cultivated land, we expect the production of the summer crop to increase by
approximately 0.2 kilograms.
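
To illustrate this reading of the slope, the fitted line can be evaluated at two land areas one feddan apart. The function name `predict` and the value 100 are hypothetical choices for illustration, not values from the table.

```python
# Rounded coefficients of the fitted line, y hat = a + b*x.
a = 168.582
b = 0.201


def predict(x):
    """Predicted production in kilograms for x feddans of cultivated land."""
    return a + b * x


x = 100  # hypothetical land area in feddans
increase = predict(x + 1) - predict(x)
print(round(increase, 3))  # 0.201
```

Whatever value of `x` is chosen, the difference equals the slope, which is why the slope is read as the expected change in production per additional feddan.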