# Video: Calculating and Interpreting the Equation of a Line of Best Fit from a Scatterplot

We show you how to draw a line of best fit by eye on a scatterplot and work out the equation of that line. Then, you will see how to interpret the values of the slope and the 𝑦-intercept from the equation in terms of the context of the data.


### Video Transcript

In this video, we’re gonna look at some data presented in a scatter plot, and we’re gonna draw a line of best fit. Then we’re gonna work out an equation of that line and use it to interpret the meaning of the rate of change or slope and the value of the 𝑦-intercept.

The fares charged by some cabs for journeys of different lengths are shown in the table of values below. And then I’ve got a table of values showing the distance in miles and the fare in dollars of these various different journeys within the city. Now this isn’t actually a question; it’s just a statement. But what we’re gonna do is create a scatter plot, and then we’re going to try and find a line of best fit, and then we’re gonna try and interpret the meaning of that line of best fit and just sort of talk around the issue and interpret the data.

So first let’s think about our 𝑥- and 𝑦-variables. Now generally speaking we would’ve thought that the further you went in a taxi, the more they’re going to charge. So we think the distance would be the 𝑥, the independent variable, and the fare would be 𝑦, the dependent variable. So that’s what we’re gonna use. Now with the data we’ve got, the 𝑥-variable goes up to about twenty and the 𝑦-variable goes up to fifty-five. So we’ve just drawn some axes there and labelled them up: the 𝑦-axis is the fare and the 𝑥-axis is the distance. So now we’re just gonna take all of those points individually and plot them on the axes.

So we’re just taking, for each ordered pair, the 𝑥-value and the 𝑦-value and using those as our 𝑥- and 𝑦-coordinates and just kind of plotting them all. Now the distribution of those points on the graph there strongly suggests a straight line. So what we’re gonna do is try to create a line that goes through as many of those points or as close to as many of those points as possible with a pretty even distribution of points above and below that line. So I reckon that would look roughly like that. So we can use this line then for making predictions about fares. So for example, if we had a journey of about five miles to do, if we draw a line going from five miles up to our line of best fit and then we map across to the 𝑦-axis, we can see that that’s gonna cost us about eighteen dollars. And likewise, if we had forty dollars to spend, how far can we get on forty dollars? So if we go across from forty dollars to the line of best fit and then map that down to the 𝑥-axis, that’s gonna get us about fourteen miles. So the line of best fit on that graph has a meaning. It’s enabling us to make predictions about the cab fare given how far we’ve travelled or about how far we can travel given a certain fare.

Now the fact that these data points are not all exactly on that line of best fit, they’re all slightly different, is telling us that this line of best fit isn’t giving us an exact value; it’s only giving us an approximation. So the line of best fit is an approximate rule describing the relationship between the number of miles that we’re travelling and the actual cab fare. Now the fact that the points are generally quite close to that line tells us that it seems to represent quite a good approximation to the rule and that the numbers that we’re gonna get out, the predictions we’re gonna get, are gonna be a pretty good approximation. But remember, it’s only based on the data that we’ve got for journeys between three and twenty miles, and it’s only based on eight actual cab rides as well, so it wouldn’t be sensible to necessarily expect the same rule to apply for journeys of fifty or a hundred or even a thousand miles. So within the constraints that we just talked about, the cab fare seems to approximately follow a straight line relationship as we’ve shown in this graph. Now the other thing is that when we use the graph to make the predictions, it’s quite difficult to read the exact values. So what we’re gonna do is calculate the equation of that line, and then we can put numbers into our equation and generate numbers as results.

Now to work out the equation of that line, remember we need two things: we need the slope of the line and we need the 𝑦-intercept. Now the 𝑦-intercept’s relatively easy to see. That looks like that’s gonna be about five. And the slope, remember, is every time I increase the 𝑥-coordinate by one, by how much does the 𝑦-coordinate increase. Now if I was actually just gonna do this here, sort of say what’s the distance between here and here kinda thing, that’s actually quite difficult to do on these particular axes with the scales that we’ve got. So what I’m gonna do is I’m gonna look for points on the line which are at exact grid coordinates and which are as far apart as possible, so I’m gonna take this one and this one over here. And I’m gonna use the definition of slope which is the difference in 𝑦-coordinates divided by the difference in 𝑥-coordinates. So between these two points, the 𝑦-coordinate is going up from ten here to fifty here, so the difference is gonna be fifty minus ten. And for the 𝑥-coordinates, we’re going up from two here to eighteen here, so the difference is eighteen minus two. And that becomes forty over sixteen, which simplifies to five over two. So this means we’ve got a slope of five over two, and we’ve got a 𝑦-intercept of five. Since the relationship is a linear relationship, we’ve got a straight line graph, we’re gonna use that general form of the equation 𝑦 equals 𝑚 𝑥 plus 𝑏, and the slope is five over two, and the intercept is five, so we can plug those numbers in. Now the 𝑦 is the fare in dollars, so although saying that the multiplier is five over two is perfectly accurate, when we’re talking about money it’s probably better to say it’s two dollars fifty cents or two point five dollars, so our equation becomes 𝑦 equals two point five 𝑥 plus five. And the way that we interpret those numbers is the intercept being five; that’s the 𝑦-coordinate when 𝑥 equals zero.
This means that just to get in the cab, these taxi drivers are charging you five dollars, so that’s kind of a start fee for your journey. And then the slope tells you how much they’re charging every time they increase 𝑥 by one, and 𝑥 is the number of miles that you travel in the taxi. So basically for every mile that you go, they’re charging you two dollars and fifty cents. So our interpretation is that each fare consists of a fixed fee of five dollars plus two dollars fifty per mile travelled. Now this is only an approximation as we said; none of the fares exactly match that because none of the points are exactly on the line, but they’re all pretty close to that. That’s the general rule they sort of follow quite closely.
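The slope calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the two points read off the line of best fit are (2, 10) and (18, 50) and that the intercept is read off the graph as 5, as in the transcript. The helper name `slope_between` is made up for this example.

```python
from fractions import Fraction

def slope_between(p1, p2):
    """Difference in y-coordinates divided by difference in x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    return Fraction(y2 - y1, x2 - x1)

# Two points read off the line of best fit (on grid lines, far apart).
m = slope_between((2, 10), (18, 50))  # 40/16, which simplifies to 5/2
b = 5                                 # y-intercept read off the graph

print(m, float(m))                # 5/2 2.5
print(f"y = {float(m)}x + {b}")   # y = 2.5x + 5
```

Using `Fraction` keeps the exact value five over two before it is converted to the decimal 2.5 used for dollars.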

And now we’ve got the equation; we can use that probably more easily than we can use the graph for making predictions about how much each journey would cost. So on the graph, if we were going eight miles, we’d have to go up to the graph here and then sort of come across, and the rough guess is that’s gonna be twenty-five, twenty-four, twenty-six dollars. But if we put the number straight into the equation, we can see that the cost is gonna be two point five times eight plus five. So that’s twenty plus five, which is twenty-five dollars. So it’s easier to get sort of more accurate answers by using the equation. Now to use the equation to make predictions in the other direction, so say we’ve got thirty-five dollars and we wanna know roughly how far we’ll get with our thirty-five dollars, we’re gonna have to rearrange that equation, so we’re gonna have to make 𝑥 the subject. So what I’m gonna do here is take away five from both sides of that equation, which gives us 𝑦 minus five on the left-hand side and two point five 𝑥 on the right-hand side because five take away five is nothing. And now if I divide both sides by two point five, I’ll know what 𝑥 is equal to. So the distance that I can travel for a given fare is 𝑦 minus five, all over two point five. So let’s say we did have thirty-five dollars, we can put thirty-five in for 𝑦. So 𝑥, the number of miles that we can go, will be thirty-five take away five; that’s thirty over two point five, which is twelve miles.
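As a rough sketch, both directions of prediction can be written as small Python functions. The slope 2.5 and intercept 5 are the values estimated from the line of best fit in the transcript; the function names `fare_for` and `miles_for` are invented for illustration.

```python
SLOPE = 2.5      # dollars per mile, estimated from the line of best fit
INTERCEPT = 5.0  # fixed fee in dollars, read off the y-axis

def fare_for(miles):
    """Predicted fare: y = 2.5x + 5."""
    return SLOPE * miles + INTERCEPT

def miles_for(fare):
    """Rearranged equation: x = (y - 5) / 2.5."""
    return (fare - INTERCEPT) / SLOPE

print(fare_for(8))    # 25.0
print(miles_for(35))  # 12.0
```

Because the line is only an approximate rule, these outputs are estimates rather than exact fares.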

So on the graph it would look like this, but I think it’s kind of easier to get a more accurate answer when you’re actually working with equations and numbers. So we followed the process right through. We started off with a table of values over here. From that, we plotted the graph. From the graph, we calculated this equation, and we’ve seen how we can use that equation to make predictions of fares based on how far we’re travelling or how far we can get with a given amount of money. We’ve also interpreted that equation so that we know that five tells us what the fixed fee is for every journey, and the slope two point five tells us that we’re charging roughly two dollars fifty per mile that we travel. Now the only thing that we haven’t considered at all in all of this is, for that particular function, that equation representing the relationship between the distance and the fare, what would be a suitable domain? Now given that we’re not gonna charge negative amounts if we start driving backwards to different places, it probably makes sense that the distance we’re travelling is never gonna be negative. So in terms of the maths, it makes sense to put a restriction on the numbers that we can put into this equation saying that the 𝑥-values, the number of miles, have to be at least zero for this to make any kind of sense.
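One hedged way to express that domain restriction in code is to reject negative distances before applying the equation. This is only a sketch; the function name `fare_for` and the choice to raise a `ValueError` are assumptions made for illustration, not part of the video.

```python
def fare_for(miles):
    """Predicted fare y = 2.5x + 5, defined only for x >= 0."""
    if miles < 0:
        raise ValueError("distance must be at least zero miles")
    return 2.5 * miles + 5.0

print(fare_for(0))  # 5.0, just the fixed fee
```

The data behind the line only covered journeys of roughly three to twenty miles, so predictions far outside that range should be treated with caution even though the function accepts them.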

So having done all of that let’s do one more example and we’re going to do that a little bit more quickly.

So nine students were asked to measure the diameter and circumference of nine different circles, one each, and the results are shown in the table below. So for each student, they haven’t done any calculations here, they’ve just you know taken a ruler or a piece of string and measured these distances. So for example, the first student had a diameter of two inches on their circle and a circumference of six inches. So what we’re gonna do is we’re gonna plot that on a scatter graph. And then we’re gonna do a line of best fit, work out the equation of that line of best fit, and then try and interpret some of the parameters. So first things first, we need to define which our 𝑥- and 𝑦-coordinates are. So I’m gonna say you set the diameter of the circle and then that determines what the circumference is, so I’m gonna use 𝑥 for the diameter and 𝑦 for the circumference, and then I’m gonna treat each of these as an ordered pair and use the 𝑥-coordinate and the 𝑦-coordinate and plot those points.

And that’s what we get. Now most of the points pretty strongly suggest a straight line relationship between 𝑥 and 𝑦, between the diameter and the circumference, but there is one point that looks very different to the others. Now what’s going on here? So a number of different possibilities come to mind. I mean it could be that the ruler that that student was using was extraordinarily sensitive to changes in temperature so it expands and contracts as it heats up or cools down, and maybe they were doing their measuring in an environment where the temperature was rapidly changing. It could be that there was a bizarre gravitational event nearby which massively warped space-time while the student was making their measurements. It could be that they found a bizarre circle which looks very different and has different properties to all the other circles, or maybe it could be that the student was just very bad at measuring, or possibly they just wrote down the numbers the wrong way round. Well we don’t know which one of those is the real situation, and we can’t really make any assumptions. It looks very likely that they’ve just transposed the diameter and the circumference. But because this is secondary data, we don’t have access to the original circles, we don’t have access to the original students, I think what we’re gonna do is just say that it looks very different, it’s probably wrong, and we’re going to ignore that piece of data for now. And you do have to be very careful about throwing away bits of data that you just don’t like the look of because you can obviously skew your results. But from what we know about circles and the way that they work and about geometry, I think it’s pretty clear that that does look like a dodgy piece of data, so I think in this particular case we’re safe to ignore it. So let’s draw a line of best fit through the rest of the points.

Now that looks like a reasonable line of best fit for the rest of those points. So to work out the equation of that line, we’re gonna have to find the intercept and the slope. Well that line seems to go through the origin. So the 𝑦-intercept is zero: when 𝑥 is zero, the 𝑦-value is zero. And to work out the slope, I’m gonna pick two points which are on my grid lines, and I’m gonna work out the difference in 𝑥 and the difference in 𝑦 again. So in going from this point to this point, the 𝑦-coordinate’s gone up from zero up to thirty, so the difference in 𝑦-coordinates is thirty. And between those same two points, the 𝑥-coordinate is going up from zero up to nine point five. So the slope looks like it’s thirty divided by nine point five, which is about three point one six. And because we’ve got a linear relationship, our equation is gonna look something like 𝑦 equals 𝑚 𝑥 plus 𝑏, and we’ve calculated the slope is three point one six and the intercept is zero. So there’s our equation 𝑦 equals three point one six 𝑥 plus zero. Then again, we don’t usually bother writing plus zero on the end of our equations, so we’re just gonna go with 𝑦 equals three point one six 𝑥.
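The arithmetic for this second slope can be checked with a short Python sketch. It assumes, as in the transcript, that the line passes through the origin and through the point (9.5, 30) read off the graph; the variable names are illustrative only.

```python
# Two points read off the line of best fit: the origin and (9.5, 30).
x1, y1 = 0, 0
x2, y2 = 9.5, 30

slope = (y2 - y1) / (x2 - x1)   # difference in y over difference in x
intercept = y1                  # the line passes through the origin

print(round(slope, 2))  # 3.16
```

Since the intercept is zero, the equation reduces to the circumference being about 3.16 times the diameter.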

So just thinking back that 𝑥 represents the diameter in inches and 𝑦 represents the circumference in inches, then for those circles we’re saying with this data that we have, we reckon that the circumference is roughly equal to three point one six times the diameter. And just sort of interpreting those parameters, the intercept here at zero, that makes sense; so if we’ve got a circle that’s got a diameter of zero, we haven’t really got a circle so the circumference will be zero as well. So we’re sort of happy with the interpretation of that, and the slope means that whatever the diameter of our circle is, we multiply that by three point one six to get the circumference. So every extra inch on the diameter adds three point one six inches to the circumference of the circle. Now even from here, I can hear those of you who are paying attention in your geometry classes screaming at me that, “But we know the circumference of a circle is 𝜋 times the diameter!” So what we’ve done in our little experiment here with these nine students is we’ve calculated using statistical techniques an approximation for the value of 𝜋. These two things are completely compatible with each other except one of them is a bit less accurate than the other one. Because our students are doing measuring, they’re not always doing that a hundred percent accurately, so some of these points are not quite on the line, although in theory they should all be exactly on a straight line. But with all these errors, when we add all these errors up, our estimation of the value of 𝜋 has come out slightly wrong; it’s three point one six instead of three point one four one five nine blah blah blah blah blah. But nonetheless, it’s not a bad estimate. So hopefully, the couple of examples we’ve just looked at have given you a chance to see the value of scatter plots and how useful they can be in interpreting data.
But more importantly perhaps, they’ve enabled you to calculate the equation of a straight line and to interpret some of the values. So the 𝑦-intercept and the slope of that line, we’ve interpreted those in some sort of a real-life context.
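To put a number on how close the students’ estimate of 𝜋 came, here is a small Python sketch comparing the slope 3.16 from the line of best fit with the value of 𝜋 from the standard library. The variable names are illustrative only.

```python
import math

estimate = 3.16                    # slope of the line of best fit
error = estimate - math.pi         # how far the estimate is from pi
percent_error = 100 * error / math.pi

print(round(error, 4))          # 0.0184
print(round(percent_error, 2))  # 0.59
```

So the estimate from hand measurements is within about 0.6 percent of 𝜋, which fits the transcript’s remark that it is not a bad estimate.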