Video Transcript
In this video, we’re gonna look at
some data presented in a scatter plot, and we’re gonna draw a line of best fit. Then we’re gonna work out an
equation of that line and use it to interpret the meaning of the rate of change or
slope and the value of the 𝑦-intercept.
The fares charged by some cabs
for journeys of different lengths are shown in the table of values below. And then I’ve got a table of
values showing the distance in miles and the fare in dollars of these various
different journeys within the city. Now this isn’t actually a
question; it’s just a statement. But what we’re gonna do is
create a scatter plot, and then we’re going to try and find a line of best fit,
and then we’re gonna try and interpret the meaning of that line of best fit and
just sort of talk around the issue and interpret the data.
So first let’s think about our
𝑥- and 𝑦-variables. Now generally speaking we
would’ve thought that the further you went in a taxi, the more they’re going to
charge. So we think the distance would
be the 𝑥, the independent variable, and the fare would be 𝑦, the dependent
variable. So that’s what we’re gonna
use. Now with the data we’ve got,
the 𝑥-variable goes up to about twenty and the 𝑦-variable goes up to
fifty-five. So we’ve just drawn some axes
there and labelled them up: the 𝑦-axis is the fare and the 𝑥-axis is the
distance. So now we’re just gonna take
all of those points individually and plot them on the axes.
So we’re just taking, for each
ordered pair, we’re taking the 𝑥-value and the 𝑦-value and using those as our
𝑥- and 𝑦-coordinates and just kind of plotting them all. Now the distribution of those
points on the graph there strongly suggests a straight line. So what we’re gonna do is try
to create a line that goes through as many of those points or as close to as
many of those points as possible with a pretty even distribution of points above
and below that line. So I reckon that would look
roughly like that. So we can use this line then
for making predictions about fares. So for example, if we had a
journey about five miles to do, if we draw a line going from five miles up to
our line of best fit and then we map across to the 𝑦-axis, we can see that
that’s gonna cost us about eighteen dollars. And in fact likewise, if we had
forty dollars to spend, how far can we get on forty dollars? So if we go across from forty
dollars to the line of best fit and then map that down to the 𝑥-axis, that’s
gonna get us about fourteen miles. So our graph here, the line of
best fit on that graph has a meaning. It’s enabling us to make
predictions about the cab fare given how far we’ve travelled or about how far we
can travel given a certain fare.
Now the fact that these data
points are not all exactly on that line of best fit, they’re all slightly
different, is telling us that this line of best fit isn’t giving us an
exact value; it’s only giving us an approximation. So the line of best fit is an
approximate rule describing the relationship between the number of miles that
we’re travelling and the actual cab fare. Now the fact that the points
are generally quite close to that line tells us that it seems to represent quite
a good approximation to the rule and that the numbers that we’re gonna get out,
the predictions we’re gonna get, are gonna be a pretty good approximation. But remember, it’s only based
on the data that we’ve got for journeys between three and twenty miles, and it’s
only based on eight actual cab rides as well, so it wouldn’t be sensible to
necessarily expect the same rule to apply for journeys of fifty or a hundred or
even a thousand miles. So within the constraints that
we just talked about. The cab fare seems to
approximately follow a straight line relationship as we’ve shown in this
graph. Now the other thing is that
when we use the graph to make the predictions, it’s quite difficult to read the
exact values. So what we’re gonna do is
calculate the equation of that line, and then we can put numbers into our
equation and generate numbers as results.
Now to work out the equation of
that line, remember we need two things: we need the slope of the line and we
need the 𝑦-intercept. Now the 𝑦-intercept’s
relatively easy to see. That looks like that’s gonna be
about five. And the slope, remember, is
every time I increase the 𝑥-coordinate by one, by how much does the
𝑦-coordinate increase. Now if I was actually just
gonna do this here, sort of say what’s the distance between here and here kinda
thing, that’s actually quite difficult to do on these particular axes with
the scales that we’ve got. So what I’m gonna do is I’m
gonna look for points on the line which sit exactly on grid lines and which are as far
apart as possible, so I’m gonna take this one and this one over here. And I’m gonna use the
definition of slope which is the difference in 𝑦-coordinates divided by the
difference in 𝑥-coordinates. So between these two points,
the 𝑦-coordinate is going up from ten here to fifty here, so the difference is
gonna be fifty minus ten. And for the 𝑥-coordinates,
we’re going up from two here to eighteen here, so the difference is eighteen
minus two. And that becomes forty over
sixteen, which simplifies to five over two. So this means we’ve got a slope
of five over two, and we’ve got a 𝑦-intercept of five. Since the relationship is a
linear relationship, we’ve got a straight line graph, we’re gonna use that
general form of the equation 𝑦 equals 𝑚 𝑥 plus 𝑏, and the slope is five over
two, and the intercept is five, so we can plug those numbers in. Now the 𝑦 is the fare in
dollars, so actually saying that the multiplier is five over two, well, it’s
perfectly accurate. But when we’re talking about money,
it’s probably better to say it’s two dollars fifty cents or two point five
dollars, so our equation becomes 𝑦 equals two point five 𝑥 plus five. And the way that we interpret
those numbers is the intercept being five; that’s the 𝑦-coordinate when 𝑥
equals zero. This means that just to get in
the cab, these taxi drivers are charging you five dollars, so that’s kind of a
start fee for your journey. And then the slope tells you
how much they’re charging every time they increase 𝑥 by one, so 𝑥 is the
number of miles that you travel in the taxi. So basically for every mile
that you go, they’re charging you two dollars and fifty cents. So our interpretation is that
each fare consists of a fixed fee of five dollars plus two dollars fifty per
mile travelled. Now this is only an
approximation as we said; none of the fares exactly match that cause none of the
points are exactly on the line, but they’re all pretty close to that. That’s the general rule they
sort of follow quite closely.
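The slope and intercept worked out above can be checked with a few lines of Python. This is only a sketch: the two points (2, 10) and (18, 50) and the intercept of five are the values read off the line of best fit in the video, and the function name `slope_between` is made up for illustration.

```python
def slope_between(p1, p2):
    """Slope = difference in y-coordinates / difference in x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# Two points read off the line of best fit: (distance in miles, fare in dollars).
m = slope_between((2, 10), (18, 50))   # (50 - 10) / (18 - 2) = 40 / 16
b = 5                                  # y-intercept read off the graph

print(f"y = {m}x + {b}")               # y = 2.5x + 5
```

As a cross-check, the point (2, 10) agrees with that intercept: ten minus two point five times two is five.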
And now we’ve got the equation;
we can use that probably more easily than we can use the graph for making
predictions about how much each journey would cost. So on the graph, if we were
going eight miles, we’d have to go up to the graph here and then sort of come
across, and so the rough guess is that’s gonna be twenty-five, twenty-four,
twenty-six dollars. But if we put the number
straight into the equation, we can see that the cost is gonna be two point five
times eight plus five. So that’s twenty plus five,
which is twenty-five dollars. So it’s easier to get sort of
more accurate answers by using the equation. Now to use the equation to make
predictions in the other direction, so say we got thirty-five dollars and we
wanna know roughly how far we’ll get with our thirty-five dollars, we’re
gonna have to rearrange that equation so we’re gonna have to make 𝑥 the
subject. So what I’m gonna do here is
take away five from both sides of that equation, which gives us 𝑦 minus five on
the left-hand side and two point five 𝑥 on the right-hand side cause five take
away five is nothing. And now if I divide both sides
by two point five, I’ll know what 𝑥 is equal to. So the distance that I can
travel for a given fare is 𝑦 minus five over two point five. So let’s say we did have
thirty-five dollars, we can put thirty-five in for 𝑦. So 𝑥, the number of miles that
we can go will be thirty-five take away five; that’s thirty over two point five,
which is twelve miles.
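Both directions of prediction can be written as small functions. This is a sketch using the approximate rule from the video, 𝑦 equals two point five 𝑥 plus five; the names `fare` and `distance` are just illustrative.

```python
def fare(miles):
    """Estimated fare in dollars for a journey of `miles` miles: y = 2.5x + 5."""
    return 2.5 * miles + 5

def distance(dollars):
    """Estimated distance in miles for a fare of `dollars`: x = (y - 5) / 2.5."""
    return (dollars - 5) / 2.5

print(fare(8))        # 25.0
print(distance(35))   # 12.0
```

The second function is just the first one rearranged to make 𝑥 the subject, so feeding a fare back through `distance` returns the original number of miles.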
So on the graph it would look
like this, but I think it’s kind of easier to get a more accurate answer when
you’re actually working with equations and numbers. So we follow the process right
through. We started off with a table of
values over here. From that, we plotted the
graph. From the graph, we calculated
this equation, and we’ve seen how we can use that equation to make predictions
of fares based on how far we’re travelling or how far we can get with a given
amount of money. We’ve also interpreted that
equation so that we know that five tells us what the fixed fee is for every
journey, and the slope two point five tells us that we’re charging roughly two dollars
fifty per mile that we travel. Now the only thing that we
haven’t considered in all of this is, for that particular function, that
equation representing the relationship between the
distance and the fare, what would be a suitable domain? Now given that we’re not gonna
charge negative amounts if we start driving backwards to different places, it
probably makes sense that the distance we’re travelling is always gonna be
positive. So in terms of the maths, it
makes sense to put a restriction on the numbers that we can put into this
equation saying that the 𝑥-values, the number of miles, has to be at least zero
for this to make any kind of sense.
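One way to build that restriction into a calculation is to refuse negative distances outright. This is a sketch only; the function name `fare_with_domain` and the choice to raise an error are assumptions for illustration, not something from the video.

```python
def fare_with_domain(miles):
    """Estimated fare in dollars, defined only for distances of at least zero miles."""
    if miles < 0:
        raise ValueError("distance must be at least zero miles")
    return 2.5 * miles + 5

print(fare_with_domain(0))   # 5.0, which is just the fixed fee
```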
So having done all of that let’s do
one more example and we’re going to do that a little bit more quickly.
So nine students were asked to
measure the diameter and circumference of nine different circles, one
each, and the results are shown in the table below. So for each student, we’ve got what
they’ve measured; they haven’t done any calculations here, they’ve just, you
know, taken a ruler or a piece of string and measured these distances. So for example, the first
student had a diameter of two inches on their circle and a circumference of six
inches. So what we’re gonna do is we’re
gonna plot that on a scatter graph. And then we’re gonna do a line
of best fit, work out the equation of that line of best fit, and then try and
interpret some of the parameters. So first things first, we need
to define what our 𝑥- and 𝑦-coordinates are. So I’m gonna say you set
the diameter of the circle and then that determines what the circumference is,
so I’m gonna use 𝑥 for the diameter and 𝑦 for the circumference, and then I’m
gonna treat each of these as an ordered pair and use the 𝑥-coordinate and the
𝑦-coordinate and plot those points.
And that’s what we get. Now most of the points pretty
strongly suggest a straight line relationship between 𝑥 and 𝑦, between the
diameter and the circumference, but there is one point that looks very different
to the others. Now what’s going on here? So a number of different
possibilities come to mind. I mean it could be that the
ruler that that student was using was extraordinarily sensitive to changes in
temperature so it expands and contracts as it heats up or cools down, and maybe
they were doing their measuring in an environment where the temperature was
rapidly changing. It could be that there was a
bizarre gravitational event nearby which massively warped space-time while the
student was making their measurements. It could be that they found a
bizarre circle which looks very different and has different properties to all
the other circles or maybe it could be that the student was just very bad at
measuring or possibly they just wrote down the numbers the wrong way round. Well we don’t know which one of
those is the real situation, and we can’t really make any assumptions. It looks very likely that
they’ve just transposed the diameter and the circumference. But because this is secondary
data, we don’t have access to the original circles, we don’t have access to the
original students, I think what we’re gonna do is just say that it looks very
different, it’s probably wrong, and we’re
going to ignore that piece of data for now. And you do have to be very
careful about throwing away bits of data that you just don’t like the look of
because you can obviously skew your results. But from what we know about
circles and the way that they work and about geometry, I think it’s pretty clear
that that does look like a dodgy piece of data, so I think in this
particular case we’re safe to ignore it. So let’s draw a line of best
fit through the rest of the points.
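One quick sanity check on a reading like that, which is not something done in the video, uses the fact that a circle’s circumference is always longer than its diameter. The sketch below uses the first student’s reading of two and six inches; the swapped version is a hypothetical example of a transposed pair, and the function name `looks_transposed` is made up.

```python
def looks_transposed(diameter, circumference):
    """Flag a reading whose circumference is not longer than its diameter."""
    return circumference <= diameter

print(looks_transposed(2, 6))   # False: the first student's reading
print(looks_transposed(6, 2))   # True: hypothetical, same numbers the wrong way round
```

A check like this only flags a suspicious point; deciding whether to ignore it is still a judgment call.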
Now that looks like a
reasonable line of best fit for the rest of those points. So to work out the equation of
that line, we’re gonna have to find the intercept and the slope. Well that line seems to go
through the origin. So the 𝑦-intercept, when 𝑥 is
zero, the 𝑦-value is zero. And to work out the slope, I’m
gonna pick two points which are on my grid lines, and I’m gonna work out the
difference in 𝑥 and the difference in 𝑦 again. So in going from this point to
this point, the 𝑦-coordinate’s gone up from zero up to thirty, so the
difference in 𝑦-coordinates is thirty. And between those same two
points, the 𝑥-coordinate is going up from zero up to nine point five. So the slope looks like it’s
thirty divided by nine point five, which is about three point one six. And because we’ve got a linear
relationship, our equation is gonna look something like 𝑦 equals 𝑚 𝑥 plus 𝑏,
and we’ve calculated the slope is three point one six and the intercept is
zero. So there’s our equation 𝑦
equals three point one six 𝑥 plus zero. Then again, we don’t usually
bother writing plus zero on the end of our equations, so we’re just gonna go
with 𝑦 equals three point one six 𝑥.
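The same slope arithmetic can be checked quickly in Python. This is a sketch using the two points read off the line in the video, the origin and the point at nine point five across and thirty up; the variable names are just illustrative.

```python
rise = 30 - 0       # difference in y-coordinates (circumference, inches)
run = 9.5 - 0       # difference in x-coordinates (diameter, inches)
m = rise / run      # slope of the line of best fit

print(round(m, 2))  # 3.16
```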
So just thinking back that 𝑥
represents the diameter in inches and 𝑦 represents the circumference in inches,
then for those circles we’re saying with this data that we have, we reckon
that the circumference is roughly equal to three point one six times the
diameter. And just sort of interpreting
those parameters, the intercept here at zero, that makes sense; so if we’ve got
a circle that’s got a diameter of zero, we haven’t really got a circle so the
circumference will be zero as well. So we’re sort of happy
with the interpretation of that. And the slope means that whatever the
diameter of our circle is, we multiply that by three point one six to
get the circumference. So every extra inch on the
diameter adds three point one six inches to the circumference of the circle. Now even from here, I can hear
those of you who are paying attention in your geometry classes screaming at me
that, “But we know the circumference of a circle is 𝜋 times the diameter!” So what we’ve done in our
little experiment here with these nine students is we’ve calculated using
statistical techniques an approximation for the value of 𝜋. These two things are completely
compatible with each other except one of them is a bit less accurate than the
other one. Because our students are doing
measuring, they’re not always doing that a hundred percent accurately, so some
of these points are not quite on the line, although in theory they should all be
exactly on a straight line. But with all these errors, when
we add all these errors up, our estimation of the value of 𝜋 has come out
slightly wrong; it’s three point one six instead of three point one four one
five nine blah blah blah blah blah. But nonetheless, it’s not a bad
estimate. So hopefully, the couple of
examples we’ve just looked at have given you a chance to see the value of scatter
plots and how useful they can be in interpreting data. But more importantly perhaps,
they’ve enabled you to calculate the equation of a straight line and to
interpret some of the values. So the intercept, the
𝑦-intercept, and the slope of that line, we’ve interpreted that in some sort
of a real-life context.
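Going back to the circle example, the claim that three point one six is not a bad estimate of 𝜋 can be put into numbers. This is a sketch that compares the rounded slope from the video with Python’s built-in value of 𝜋; the variable names are just illustrative.

```python
import math

estimate = 3.16                       # slope of the line of best fit from the video
error = estimate - math.pi            # how far the estimate is above pi
percent = 100 * error / math.pi       # the same error as a percentage of pi

print(round(percent, 1))              # 0.6
```

So the students’ measurements overestimate 𝜋 by a little over half a percent.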