Video Transcript
The goal here is simple: explain what a derivative is. The thing is though, there’s some subtlety to this topic and a lot of potential for
paradoxes, if you’re not careful. So kind of a secondary goal is that you have an appreciation for what those paradoxes
are and how to avoid them.
You see, it’s common for people to say that the derivative measures an instantaneous
rate of change. But when you think about it, that phrase is actually an oxymoron. Change is something that happens between separate points in time. And when you blind yourself to all but just a single instant, there’s not really any
room for change. You’ll see what I mean more as we get into it. But when you appreciate that a phrase like “instantaneous rate of change” is actually
nonsense, I think it makes you appreciate just how clever the fathers of calculus
were in capturing the idea that that phrase is meant to evoke, but with a perfectly
sensible piece of math, the derivative.
As our central example, I want you to imagine a car that starts at some point A,
speeds up, and then slows down to a stop at some point B 100 meters away. And let’s say it all happens over the course of 10 seconds. That’s the set-up to have in mind as we lay out what the derivative is. We could graph this motion, letting the vertical axis represent the distance traveled
and the horizontal axis represent time. So at each time 𝑡, represented with a point on this horizontal axis, the height of
the graph tells us how far the car has traveled in total after that amount of
time.
It’s pretty common to name a distance function like this, 𝑠 of 𝑡. I would use the letter 𝑑 for distance. But that guy already has another full-time job in calculus. Initially, this curve is quite shallow since the car is slow to start. During that first second, the distance that it travels doesn’t really change that
much. Then, for the next few seconds, as the car speeds up, the distance traveled in a
given second gets larger, which corresponds to a steeper slope in this graph. And then towards the end when it slows down, that curve shallows out again.
And if we were to plot the car’s velocity in meters per second as a function of time,
it might look like this bump. At early times, the velocity is very small. Up to the middle of the journey, the car builds up to some maximum velocity, covering
a relatively large distance each second. Then, it slows back down towards the speed of zero. And these two curves here are definitely related to each other, right? If you change the specific distance-versus-time function, you’re gonna have some
different velocity-versus-time function. And what we wanna understand is the specifics of that relationship. Exactly, how does velocity depend on a distance-versus-time function?
And to do that, it’s worth taking a moment to think critically about what exactly
velocity means here. Intuitively, we all might know what velocity at a given moment means. It’s just whatever the car’s speedometer shows in that moment. And intuitively, it might make sense that the car’s velocity should be higher at
times when this distance function is steeper, when the car traverses more distance
per unit time. But the funny thing is, velocity at a single moment makes no sense. If I show you a picture of a car, just a snapshot in an instant, and ask you how fast
it’s going, you’d have no way of telling me. What you’d need are two separate points in time to compare. That way you can compute whatever the change in distance across those times is,
divided by the change in time, right? I mean, that’s what velocity is. It’s the distance traveled per unit time.
So how is it that we’re looking at a function for velocity that only takes in a
single value of 𝑡, a single snapshot in time? It’s weird, isn’t it? We wanna associate individual points in time with the velocity. But actually, computing velocity requires comparing two separate points in time. If that feels strange and paradoxical, good! You’re grappling with the same conflicts that the fathers of calculus did. And if you want a deep understanding for rates of change, not just for a moving car
but for all sorts of things in science, you’re gonna need to resolve this apparent
paradox.
First, I think it’s best to talk about the real world. And then, we’ll go into a purely mathematical one. Let’s think about what the car’s speedometer is probably doing. At some point, say three seconds into the journey, the speedometer might measure how
far the car goes in a very small amount of time, maybe the distance traveled between
three seconds and 3.01 seconds. Then, it could compute the speed in meters per second as that tiny distance traversed
in meters divided by that tiny time, 0.01 seconds. That is, a physical car just sidesteps the paradox and doesn’t actually compute speed
at a single point in time. It computes speed during a very small amount of time. So let’s call that difference in time d𝑡, which you might think of in this case as
0.01 seconds. And let’s call that resulting difference in distance d𝑠. So the velocity at some point in time is d𝑠 divided by d𝑡, the tiny change in
distance over the tiny change in time.
Graphically, you can imagine zooming in on some point of this distance-versus-time
graph above 𝑡 equals three. That d𝑡 is a small step to the right since time is on the horizontal axis. And that d𝑠 is the resulting change in the height of the graph since the vertical
axis represents distance traveled. So d𝑠 divided by d𝑡 is something you can think of as the rise-over-run slope
between two very close points on this graph. Of course, there’s nothing special about the value 𝑡 equals three. We could apply this to any other point in time. So, we consider this expression d𝑠 over d𝑡 to be a function of 𝑡, something where
I can give you a time 𝑡. And you can give me back the value of this ratio at that time, the velocity as a
function of time.
So, for example, when I had the computer draw this bump curve here, the one
representing the velocity function, here’s what I had the computer actually do. First, I chose a small value for d𝑡. I think in this case it was 0.01. Then, I had the computer look at a whole bunch of times 𝑡 between zero and 10 and
compute the distance function 𝑠 at 𝑡 plus d𝑡 and then subtract off the value of
that function at 𝑡. In other words, that’s the difference in the distance traveled between the given
time, 𝑡, and the time 0.01 seconds after that. Then, you can just divide that difference by the change in time, d𝑡. And that gives you the velocity in meters per second around each point in time.
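Here’s a rough sketch of that procedure in Python. The transcript never gives a formula for this car’s distance function, so the `s` below is purely an assumption: a smooth curve that starts at zero, ends at 100 meters after 10 seconds, and is flat at both ends. The name `velocity_curve` and its parameters are made up for illustration too.

```python
def s(t):
    """Hypothetical distance function (an assumption, not from the video):
    0 m at t = 0, 100 m at t = 10, starting and ending at rest."""
    u = t / 10.0
    return 100.0 * (3 * u**2 - 2 * u**3)


def velocity_curve(distance, dt=0.01, t_start=0.0, t_end=10.0, samples=11):
    """Approximate velocity at evenly spaced times as (s(t + dt) - s(t)) / dt."""
    step = (t_end - t_start) / (samples - 1)
    times = [t_start + i * step for i in range(samples)]
    # Note: the last sample evaluates the assumed formula slightly past t = 10.
    return [(t, (distance(t + dt) - distance(t)) / dt) for t in times]


curve = velocity_curve(s)
for t, v in curve:
    print(f"t = {t:4.1f} s   velocity ~ {v:6.2f} m/s")
```

With this assumed curve, the printed values rise to a peak near the middle of the trip and fall back toward zero, which is the bump shape described above. Any other distance function could be passed in the same way.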
So with a formula like this, you could give the computer any curve representing any
distance function, 𝑠 of 𝑡. And it could figure out the curve representing velocity. So now would be a good time to pause, reflect, make sure that this idea of relating
distance to velocity by looking at tiny changes makes sense. Because what we’re gonna do is tackle the paradox of the derivative head-on.
This idea of d𝑠 over d𝑡, a tiny change in the value of the function 𝑠 divided by
the tiny change in the input that caused it, that’s almost what a derivative is. And even though a car’s speedometer will actually look at a concrete change in time,
like 0.01 seconds, and even though the drawing program here is looking at an actual
concrete change in time, in pure math the derivative is not this ratio d𝑠 d𝑡 for a
specific choice of d𝑡. Instead, it’s whatever that ratio approaches as your choice for d𝑡 approaches
zero. Luckily, there is a really nice visual understanding for what it means to ask what
this ratio approaches.
Remember, for any specific choice of d𝑡, this ratio d𝑠 d𝑡 is the slope of a line
passing through two separate points on the graph, right? Well, as d𝑡 approaches zero and as those two points approach each other, the slope
of the line approaches the slope of a line that’s tangent to the graph at whatever
point 𝑡 we’re looking at. So, the true, honest-to-goodness, pure math derivative is not the rise-over-run slope
between two nearby points on the graph. It’s equal to the slope of a line tangent to the graph at a single point. Now notice what I’m not saying. I’m not saying that the derivative is whatever happens when d𝑡 is infinitely small,
whatever that would mean, nor am I saying that you plug in zero for d𝑡. This d𝑡 is always a finitely small, nonzero value. It’s just that it approaches zero is all.
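Written in symbols, that idea is usually stated as a limit. This is the standard textbook form rather than something shown in the transcript, using the same 𝑠, 𝑡, and d𝑡 as above:

```latex
\frac{ds}{dt}(t) \;=\; \lim_{dt \to 0} \frac{s(t + dt) - s(t)}{dt}
```

For every d𝑡 inside the limit, the ratio is an ordinary slope between two distinct points. The derivative is the one number those slopes close in on.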
I think that’s really clever. Even though change in an instant makes no sense, this idea of letting d𝑡 approach
zero is a really sneaky backdoor way to talk reasonably about the rate of change at
a single point in time. Isn’t that neat?! It’s kind of flirting with the paradox of change in an instant without ever needing
to actually touch it. And it comes with such a nice visual intuition too, as the slope of a tangent line to
a single point on the graph. And because change in an instant still makes no sense, I think it’s healthiest for
you to think of this slope not as some instantaneous rate of change, but instead as
the best constant approximation for a rate of change around a point.
By the way, it’s worth saying a couple of words on notation here. Throughout this video, I’ve been using d𝑡 to refer to a tiny change in 𝑡 with some
actual size and d𝑠 to refer to the resulting tiny change in 𝑠, which again has an
actual size. And this is because that’s how I want you to think about them. But the convention in calculus is that whenever you’re using the letter 𝑑 like this,
you’re kind of announcing your intention that eventually you’re gonna see what
happens as d𝑡 approaches zero. For example, the honest-to-goodness pure math derivative is written as d𝑠 divided by
d𝑡, even though it’s technically not a fraction, per se, but whatever that
fraction approaches for smaller and smaller nudges in 𝑡. I think a specific example should help here.
You might think that asking about what this ratio approaches for smaller and smaller
values would make it much more difficult to compute. But, weirdly, it kind of makes things easier. Let’s say that you have a given distance-versus-time function that happens to be
exactly 𝑡 cubed. So after one second, the car has traveled one cubed equals one meter. After two seconds, it’s traveled two cubed, or eight, meters, and so on. Now what I’m about to do might seem somewhat complicated. But once the dust settles, it really is simpler. And more importantly, it’s the kinda thing that you only ever have to do once in
calculus.
Let’s say you wanted to compute the velocity, d𝑠 divided by d𝑡, at some specific
time, like 𝑡 equals two. And for right now, let’s think of d𝑡 as having an actual size, some concrete
nudge. We’ll let it go to zero in just a bit. The tiny change in distance between two seconds and two plus d𝑡 seconds, well that’s
𝑠 of two plus d𝑡 minus 𝑠 of two, and we divide that by d𝑡. Since our function is 𝑡 cubed, that numerator looks like two plus d𝑡 cubed minus
two cubed. And this, this is something we can work out algebraically. Again, bear with me. There’s a reason that I’m showing you the details here. When you expand that top, what you get is two cubed plus three times two squared d𝑡
plus three times two times d𝑡 squared plus d𝑡 cubed. And all of that is minus two cubed.
Now there’s a lot of terms. And I want you to remember that it looks like a mess, but it does simplify. Those two cubed terms, they cancel out. And then everything remaining here has a d𝑡 in it. And since there’s a d𝑡 on the bottom there, many of those cancel out as well. What this means is that the ratio, d𝑠 divided by d𝑡, has boiled down into three
times two squared plus, well, two different terms that each have a d𝑡 in them. So if we ask what happens as d𝑡 approaches zero, representing the idea of looking at a
smaller and smaller change in time, we can just completely ignore those other
terms. By eliminating the need to think about a specific d𝑡, we’ve actually eliminated a
lot of the complication in the full expression! So what we’re left with is this nice clean three times two squared.
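For reference, here is that same algebra written out in symbols. It only restates the steps just described, for 𝑠 of 𝑡 equals 𝑡 cubed at 𝑡 equals two:

```latex
\begin{aligned}
\frac{ds}{dt}
  &= \frac{(2 + dt)^3 - 2^3}{dt} \\
  &= \frac{2^3 + 3 \cdot 2^2 \, dt + 3 \cdot 2 \, dt^2 + dt^3 - 2^3}{dt} \\
  &= 3 \cdot 2^2 + 3 \cdot 2 \, dt + dt^2
  \;\longrightarrow\; 3 \cdot 2^2 = 12 \quad \text{as } dt \to 0
\end{aligned}
```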
You can think of that as meaning that the slope of a line tangent to the point at 𝑡
equals two of this graph is exactly three times two squared or 12. And of course, there’s nothing special about the time 𝑡 equals two. We could more generally say that the derivative of 𝑡 cubed as a function of 𝑡, is
three times 𝑡 squared.
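If you’d like to see the numbers behave this way, here’s a small Python sketch. The helper name `ratio` is just an illustrative choice. It computes d𝑠 over d𝑡 for 𝑠 of 𝑡 equals 𝑡 cubed with a few shrinking values of d𝑡:

```python
def s(t):
    """Distance function from the example: s(t) = t cubed."""
    return t**3


def ratio(t, dt):
    """Change in distance over change in time, for a concrete nonzero dt."""
    return (s(t + dt) - s(t)) / dt


for dt in [0.1, 0.01, 0.001, 0.0001]:
    print(f"dt = {dt:<7} ds/dt at t = 2: {ratio(2.0, dt):.6f}")
```

Each printed ratio sits a little above 12, and the leftover amount, three times two times d𝑡 plus d𝑡 squared, shrinks along with d𝑡. Trying other times, such as `ratio(3.0, 1e-6)`, should land close to three times 𝑡 squared.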
Now take a step back because that’s beautiful. This derivative is this crazy complicated idea. We’ve got tiny changes in distance over tiny changes in time. But instead of looking at any specific one of those, we’re talking about what that
thing approaches. I mean, that’s a lot to think about! And yet, what we’ve come out with is such a simple expression, three times 𝑡
squared. And in practice, you wouldn’t go through all this algebra each time. Knowing that the derivative of 𝑡 cubed is three 𝑡 squared is one of those things
that all calculus students learn how to do immediately without having to rederive it
each time.
And in the next video, I’m gonna show you a nice way to think about this and a couple
of other derivative formulas in really nice geometric ways. But the point I wanna make by showing you all of the algebraic guts here is that when
you consider the tiny change in distance caused by a tiny change in time for some
specific value of d𝑡, you’d have kind of a mess. But when you consider what that ratio approaches as d𝑡 approaches zero, it lets you
ignore much of that mess. And it really does simplify the problem. That right there is kind of the heart of why calculus becomes useful.
Another reason to show you a concrete derivative like this is that it sets the stage
for an example of the kind of paradoxes that come about if you believe too much in
the illusion of instantaneous rate of change. So think about the actual car traveling according to this 𝑡 cubed distance
function. And consider its motion at the moment 𝑡 equals zero, right at the start. Now ask yourself whether or not the car is moving at that time. On the one hand, we can compute its speed at that point using the derivative, three
𝑡 squared, which for time 𝑡 equals zero works out to be zero.
Visually, this means that the tangent line to the graph at that point is perfectly
flat. So the car’s, quote, unquote, “instantaneous velocity” is zero. And that suggests that obviously it’s not moving. But on the other hand, if it doesn’t start moving at time zero, when does it start
moving? Really, pause and ponder that for a moment. Is the car moving at time 𝑡 equals zero?
Do you see the paradox? The issue is that the question makes no sense. It references the idea of change in a moment. But that doesn’t actually exist. That’s just not what the derivative measures. What it means for the derivative of a distance function to be zero is that the best
constant approximation for the car’s velocity around that point is zero meters per
second. For example, if you look at an actual change in time, say between time zero and 0.1
seconds, the car does move. It moves 0.001 meters. That’s very small. And importantly, it’s very small compared to the change in time, giving an average
speed of only 0.01 meters per second.
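A quick way to check those numbers is the short Python sketch below, with `average_speed` as an illustrative helper name. It measures the car over a few small windows starting at time zero:

```python
def s(t):
    """Distance function from the example: s(t) = t cubed."""
    return t**3


def average_speed(t_start, dt):
    """Average speed over the window from t_start to t_start + dt."""
    return (s(t_start + dt) - s(t_start)) / dt


moved = s(0.1) - s(0.0)
speed = average_speed(0.0, 0.1)
print(f"distance moved in the first 0.1 s: {moved:.6f} m")
print(f"average speed over that window:    {speed:.6f} m/s")

# Smaller windows: the car always moves, but the ratio keeps shrinking.
for dt in [0.1, 0.01, 0.001]:
    print(f"dt = {dt:<6} average speed = {average_speed(0.0, dt):.8f} m/s")
```

Over every window the car covers some nonzero distance, yet the average speed works out to d𝑡 squared, so it gets as close to zero as you like.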
And remember, what it means for the derivative of this motion to be zero is that for
smaller and smaller nudges in time, this ratio of meters per second approaches
zero. But that’s not to say that the car is static. Approximating its movement with a constant velocity of zero is, after all, just an
approximation. So whenever you hear people refer to the derivative as an instantaneous rate of
change, a phrase which is intrinsically oxymoronic, I want you to think of that as a
conceptual shorthand for the best constant approximation for rate of change.
In the next couple of videos, I’ll be talking more about the derivative, what it
looks like in different contexts. How do you actually compute it? Why is it useful? Things like that, focusing on visual intuition as always.