Now that we’ve seen what a derivative means and what it has to do with rates of
change, our next step is to learn how to actually compute these guys. As in if I give you some kind of function with an explicit formula, you’d wanna be
able to find what the formula for its derivative is. Maybe it’s obvious, but I think it’s worth stating explicitly why this is an
important thing to be able to do, and why so much of a calculus student’s time ends up going towards grappling with
derivatives of abstract functions rather than thinking about concrete rate-of-change problems.
It’s because a lot of real-world phenomena, the sort of things that we wanna use
calculus to analyze, are modeled using polynomials, trigonometric functions,
exponentials and other pure functions like that. So if you build up some fluency with the ideas of rates of change for those kinds of
pure abstract functions, it gives you a language to more readily talk about the rates at which things change
in concrete situations that you might be using calculus to model.
But it is way too easy for this process to feel like just memorizing a list of
rules. And if that happens, if you get that feeling, it’s also easy to lose sight of the
fact that derivatives are fundamentally about just looking at tiny changes to some
quantity and how that relates to a resulting tiny change in another quantity. So in this video and in the next one, my aim is to show you how you can think about a
few of these rules intuitively and geometrically. And I really wanna encourage you to never forget that tiny nudges are at the heart of derivatives.
Let’s start with a simple function like 𝑓 of 𝑥 equals 𝑥 squared. What if I asked you for its derivative? That is, if you were to look at some value 𝑥, like 𝑥 equals two, and compare it to
a value slightly bigger, just d𝑥 bigger, what’s the corresponding change in the value of the function, d𝑓? And, in particular, what’s d𝑓 divided by d𝑥, the rate at which this function is
changing per unit change in 𝑥?
As a first step for intuition, we know that you can think of this ratio d𝑓 d𝑥 as
the slope of a tangent line to the graph of 𝑥 squared. And from that, you can see that the slope generally increases as 𝑥 increases. At zero, the tangent line is flat and the slope is zero. At 𝑥 equals one, it’s something a bit steeper. At 𝑥 equals two, it’s steeper still. But looking at graphs isn’t generally the best way to understand the precise formula
for a derivative. For that, it’s best to take a more literal look at what 𝑥 squared actually
means. And in this case, let’s go ahead and picture a square whose side length is 𝑥. If you increase 𝑥 by some tiny nudge, some little d𝑥, what’s the resulting change
in the area of that square? That slight change in area is what d𝑓 means in this context. It’s the tiny increase to the value of 𝑓 of 𝑥 equals 𝑥 squared caused by
increasing 𝑥 by that tiny nudge d𝑥. Now you can see that there are three new bits of area in this diagram, two thin
rectangles and a minuscule square. The two thin rectangles each have side lengths of 𝑥 and d𝑥. So they account for two times 𝑥 times d𝑥 units of new area.
For example, let’s say 𝑥 was three and d𝑥 was 0.01. Then that new area from these two thin rectangles would be two times three times
0.01, which is 0.06, six times the size of d𝑥. That little square there has an area of d𝑥 squared, but you should think of that as
being really tiny, negligibly tiny. For example, if d𝑥 was 0.01, that would be only 0.0001. And keep in mind, I’m drawing d𝑥 with a fair bit of width here just so we can
actually see it. But always remember, in principle, d𝑥 should be thought of as a truly tiny
amount. And for those truly tiny amounts, a good rule of thumb is that you can ignore
anything that includes a d𝑥 raised to a power greater than one. That is, a tiny change squared is a negligible change.
What this leaves us with is that d𝑓 is just some multiple of d𝑥. And that multiple — two 𝑥, which you could also write as d𝑓 divided by d𝑥 — is the
derivative of 𝑥 squared. For example, if you were starting at 𝑥 equals three, then as you slightly increase
𝑥, the rate of change in the area per unit change in length added, d of 𝑥 squared over
d𝑥, would be two times three, or six. And if instead you were starting at 𝑥 equals five, then the rate of change would be
10 units of area per unit change in 𝑥.
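If you'd like to see those numbers fall out for yourself, here is a minimal Python sketch of the square picture. The function name `square_area_change` is just something made up for this illustration; it splits the new area into the two thin rectangles and the little corner square, so you can watch the corner's share shrink as d𝑥 shrinks.

```python
def square_area_change(x, dx):
    """Split the growth of an x-by-x square into the pieces from the diagram."""
    rectangles = 2 * x * dx  # two thin rectangles, each x by dx
    corner = dx ** 2         # the minuscule dx-by-dx square
    return rectangles, corner


x = 3.0
for dx in (0.1, 0.01, 0.001):
    rectangles, corner = square_area_change(x, dx)
    df = rectangles + corner  # total new area, equal to (x + dx)^2 - x^2
    print(f"dx={dx}: df/dx = {df / dx:.4f}, corner share of df = {corner / df:.6f}")
```

As d𝑥 gets smaller, the printed ratio should settle toward two times three, and the corner's share of the new area should head toward zero.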
Let’s go ahead and try a different simple function, 𝑓 of 𝑥 equals 𝑥 cubed. This is gonna be the geometric view of the stuff that I went through algebraically in
the last video. What’s nice here is that we can think of 𝑥 cubed as the volume of an actual cube
whose side lengths are 𝑥. And when you increase 𝑥 by a tiny nudge, a tiny d𝑥, the resulting increase in
volume is what I have here in yellow. That represents all the volume in a cube with side lengths 𝑥 plus d𝑥 that’s not
already in the original cube, the one with side length 𝑥. It’s nice to think of this new volume as broken up into multiple components. But almost all of it comes from these three square faces. Or, said a little more precisely, as d𝑥 approaches zero, those three squares
comprise a portion closer and closer to 100 percent of that new yellow volume. Each of those thin squares has a volume of 𝑥 squared times d𝑥, the area of the face
times that little thickness d𝑥. So, in total, this gives us three 𝑥 squared d𝑥 of volume change.
And to be sure, there are other slivers of volume here, along the edges, and that
tiny one in the corner. But all of that volume is gonna be proportional to d𝑥 squared or d𝑥 cubed, so we
can safely ignore them. Again, this is ultimately because they’re gonna be divided by d𝑥. And if there’s still any d𝑥 remaining, then those terms aren’t gonna survive the
process of letting d𝑥 approach zero. What this means is that the derivative of 𝑥 cubed, the rate at which 𝑥 cubed
changes per unit change of 𝑥, is three times 𝑥 squared. What that means in terms of graphical intuition is that the slope of the graph of 𝑥
cubed at every single point 𝑥 is exactly three 𝑥 squared.
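Here is a rough Python sketch of that cube bookkeeping, under the assumption that the new volume splits into three face slabs, three edge slivers and one corner cube. The name `cube_volume_change` is invented for this example. It reports what fraction of the new volume comes from the three faces.

```python
def cube_volume_change(x, dx):
    """Split the growth of a cube with side x into faces, edges and corner."""
    faces = 3 * x ** 2 * dx   # three thin slabs, each x by x by dx
    edges = 3 * x * dx ** 2   # three slivers along the edges, each x by dx by dx
    corner = dx ** 3          # the tiny dx by dx by dx cube in the corner
    return faces, edges, corner


x = 2.0
for dx in (0.1, 0.01, 0.001):
    faces, edges, corner = cube_volume_change(x, dx)
    total = faces + edges + corner  # equal to (x + dx)^3 - x^3
    print(f"dx={dx}: faces share = {faces / total:.6f}, total/dx = {total / dx:.4f}")
```

The face share should creep toward 100 percent, and the ratio of new volume to d𝑥 should approach three times 𝑥 squared.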
And reasoning about that slope, it should make sense that this derivative is high on
the left, and then zero at the origin, and then high again as you move to the
right. But just thinking in terms of the graph would never have landed us on the precise
quantity three 𝑥 squared. For that, we had to take a much more direct look at what 𝑥 cubed actually means.
Now in practice, you wouldn’t necessarily think of the square every time you’re
taking the derivative of 𝑥 squared nor would you necessarily think of this cube
whenever you’re taking the derivative of 𝑥 cubed. Both of them fall under a pretty recognizable pattern for polynomial terms. The derivative of 𝑥 to the fourth turns out to be four 𝑥 cubed. The derivative of 𝑥 to the fifth is five 𝑥 to the fourth, and so on. Abstractly, you’d write this as the derivative of 𝑥 to the 𝑛, for any power 𝑛, is
𝑛 times 𝑥 to the 𝑛 minus one. This right here is what’s known in the business as the power rule.
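As a quick sanity check on that pattern, here is a small Python sketch that compares the ratio from an actual tiny nudge against what the power rule predicts. The helper names `nudge_ratio` and `power_rule`, and the step size of one millionth, are arbitrary choices for illustration.

```python
def nudge_ratio(n, x, dx=1e-6):
    """Change in x^n caused by a small nudge dx, divided by that nudge."""
    return ((x + dx) ** n - x ** n) / dx


def power_rule(n, x):
    """What the power rule predicts for the derivative of x^n."""
    return n * x ** (n - 1)


x = 1.5
for n in range(2, 6):
    print(f"n={n}: nudge ratio = {nudge_ratio(n, x):.5f}, power rule = {power_rule(n, x):.5f}")
```

The two columns should agree to several decimal places, with the small leftover gap coming from the terms that still carry a d𝑥.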
In practice, we all quickly just get jaded and think about this symbolically as the
exponent hopping down in front, leaving behind one less than itself, rarely pausing to think about the geometric delights that underlie these
derivatives. That’s the kinda thing that happens when these tend to fall in the middle of much
longer computations. But rather than chalking it all up to symbolic patterns, let’s just take a moment
and think about why this works for powers beyond just two and three. When you nudge that input 𝑥, increasing it slightly to 𝑥 plus d𝑥, working out the
exact value of that nudged output would involve multiplying together these 𝑛
separate 𝑥 plus d𝑥 terms. The full expansion would be really complicated, but part of the point of derivatives
is that most of that complication can be ignored. The first term in your expansion is 𝑥 to the 𝑛. This is analogous to the area of the original square or the volume of the original
cube from our previous examples.
For the next terms in the expansion, you can choose mostly 𝑥s with a single d𝑥. Since there are 𝑛 different parentheticals from which you could have chosen that
single d𝑥, this gives us 𝑛 separate terms, all of which include 𝑛 minus one 𝑥s times a d𝑥,
giving a value of 𝑥 to the power 𝑛 minus one times d𝑥. This is analogous to how the majority of the new area in the square came from those
two bars, each with area 𝑥 times d𝑥. Or how the bulk of the new volume in the cube came from those three thin squares,
each of which had a volume of 𝑥 squared times d𝑥. There will be many other terms of this expansion. But all of them are just gonna be some multiple of d𝑥 squared, so we can safely ignore them.
And what that means is that all but a negligible portion of the increase in the
output comes from 𝑛 copies of this 𝑥 to the 𝑛 minus one times d𝑥. That’s what it means for the derivative of 𝑥 to the 𝑛 to be 𝑛 times 𝑥 to the 𝑛
minus one. And even though, like I said, in practice you’ll find yourself performing this
derivative quickly and symbolically, imagining the exponent hopping down to the
front, every now and then it’s nice to just step back and remember why these rules
work. Not just because it’s pretty, and not just because it helps remind us that math
actually makes sense and isn’t just a pile of formulas to memorize, but because it flexes that very important muscle of thinking about derivatives in
terms of tiny nudges.
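To make that counting argument concrete, here is a small Python sketch that expands 𝑥 plus d𝑥, multiplied by itself 𝑛 times, by brute force. It picks either an 𝑥 or a d𝑥 from each of the 𝑛 parentheticals and tallies how many d𝑥s each resulting term contains. The function name `count_terms_by_dx_power` is made up for this example.

```python
from collections import Counter
from itertools import product


def count_terms_by_dx_power(n):
    """Expand (x + dx)^n by choosing 'x' or 'dx' from each of the n factors.

    Returns a Counter mapping (number of dx's in a term) -> (how many terms).
    """
    counts = Counter()
    for choice in product(("x", "dx"), repeat=n):
        counts[choice.count("dx")] += 1
    return counts


n = 5
counts = count_terms_by_dx_power(n)
print("terms with no dx:      ", counts[0])  # the original x^n
print("terms with a single dx:", counts[1])  # the n copies of x^(n-1) dx
print("terms with dx^2 or higher:", sum(v for k, v in counts.items() if k >= 2))
```

For any 𝑛 you try, there should be exactly one term with no d𝑥 and exactly 𝑛 terms with a single d𝑥. Everything else carries at least a d𝑥 squared.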
As another example, think of the function 𝑓 of 𝑥 equals one divided by 𝑥. Now, on the one hand, you could just blindly try applying the power rule since one
divided by 𝑥 is the same as writing 𝑥 to the negative one. That would involve letting the negative one hop down in front leaving behind one less
than itself, which is negative two. But let’s have some fun and see if we can reason about this geometrically rather than
just plugging it through some formula. The value one over 𝑥 is asking what number multiplied by 𝑥 equals one. So here’s how I’d like to visualize it.
Imagine a little rectangular puddle of water sitting in two dimensions whose area is
one. And let’s say that its width is 𝑥, which means that the height has to be one over
𝑥, since the total area of it is one. So if 𝑥 was stretched out to two, then that height is forced down to one-half. And if you increased 𝑥 up to three, then the other side has to be squished down to
one-third. This is a nice way to think about the graph of one over 𝑥, by the way. If you think of this width, 𝑥, of the puddle as being in the 𝑥𝑦-plane, then that corresponding output, one divided by 𝑥 (the height of the graph above
that point), is whatever the height of your puddle has to be to maintain an area of one.
So with this visual in mind, for the derivative, imagine nudging up that value of 𝑥
by some tiny amount, some tiny d𝑥. How must the height of this rectangle change so that the area of the puddle remains
constant at one? That is, increasing the width by d𝑥 adds some new area to the right here. So the puddle has to decrease in height by some d one over 𝑥 so that the area lost
off of that top cancels out the area gained. You should think of that d one over 𝑥 as being a negative amount, by the way, since
it’s decreasing the height of the rectangle.
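If it helps to see the puddle's bookkeeping in numbers, here is a minimal Python sketch. The function name `puddle_nudge` is hypothetical, and the sketch assumes the area gained is measured as a strip of width d𝑥 at the new height, while the area lost is a strip across the original width 𝑥. It only reports numbers; working out the general expression is still up to you.

```python
def puddle_nudge(x, dx):
    """Nudge the width of an area-one puddle from x to x + dx."""
    height = 1 / x
    new_height = 1 / (x + dx)       # forced, since width * height must stay 1
    d_height = new_height - height  # a negative amount: the puddle gets shorter
    area_gained = new_height * dx   # new strip on the right side
    area_lost = -d_height * x       # strip shaved off the top of the old width
    return d_height, area_gained, area_lost


d_height, gained, lost = puddle_nudge(2.0, 0.001)
print("change in height:", d_height)
print("area gained on the right:", gained)
print("area lost off the top:   ", lost)
print("change in height per unit change in width:", d_height / 0.001)
```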
And you know what? I’m gonna leave the last few steps here for you, for you to pause and ponder and work
out an ultimate expression. And once you reason out what d of one over 𝑥 divided by d𝑥 should be, I want you to compare it to what you would’ve gotten if you had just blindly applied
the power rule, purely symbolically, to 𝑥 to the negative one. And while I’m encouraging you to pause and ponder, here’s another fun challenge, if
you’re feeling up to it. See if you can reason through what the derivative of the square root of 𝑥 should be.
To finish things off, I wanna tackle one more type of function, trigonometric
functions. And in particular, let’s focus on the sine function. So for this section, I’m gonna assume that you’re already familiar with how to think
about trig functions using the unit circle, the circle with radius one centered at the origin. For a given value of 𝜃, like say 0.8, you imagine yourself walking around the circle
starting from the rightmost point until you’ve traversed that distance of 0.8 in arc
length. This is the same thing as saying that the angle right here is exactly 𝜃 radians,
since the circle has a radius of one.
Then what sin of 𝜃 means is the height of that point above the 𝑥-axis. And as your 𝜃-value increases and you walk around the circle, your height bobs up
and down between negative one and one. So when you graph sin of 𝜃 versus 𝜃, you get this wave pattern, the quintessential
wave pattern. And just from looking at this graph, we can start to get a feel for the shape of the
derivative of the sine. The slope at zero is something positive since sin of 𝜃 is increasing there. And as we move to the right and sin of 𝜃 approaches its peak, that slope goes down
to zero. Then the slope is negative for a little while, while the sine is decreasing before
coming back up to zero as the sine graph levels out.
And as you continue thinking this through and drawing it out, if you’re familiar with
the graph of trig functions, you might guess that this derivative graph should be
exactly cos of 𝜃 since all the peaks and valleys line up perfectly with where the
peaks and valleys for the cosine function should be. And, spoiler alert, the derivative is in fact the cos of 𝜃. But aren’t you a little curious about why it’s precisely cos of 𝜃?
I mean you could have all sorts of functions with peaks and valleys at the same
points that have roughly the same shape, but who knows? Maybe the derivative of sine could’ve turned out to be some entirely new type of
function that just happens to have a similar shape. Well, just like the previous examples, a more exact understanding of the derivative
requires looking at what the function actually represents rather than looking at the
graph of the function.
So think back to that walk around the unit circle, having traversed an arc with length 𝜃 and thinking about sin of 𝜃 as the height of
that point. Now, zoom in to that point on the circle and consider a slight nudge of d𝜃 along
the circumference, a tiny step in your walk around the unit circle. How much does that tiny step change the sin of 𝜃? How much does this increase, d𝜃, of arc length increase the height above the
𝑥-axis? Well, zoomed in close enough, the circle basically looks like a straight line in this
neighborhood. So let’s go ahead and think of this right triangle where the hypotenuse of that right
triangle represents the nudge, d𝜃, along the circumference. And that left side here represents the change in height, the resulting d sin of 𝜃.
Now this tiny triangle is actually similar to this larger triangle here, with the
defining angle 𝜃 and whose hypotenuse is the radius of the circle with length
one. Specifically, this little angle right here is precisely equal to 𝜃 radians. Now, think about what the derivative of sine is supposed to mean. It’s the ratio between that d sin of 𝜃, the tiny change to the height, divided by
d𝜃, the tiny change to the input of the function. And from the picture, we can see that that’s the ratio between the length of the side
adjacent to the angle 𝜃 divided by the hypotenuse. Well let’s see. Adjacent divided by hypotenuse, that’s exactly what the cos of 𝜃 means. That’s the definition of the cosine.
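You can also check this numerically. Here is a short Python sketch that takes a tiny step d𝜃 around the circle, measures how much the height changes, and compares that ratio with the cosine. The name `sine_nudge_ratio` and the step size are just choices made for this illustration.

```python
import math


def sine_nudge_ratio(theta, dtheta=1e-6):
    """Change in height on the unit circle per unit of arc length walked."""
    return (math.sin(theta + dtheta) - math.sin(theta)) / dtheta


for theta in (0.0, 0.8, 2.0, 4.0):
    ratio = sine_nudge_ratio(theta)
    print(f"theta={theta}: d(sin)/d(theta) = {ratio:.6f}, cos(theta) = {math.cos(theta):.6f}")
```

The two columns should match to about six decimal places, including at angles where the height is decreasing and both values are negative.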
So this gives us two different really nice ways of thinking about how the derivative
of sine is cosine. One of them is looking at the graph and getting a loose feel for the shape of things
based on thinking about the slope of the sine graph at every single point. And the other is a more precise line of reasoning looking at the unit circle
itself. For those of you that like to pause and ponder, see if you can try a similar line of
reasoning to find what the derivative of the cos of 𝜃 should be.