# Video: Derivative Formulas through Geometry

Grant Sanderson • 3Blue1Brown • Boclips

16:56

### Video Transcript

Now that we’ve seen what a derivative means and what it has to do with rates of change, our next step is to learn how to actually compute these guys. As in, if I give you some kind of function with an explicit formula, you’d wanna be able to find what the formula for its derivative is. Maybe it’s obvious, but I think it’s worth stating explicitly why this is an important thing to be able to do, why much of a calculus student’s time ends up going towards grappling with derivatives of abstract functions rather than thinking about concrete rate-of-change problems.

It’s because a lot of real-world phenomena, the sort of things that we wanna use calculus to analyze, are modeled using polynomials, trigonometric functions, exponentials, and other pure functions like that. So if you build up some fluency with the ideas of rates of change for those kinds of pure abstract functions, it gives you a language to more readily talk about the rates at which things change in concrete situations that you might be using calculus to model.

But it is way too easy for this process to feel like just memorizing a list of rules. And if that happens, if you get that feeling, it’s also easy to lose sight of the fact that derivatives are fundamentally about just looking at tiny changes to some quantity and how that relates to a resulting tiny change in another quantity. So in this video and in the next one, my aim is to show you how you can think about a few of these rules intuitively and geometrically. And I really wanna encourage you to never forget that tiny nudges are at the heart of derivatives.

Let’s start with a simple function like 𝑓 of 𝑥 equals 𝑥 squared. What if I asked you its derivative? That is, if you were to look at some value 𝑥, like 𝑥 equals two, and compare it to a value slightly bigger, just d𝑥 bigger, what’s the corresponding change in the value of the function, d𝑓? And, in particular, what’s d𝑓 divided by d𝑥, the rate at which this function is changing per unit change in 𝑥?

As a first step for intuition, we know that you can think of this ratio d𝑓 d𝑥 as the slope of a tangent line to the graph of 𝑥 squared. And from that, you can see that the slope generally increases as 𝑥 increases. At zero, the tangent line is flat and the slope is zero. At 𝑥 equals one, it’s something a bit steeper. At 𝑥 equals two, it’s steeper still. But looking at graphs isn’t generally the best way to understand the precise formula for a derivative. For that, it’s best to take a more literal look at what 𝑥 squared actually means.

And in this case, let’s go ahead and picture a square whose side length is 𝑥. If you increase 𝑥 by some tiny nudge, some little d𝑥, what’s the resulting change in the area of that square? That slight change in area is what d𝑓 means in this context. It’s the tiny increase to the value of 𝑓 of 𝑥 equals 𝑥 squared caused by increasing 𝑥 by that tiny nudge d𝑥. Now you can see that there’s three new bits of area in this diagram, two thin rectangles and a minuscule square. The two thin rectangles each have side lengths of 𝑥 and d𝑥. So they account for two times 𝑥 times d𝑥 units of new area.

For example, let’s say 𝑥 was three and d𝑥 was 0.01. Then that new area from these two thin rectangles would be two times three times 0.01, which is 0.06, about six times the size of d𝑥. That little square there has an area of d𝑥 squared, but you should think of that as being really tiny, negligibly tiny. For example, if d𝑥 was 0.01, that would be only 0.0001. And keep in mind, I’m drawing d𝑥 with a fair bit of width here just so we can actually see it. But always remember, in principle, d𝑥 should be thought of as a truly tiny amount. And for those truly tiny amounts, a good rule of thumb is that you can ignore anything that includes a d𝑥 raised to a power greater than one. That is, a tiny change squared is a negligible change.

What this leaves us with is that d𝑓 is just some multiple of d𝑥. And that multiple, two 𝑥, which you could also write as d𝑓 divided by d𝑥, is the derivative of 𝑥 squared. For example, if you were starting at 𝑥 equals three, then as you slightly increase 𝑥, the rate of change in the area per unit change in length added, d of 𝑥 squared over d𝑥, would be two times three, or six. And if instead you were starting at 𝑥 equals five, then the rate of change would be 10 units of area per unit change in 𝑥.
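As a rough numerical companion to the square picture, the sketch below splits the new area into the two thin rectangles and the corner square and watches the corner's share shrink as d𝑥 shrinks. The function name `square_area_change` and the sample values are illustrative choices, not something from the video.

```python
def square_area_change(x, dx):
    """Split the area gained when a square's side grows from x to x + dx."""
    rectangles = 2 * x * dx  # two thin rectangles, each x by dx
    corner = dx * dx         # the minuscule corner square, dx by dx
    return rectangles, corner

x = 3.0
for dx in (0.1, 0.01, 0.001):
    rectangles, corner = square_area_change(x, dx)
    df = (x + dx) ** 2 - x ** 2  # exact change in area
    print(f"dx={dx}: df/dx={df / dx:.4f}, corner share of df={corner / df:.5f}")
```

With 𝑥 fixed at three, the printed ratio d𝑓 over d𝑥 settles toward two times three as d𝑥 gets smaller, while the corner square accounts for a vanishing fraction of the new area.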

Let’s go ahead and try a different simple function, 𝑓 of 𝑥 equals 𝑥 cubed. This is gonna be the geometric view of the stuff that I went through algebraically in the last video. What’s nice here is that we can think of 𝑥 cubed as the volume of an actual cube whose side lengths are 𝑥. And when you increase 𝑥 by a tiny nudge, a tiny d𝑥, the resulting increase in volume is what I have here in yellow. That represents all the volume in a cube with side lengths 𝑥 plus d𝑥 that’s not already in the original cube, the one with side length 𝑥. It’s nice to think of this new volume as broken up into multiple components. But almost all of it comes from these three square faces. Or, said a little more precisely, as d𝑥 approaches zero, those three squares comprise a portion closer and closer to 100 percent of that new yellow volume. Each of those thin squares has a volume of 𝑥 squared times d𝑥, the area of the face times that little thickness d𝑥. So, in total, this gives us three 𝑥 squared d𝑥 of volume change.

And to be sure, there are other slivers of volume here, along the edges, and that tiny one in the corner. But all of that volume is gonna be proportional to d𝑥 squared or d𝑥 cubed, so we can safely ignore them. Again, this is ultimately because they’re gonna be divided by d𝑥. And if there’s still any d𝑥 remaining, then those terms aren’t gonna survive the process of letting d𝑥 approach zero. What this means is that the derivative of 𝑥 cubed, the rate at which 𝑥 cubed changes per unit change of 𝑥, is three times 𝑥 squared. What that means in terms of graphical intuition is that the slope of the graph of 𝑥 cubed at every single point 𝑥 is exactly three 𝑥 squared.
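The same bookkeeping can be sketched for the cube. Here the new volume is split into the three thin face slabs, the three edge slivers, and the corner cube, and the share carried by the faces is printed for shrinking d𝑥. The name `cube_volume_change` and the choice of 𝑥 equals two are assumptions made for illustration.

```python
def cube_volume_change(x, dx):
    """Split the volume gained when a cube's side grows from x to x + dx."""
    faces = 3 * x**2 * dx  # three thin square slabs, each x by x by dx
    edges = 3 * x * dx**2  # three slivers along the edges, each x by dx by dx
    corner = dx**3         # the tiny cube in the corner
    return faces, edges, corner

x = 2.0
for dx in (0.1, 0.01, 0.001):
    faces, edges, corner = cube_volume_change(x, dx)
    total = faces + edges + corner
    print(f"dx={dx}: face share={faces / total:.5f}, total/dx={total / dx:.5f}")
```

The face share creeps toward 100 percent, and the total change divided by d𝑥 approaches three times 𝑥 squared, which is 12 at 𝑥 equals two.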

And reasoning about that slope, it should make sense that this derivative is high on the left, and then zero at the origin, and then high again as you move to the right. But just thinking in terms of the graph would never have landed us on the precise quantity three 𝑥 squared. For that, we had to take a much more direct look at what 𝑥 cubed actually means.

Now in practice, you wouldn’t necessarily think of the square every time you’re taking the derivative of 𝑥 squared nor would you necessarily think of this cube whenever you’re taking the derivative of 𝑥 cubed. Both of them fall under a pretty recognizable pattern for polynomial terms. The derivative of 𝑥 to the fourth turns out to be four 𝑥 cubed. The derivative of 𝑥 to the fifth is five 𝑥 to the fourth, and so on. Abstractly, you’d write this as the derivative of 𝑥 to the 𝑛, for any power 𝑛, is 𝑛 times 𝑥 to the 𝑛 minus one. This right here is what’s known in the business as the power rule.
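For a quick sanity check of the pattern, the sketch below compares the ratio of a small output change to a small input nudge against 𝑛 times 𝑥 to the 𝑛 minus one for a few powers. The helper `nudge_ratio`, the sample point 1.5, and the nudge size are arbitrary illustrative choices.

```python
def nudge_ratio(n, x, dx):
    """Change in x**n caused by a nudge dx, divided by that nudge."""
    return ((x + dx) ** n - x ** n) / dx

x, dx = 1.5, 1e-6
for n in range(2, 6):
    print(f"n={n}: nudge ratio={nudge_ratio(n, x, dx):.5f}, "
          f"n*x^(n-1)={n * x ** (n - 1):.5f}")
```

The two printed columns agree to several decimal places, which is what the power rule predicts for a small enough nudge.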

In practice, we all quickly just get jaded and think about this symbolically as the exponent hopping down in front, leaving behind one less than itself, rarely pausing to think about the geometric delights that underlie these derivatives. That’s the kinda thing that happens when these tend to fall in the middle of much longer computations. But rather than chalking it all up to symbolic patterns, let’s just take a moment and think about why this works for powers beyond just two and three. When you nudge that input 𝑥, increasing it slightly to 𝑥 plus d𝑥, working out the exact value of that nudged output would involve multiplying together these 𝑛 separate 𝑥 plus d𝑥 terms. The full expansion would be really complicated, but part of the point of derivatives is that most of that complication can be ignored. The first term in your expansion is 𝑥 to the 𝑛. This is analogous to the area of the original square or the volume of the original cube from our previous examples.

For the next terms in the expansion, you can choose mostly 𝑥s with a single d𝑥. Since there are 𝑛 different parentheticals from which you could have chosen that single d𝑥, this gives us 𝑛 separate terms, all of which include 𝑛 minus one 𝑥s times a d𝑥, giving a value of 𝑥 to the power 𝑛 minus one times d𝑥. This is analogous to how the majority of the new area in the square came from those two bars, each with area 𝑥 times d𝑥. Or how the bulk of the new volume in the cube came from those three thin squares, each of which had a volume of 𝑥 squared times d𝑥. There will be many other terms of this expansion. But all of them are just gonna be some multiple of d𝑥 squared, so we can safely ignore them.
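Written out symbolically, the expansion described above looks roughly like the following. This is the standard binomial expansion, shown here only as a sketch of the reasoning (it assumes 𝑛 is a positive whole number and uses the `amsmath` `align*` environment).

```latex
\begin{align*}
(x + dx)^n &= x^n + n\,x^{n-1}\,dx + \binom{n}{2}\,x^{n-2}\,dx^2 + \cdots + dx^n \\
df &= (x + dx)^n - x^n = n\,x^{n-1}\,dx + (\text{terms containing } dx^2 \text{ or higher}) \\
\frac{df}{dx} &= n\,x^{n-1} + (\text{terms containing } dx \text{ or higher})
  \;\longrightarrow\; n\,x^{n-1} \quad \text{as } dx \to 0
\end{align*}
```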

And what that means is that all but a negligible portion of the increase in the output comes from 𝑛 copies of this 𝑥 to the 𝑛 minus one times d𝑥. That’s what it means for the derivative of 𝑥 to the 𝑛 to be 𝑛 times 𝑥 to the 𝑛 minus one. And even though, like I said, in practice you’ll find yourself performing this derivative quickly and symbolically, imagining the exponent hopping down to the front, every now and then it’s nice to just step back and remember why these rules work. Not just because it’s pretty, and not just because it helps remind us that math actually makes sense and isn’t just a pile of formulas to memorize, but because it flexes that very important muscle of thinking about derivatives in terms of tiny nudges.

As another example, think of the function 𝑓 of 𝑥 equals one divided by 𝑥. Now, on the one hand, you could just blindly try applying the power rule since one divided by 𝑥 is the same as writing 𝑥 to the negative one. That would involve letting the negative one hop down in front leaving behind one less than itself, which is negative two. But let’s have some fun and see if we can reason about this geometrically rather than just plugging it through some formula. The value one over 𝑥 is asking what number multiplied by 𝑥 equals one. So here’s how I’d like to visualize it.

Imagine a little rectangular puddle of water sitting in two dimensions whose area is one. And let’s say that its width is 𝑥, which means that the height has to be one over 𝑥, since the total area of it is one. So if 𝑥 was stretched out to two, then that height is forced down to one-half. And if you increased 𝑥 up to three, then the other side has to be squished down to one-third. This is a nice way to think about the graph of one over 𝑥, by the way. If you think of this width, 𝑥, of the puddle as being in the 𝑥𝑦-plane, then that corresponding output (one divided by 𝑥, the height of the graph above that point) is whatever the height of your puddle has to be to maintain an area of one.

So with this visual in mind, for the derivative, imagine nudging up that value of 𝑥 by some tiny amount, some tiny d𝑥. How must the height of this rectangle change so that the area of the puddle remains constant at one? That is, increasing the width by d𝑥 adds some new area to the right here. So the puddle has to decrease in height by some d one over 𝑥 so that the area lost off of that top cancels out the area gained. You should think of that d one over 𝑥 as being a negative amount, by the way, since it’s decreasing the height of the rectangle.
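If you’d like to experiment with the puddle numerically before working through the algebra, here is a minimal sketch. It widens the puddle by d𝑥, measures the forced drop in height, and checks that the area gained on the right matches the area lost off the top. The function `puddle_height` and the sample width of two are illustrative assumptions; the last printed column is simply the power rule value mentioned earlier, included for comparison.

```python
def puddle_height(width):
    """Height of a rectangular puddle of area one with the given width."""
    return 1.0 / width

x = 2.0
for dx in (0.1, 0.01, 0.001):
    dh = puddle_height(x + dx) - puddle_height(x)  # negative: the puddle gets shorter
    gained = dx * puddle_height(x + dx)  # new strip on the right, dx wide at the new height
    lost = -dh * x                       # strip removed from the top, x wide and -dh thick
    print(f"dx={dx}: gained={gained:.6f}, lost={lost:.6f}, "
          f"dh/dx={dh / dx:.5f}, power rule={-x ** -2:.5f}")
```

The gained and lost areas match for every nudge, and the ratio of the height change to d𝑥 drifts toward the value the power rule gives for 𝑥 to the negative one.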

And you know what? I’m gonna leave the last few steps here for you, for you to pause and ponder and work out an ultimate expression. And once you reason out what d of one over 𝑥 divided by d𝑥 should be, I want you to compare it to what you would’ve gotten if you had just blindly applied the power rule, purely symbolically, to 𝑥 to the negative one. And while I’m encouraging you to pause and ponder, here’s another fun challenge, if you’re feeling up to it. See if you can reason through what the derivative of the square root of 𝑥 should be.

To finish things off, I wanna tackle one more type of function, trigonometric functions. And in particular, let’s focus on the sine function. So for this section, I’m gonna assume that you’re already familiar with how to think about trig functions using the unit circle, the circle with a radius of one centered at the origin. For a given value of 𝜃, like say 0.8, you imagine yourself walking around the circle starting from the rightmost point until you’ve traversed that distance of 0.8 in arc length. This is the same thing as saying that the angle right here is exactly 𝜃 radians, since the circle has a radius of one.

Then what sin of 𝜃 means is the height of that point above the 𝑥-axis. And as your 𝜃-value increases and you walk around the circle, your height bobs up and down between negative one and one. So when you graph sin of 𝜃 versus 𝜃, you get this wave pattern, the quintessential wave pattern. And just from looking at this graph, we can start to get a feel for the shape of the derivative of the sine. The slope at zero is something positive since sin of 𝜃 is increasing there. And as we move to the right and sin of 𝜃 approaches its peak, that slope goes down to zero. Then the slope is negative for a little while, while the sine is decreasing before coming back up to zero as the sine graph levels out.

And as you continue thinking this through and drawing it out, if you’re familiar with the graph of trig functions, you might guess that this derivative graph should be exactly cos of 𝜃 since all the peaks and valleys line up perfectly with where the peaks and valleys for the cosine function should be. And, spoiler alert, the derivative is in fact the cos of 𝜃. But aren’t you a little curious about why it’s precisely cos of 𝜃?

I mean you could have all sorts of functions with peaks and valleys at the same points that have roughly the same shape, but who knows? Maybe the derivative of sine could’ve turned out to be some entirely new type of function that just happens to have a similar shape. Well, just like the previous examples, a more exact understanding of the derivative requires looking at what the function actually represents rather than looking at the graph of the function.

So think back to that walk around the unit circle, having traversed an arc with length 𝜃 and thinking about sin of 𝜃 as the height of that point. Now, zoom in to that point on the circle and consider a slight nudge of d𝜃 along the circumference, a tiny step in your walk around the unit circle. How much does that tiny step change the sin of 𝜃? How much does this increase, d𝜃, of arc length increase the height above the 𝑥-axis? Well, zoomed in close enough, the circle basically looks like a straight line in this neighborhood. So let’s go ahead and think of this right triangle where the hypotenuse of that right triangle represents the nudge, d𝜃, along the circumference. And that left side here represents the change in height, the resulting d sin of 𝜃.

Now this tiny triangle is actually similar to this larger triangle here, with the defining angle 𝜃 and whose hypotenuse is the radius of the circle with length one. Specifically, this little angle right here is precisely equal to 𝜃 radians. Now, think about what the derivative of sine is supposed to mean. It’s the ratio between that d sin of 𝜃, the tiny change to the height, divided by d𝜃, the tiny change to the input of the function. And from the picture, we can see that that’s the ratio between the length of the side adjacent to the angle 𝜃 divided by the hypotenuse. Well let’s see. Adjacent divided by hypotenuse, that’s exactly what the cos of 𝜃 means. That’s the definition of the cosine.
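The tiny-triangle argument can be checked numerically with a short sketch. It takes two nearby points on the unit circle, treats the straight line between them as the hypotenuse standing in for d𝜃, and compares the rise over that hypotenuse with cos of 𝜃. The function name `rise_over_step` and the starting value 0.8 are illustrative choices.

```python
import math

def rise_over_step(theta, dtheta):
    """Height gained divided by the straight-line step between two circle points."""
    x0, y0 = math.cos(theta), math.sin(theta)
    x1, y1 = math.cos(theta + dtheta), math.sin(theta + dtheta)
    rise = y1 - y0                       # d(sin theta): the change in height
    step = math.hypot(x1 - x0, y1 - y0)  # straight-line stand-in for the arc d(theta)
    return rise / step

theta = 0.8
for dtheta in (0.1, 0.01, 0.001):
    print(f"dtheta={dtheta}: rise/step={rise_over_step(theta, dtheta):.5f}, "
          f"cos(theta)={math.cos(theta):.5f}")
```

As the nudge shrinks, the straight-line step becomes indistinguishable from the arc length d𝜃, and the ratio closes in on cos of 𝜃.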

So this gives us two different really nice ways of thinking about how the derivative of sine is cosine. One of them is looking at the graph and getting a loose feel for the shape of things based on thinking about the slope of the sine graph at every single point. And the other is a more precise line of reasoning looking at the unit circle itself. For those of you that like to pause and ponder, see if you can try a similar line of reasoning to find what the derivative of the cos of 𝜃 should be.