# Pop Video: Visualizing the Chain Rule and Product Rule

Grant Sanderson • 3Blue1Brown • Boclips

15:36

### Video Transcript

In the last videos I talked about the derivatives of simple functions. And the goal was to have a clear picture or intuition to hold in your mind that actually explains where these formulas come from. But of course, most of the functions you deal with in modeling the world involve somehow mixing or combining or tweaking these simple functions in some other way. So our natural next step is to understand how you take derivatives of more complicated combinations. And again, I don’t want these to be something to memorize. I want you to have a clear picture in mind for where each one comes from.

Now this really boils down to three basic ways to combine functions. You can add them together. You can multiply them. And you can throw one inside the other, known as composing them. Sure, you could say subtracting them. But really, that’s just multiplying the second by negative one and adding them together. And likewise, dividing functions doesn’t really add anything, ’cause that’s the same as plugging one inside the function one over 𝑥 and then multiplying the two together. So really, most functions you come across just involve layering together these three different types of combinations, though there’s not really a bound on how monstrous things can become. But as long as you know how derivatives play with just those three combination types, you’ll always be able to just take it step by step and peel through the layers for any kind of monstrous expression.

So, the question is, if you know the derivatives of two functions, what is the derivative of their sum, of their product, and of the function composition between them? The sum rule is easiest, if somewhat tongue-twisting to say out loud. The derivative of a sum of two functions is the sum of their derivatives. But it’s worth warming up with this example by really thinking through what it means to take a derivative of a sum of two functions, since the derivative patterns for products and for function composition won’t be so straightforward, and they’re gonna require this kind of deeper thinking.

For example, let’s think about this function 𝑓 of 𝑥 equals sin of 𝑥 plus 𝑥 squared. It’s a function where, for every input, you add together the values of sin of 𝑥 and 𝑥 squared at that point. For example, let’s say at 𝑥 equals 0.5, the height of the sine graph is given by this vertical bar. And the height of the 𝑥 squared parabola is given by this slightly smaller vertical bar. And their sum is the length you get by just stacking them together. Now for the derivative, you wanna ask what happens as you nudge the input slightly, maybe increasing it up to 0.5 plus d𝑥. The difference in the value of 𝑓 between those two places is what we call d𝑓. And when you picture it like this, I think you’ll agree that the total change in the height is whatever the change to the sine graph is, what we might call d(sin 𝑥), plus whatever the change to 𝑥 squared is, d(𝑥 squared).

Now we know that the derivative of sine is cosine. And remember what that means. It means that this little change d(sin 𝑥) is about cos of 𝑥 times d𝑥. It’s proportional to the size of our initial nudge, d𝑥. And the proportionality constant equals cosine of whatever input we happened to start at. Likewise, because the derivative of 𝑥 squared is two 𝑥, the change in the height of the 𝑥 squared graph is gonna be about two times 𝑥 times whatever d𝑥 was. So, rearranging, d𝑓 divided by d𝑥, the ratio of the tiny change to the sum function to the tiny change in 𝑥 that caused it, is indeed cos of 𝑥 plus two 𝑥, the sum of the derivatives of its parts. But like I said, things are a bit different for products. And let’s think through why in terms of tiny nudges again. In this case, I don’t think graphs are our best bet for visualizing things.
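The nudge argument for the sum can be checked numerically. The following is a minimal sketch, not part of the video: the step size `dx = 1e-6` and the variable names are assumptions chosen for illustration. It confirms that d𝑓 is the sum of the two individual changes, and that d𝑓 divided by d𝑥 is close to cos of 𝑥 plus two 𝑥 at 𝑥 equals 0.5.

```python
import math

def f(x):
    # The sum function from the transcript: sin(x) + x^2
    return math.sin(x) + x**2

x = 0.5      # starting input used in the transcript
dx = 1e-6    # small nudge (assumed value, chosen for illustration)

# Change in each piece caused by the nudge
d_sin = math.sin(x + dx) - math.sin(x)
d_x2 = (x + dx)**2 - x**2

# Change in the sum: the two changes stacked together
df = f(x + dx) - f(x)

ratio = df / dx                    # numerical estimate of df/dx
predicted = math.cos(x) + 2 * x    # sum of the two derivatives

print(ratio, predicted)
```

The two printed numbers should agree to several decimal places, and the agreement improves as `dx` shrinks (until floating-point rounding takes over).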

Pretty commonly in math, at a lot of levels of math really, if you’re dealing with a product of two things, it helps to understand it as some kind of area. In this case, maybe you try to configure some mental set-up of a box where the side lengths are sin of 𝑥 and 𝑥 squared. But what would that mean? Well, since these are functions, you might think of these sides as adjustable, dependent on the value of 𝑥, which maybe you think of as this number that you can just freely adjust up and down. So, to get a feel for what this means, focus on that top side there, which changes as the function sin of 𝑥. As you change this value of 𝑥 up from zero, it increases up to a length of one as sin of 𝑥 moves towards its peak. And after that, it starts to decrease as sin of 𝑥 comes down from one. And in the same way, that height there is always changing as 𝑥 squared.

So 𝑓 of 𝑥, defined as the product of these two functions, is gonna be the area of this box. And for the derivative, let’s think about how a tiny change to 𝑥 by d𝑥 influences that area. What is that resulting change in area, d𝑓? Well, the nudge d𝑥 caused that width to change by some small d(sin 𝑥). And it caused that height to change by some d(𝑥 squared). And this gives us three little snippets of new area. There’s a thin rectangle on the bottom whose area is its width, sin of 𝑥, times its thin height, d(𝑥 squared). And there’s this thin rectangle on the right whose area is its height, 𝑥 squared, times its thin little width, d(sin 𝑥). And there’s also this little bit in the corner. But we can ignore that, since its area is ultimately gonna be proportional to (d𝑥) squared, the square of the nudge itself. And as we’ve seen before, that becomes negligible as d𝑥 goes to zero.

I mean, this whole set-up is very similar to what I showed last video with the 𝑥 squared diagram. And just like then, keep in mind that I’m using somewhat beefy changes here to draw things, just so that we can actually see them. But in principle, d𝑥 is something very, very small. And that means that d(𝑥 squared) and d(sin 𝑥) are also very, very small. So applying what we know about the derivative of sine and of 𝑥 squared, that tiny change, d(𝑥 squared), is gonna be about two 𝑥 times d𝑥. And that tiny change, d(sin 𝑥), well that’s gonna be about cos of 𝑥 times d𝑥. As usual, we divide out by that d𝑥 to see that the ratio we want, d𝑓 divided by d𝑥, is sin of 𝑥 times the derivative of 𝑥 squared plus 𝑥 squared times the derivative of sine.
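The box picture can be turned into a small computation. This is a rough sketch under assumed values (the input `x = 1.0`, the nudge `dx = 1e-4`, and all variable names are illustrative, not from the video). It splits the new area into the bottom strip, the side strip, and the corner, and shows that the corner contributes very little once everything is divided by d𝑥.

```python
import math

x = 1.0      # assumed starting input, for illustration only
dx = 1e-4    # assumed nudge size

width = math.sin(x)    # top side of the box
height = x**2          # vertical side of the box

# How much each side changes when x is nudged by dx
d_width = math.sin(x + dx) - width
d_height = (x + dx)**2 - height

# The three snippets of new area
bottom = width * d_height    # "left d right"
side = height * d_width      # "right d left"
corner = d_width * d_height  # proportional to dx squared

# Actual change in the area of the box
df = math.sin(x + dx) * (x + dx)**2 - width * height

print("df/dx            :", df / dx)
print("(bottom+side)/dx :", (bottom + side) / dx)
print("corner/dx        :", corner / dx)
print("product rule     :", math.sin(x) * 2 * x + x**2 * math.cos(x))
```

Making `dx` smaller shrinks `corner / dx` roughly in proportion, which is the sense in which the corner piece becomes negligible.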

And nothing we’ve done here is specific to sine or to 𝑥 squared. This same line of reasoning would work for any two functions, 𝑔 and ℎ. And sometimes people like to remember this pattern with a certain mnemonic that you kind of sing in your head: left d right, right d left. In this example, where we have sin of 𝑥 times 𝑥 squared, left d right means you take that left function, sin of 𝑥, times the derivative of the right, in this case two 𝑥. Then you add on right d left. That right function, 𝑥 squared, times the derivative of the left one, cos of 𝑥.
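Written out for general functions, the box argument might be summarized as follows. This is a sketch in standard notation rather than something shown in the transcript; 𝑔 plays the role of the left function and ℎ the right one.

```latex
\begin{align*}
d(g\,h) &= g\,dh + h\,dg + dg\,dh
  && \text{(bottom strip, side strip, corner)} \\
\frac{d(g\,h)}{dx} &= g\,\frac{dh}{dx} + h\,\frac{dg}{dx} + \frac{dg\,dh}{dx}
  && \text{(divide by } dx\text{)} \\
\frac{d}{dx}\bigl(g(x)\,h(x)\bigr) &= g(x)\,h'(x) + h(x)\,g'(x)
  && \text{(corner term vanishes as } dx \to 0\text{)}
\end{align*}
```

With 𝑔 equal to sin of 𝑥 and ℎ equal to 𝑥 squared, the last line reads sin of 𝑥 times two 𝑥, plus 𝑥 squared times cos of 𝑥.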

Now, out of context, presented as a rule to remember, I think this would feel pretty strange, don’t you? But when you actually think of this adjustable box, you can see what each of those terms represents. Left d right is the area of that little bottom rectangle. And right d left is the area of that rectangle on the side. By the way, I should mention that if you multiply by a constant, say two times sin of 𝑥, things end up a lot simpler. The derivative is just the same as the constant multiplied by the derivative of the function. In this case, two times cos of 𝑥. I’ll leave it to you to pause and ponder and just kinda verify that that makes sense.

Aside from addition and multiplication, the other common way to combine functions (and believe me, this one comes up all the time) is to shove one inside the other, function composition. For example, maybe we take the function 𝑥 squared and we just shove it on inside sin of 𝑥, to get this new function sin of 𝑥 squared. What do you think the derivative of that new function is? To think this one through, I’m gonna choose yet another way to visualize things, just to emphasize that in creative math, we’ve got lots of options. I’ll put up three different number lines. The top one is gonna hold the value of 𝑥. The second one is gonna hold the value of 𝑥 squared. And that third line is gonna hold the value of sin of 𝑥 squared. That is, the function 𝑥 squared gets you from line one to line two. And the function sin gets you from line two to line three.

As I shift around this value of 𝑥, maybe moving it up to the value three, that second value stays pegged to whatever 𝑥 squared is, in this case moving up to nine. And that bottom value, being sin of 𝑥 squared, is gonna go to whatever sin of nine happens to be. So for the derivative, let’s again start by just nudging that 𝑥-value by some little d𝑥. And I always think that it’s helpful to think of 𝑥 as starting at some actual concrete number, maybe 1.5 in this case. The resulting nudge to that second value, the change in 𝑥 squared caused by such a d𝑥, is d(𝑥 squared). And we could expand this, like we have before, as two 𝑥 times d𝑥, which for our specific input would be two times 1.5 times d𝑥. But it actually helps to keep things written as d(𝑥 squared), at least for now. And in fact, I’m gonna go one step further. I’m gonna give a new name to this 𝑥 squared, maybe ℎ, so that instead of writing d(𝑥 squared) for this nudge, we write dℎ.

And this makes it easier to think about that third value, which is now pegged at sin of ℎ. Its change is d(sin ℎ), the tiny change caused by the nudge dℎ. And by the way, the fact that it’s moving to the left while the dℎ bump is going to the right, that just means that this change, d(sin ℎ), is gonna be some kind of negative number. And once again, we can use our knowledge of the derivative of the sine. This d(sin ℎ) is gonna be about cos of ℎ times dℎ. That’s what it means for the derivative of sine to be cosine. And unfolding things, we can just replace that ℎ with 𝑥 squared again. So we know that that bottom nudge is gonna have a size of cos of 𝑥 squared times d(𝑥 squared). And in fact, let’s unfold things even further. That intermediate nudge, d(𝑥 squared), is gonna be about two 𝑥 times d𝑥.

And it’s always a good habit to remind yourself of what an expression like this actually means. In this case, where we started at 𝑥 equals 1.5 up top, this whole expression is telling us that the size of the nudge on that third line is gonna be about cos of 1.5 squared times two times 1.5 times whatever the size of d𝑥 was. It’s proportional to the size of d𝑥. And this derivative is giving us that proportionality constant.
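The three-number-line picture can be traced numerically at 𝑥 equals 1.5. This is a minimal sketch, with the nudge size `dx = 1e-6` and the variable names assumed for illustration. It follows the nudge from line one to line two to line three and compares the result with cos of 1.5 squared times two times 1.5 times d𝑥.

```python
import math

x = 1.5      # concrete starting input from the transcript
dx = 1e-6    # small nudge on the first line (assumed value)

# Second line: h = x^2, and the nudge dh it receives
h = x**2
dh = (x + dx)**2 - h

# Third line: sin(h), and the nudge it receives from dh
d_sin = math.sin(h + dh) - math.sin(h)

# Proportionality constant the chain rule predicts at this input
predicted_constant = math.cos(x**2) * 2 * x
predicted_nudge = predicted_constant * dx

print("nudge on line 3 :", d_sin)
print("predicted nudge :", predicted_nudge)
print("ratio d_sin/dx  :", d_sin / dx)
print("(d_sin/dh)*(dh/dx):", (d_sin / dh) * (dh / dx))
```

The nudge on the third line comes out negative, since cos of 2.25 is negative, which matches the value moving to the left in the animation.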

Notice what we came out with here. We have the derivative of the outside function. And it’s still taking in the unaltered inside function. And then we’re multiplying it by the derivative of that inside function. Again, there’s nothing special about sin of 𝑥 or 𝑥 squared. If you have any two functions, 𝑔 of 𝑥 and ℎ of 𝑥, the derivative of their composition, 𝑔 of ℎ of 𝑥, is gonna be the derivative of 𝑔 evaluated on ℎ multiplied by the derivative of ℎ. This pattern right here is what we usually call the chain rule. Notice, for the derivative of 𝑔, I’m writing it as d𝑔 dℎ instead of d𝑔 d𝑥. On the symbolic level, this is a reminder that the thing you plug in to that derivative is still gonna be that intermediary function, ℎ. But more than that, it’s an important reflection of what this derivative of the outer function actually represents.

Remember, in our three-line set-up, when we took the derivative of the sine on that bottom, we expanded the size of that nudge, dsin, as cos of ℎ times dℎ. This was because we didn’t immediately know how the size of that bottom nudge depended on 𝑥. That’s kinda the whole thing we were trying to figure out. But we could take the derivative with respect to that intermediate variable, ℎ. That is, figure out how to express the size of that nudge on the third line as some multiple of dℎ, the size of the nudge on the second line. And it was only after that that we unfolded further by figuring out what dℎ was.

So in this chain rule expression, we’re saying look at the ratio between a tiny change in 𝑔, the final output, to a tiny change in ℎ that caused it, ℎ being the value that we plug in to 𝑔. Then multiply that by the tiny change in ℎ divided by the tiny change in 𝑥 that caused it. So notice, those dℎs cancel out, and they give us a ratio between the change in that final output and the change to the input that, through a certain chain of events, brought it about. And that cancellation of dℎ is not just a notational trick. That is a genuine reflection of what’s going on with the tiny nudges that underpin everything we do with derivatives.
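In symbols, the chain rule described here might be written as below. This is a sketch in standard notation, using the transcript’s names 𝑔 for the outer function and ℎ for the inner one; the second line is the running example.

```latex
\begin{align*}
\frac{d}{dx}\, g\bigl(h(x)\bigr)
  &= \frac{dg}{dh}\bigl(h(x)\bigr)\cdot\frac{dh}{dx}(x) \\
\frac{d}{dx}\, \sin\bigl(x^{2}\bigr)
  &= \cos\bigl(x^{2}\bigr)\cdot 2x
\end{align*}
```

Reading the first line as a product of ratios of nudges, the dℎ in the denominator of the first factor and the dℎ in the numerator of the second are the same tiny change on the middle number line, which is why they cancel.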

So those are the three basic tools to have in your belt to handle derivatives of functions that combine a lot of smaller things. You’ve got the sum rule, the product rule, and the chain rule. And I’ll be honest with you. There is a big difference between knowing what the chain rule is and what the product rule is and actually being fluent with applying them in even the most hairy of situations. Watching videos, any videos, about the mechanics of calculus is never gonna substitute for practicing those mechanics yourself and building up the muscles to do these computations yourself.

I really wish that I could offer to do that for you, but I’m afraid the ball is in your court, my friend, to seek out the practice. What I can offer, and what I hope I have offered, is to show you where these rules actually come from. To show that they’re not just something to be memorized and hammered away. But they’re natural patterns, things that you too could have discovered just by patiently thinking through what a derivative actually means.