Picture yourself as an early calculus student about to begin your first course. The months ahead of you hold within them a lot of hard work, some neat examples, some
not-so-neat examples. Beautiful connections to physics, not-so-beautiful piles of formulas to memorize. Plenty of moments of getting stuck and banging your head into a wall, a few nice
“Aha” moments sprinkled in as well. And some genuinely lovely graphical intuition to help guide you through it all.
But if the course ahead of you is anything like my first introduction to calculus or
any of the first courses that I’ve seen in the years since, there’s one topic that you will not see, but which I believe stands to greatly
accelerate your learning. You see, almost all of the visual intuitions from that first year are based on
graphs. The derivative is the slope of a graph. The integral is a certain area under that graph. But as you generalize calculus beyond functions whose inputs and outputs are simply
numbers, it’s not always possible to graph the function that you’re analyzing. There are all sorts of different ways that you’d be visualizing these things.
So if all your intuitions for the fundamental ideas, like derivatives, are rooted too
rigidly in graphs, it can make for a very tall and largely unnecessary conceptual hurdle between you and
the more, quote-unquote, advanced topics like multivariable calculus, complex
analysis, and differential geometry. Now, what I wanna share with you is a way to think about derivatives, which I’ll
refer to as the transformational view, that generalizes more seamlessly into some of
those more general contexts where calculus comes up. And then we’ll use this alternate view to analyze a certain fun puzzle about repeated fractions.
But first off, I just wanna make sure that we’re all on the same page about what the
standard visual is. If you were to graph a function that simply takes real numbers as inputs and
outputs, one of the first things you learn in a calculus course is that the derivative gives
you the slope of this graph. What we mean by that is that the derivative of the function is a new function
which, for every input 𝑥, returns that slope. Now, I’d encourage you not to think of this derivative-as-slope idea as being the
definition of a derivative. Instead, think of it as being more fundamentally about how sensitive the function is
to tiny little nudges around the input. And the slope is just one way to think about that sensitivity relevant only to this
particular way of viewing functions. I have not just another video, but a full series on this topic if it’s something you
wanna learn more about.
Now the basic idea behind the alternate visual for the derivative is to think of this
function as mapping all of the input points on the number line to their
corresponding outputs on a different number line. In this context, what the derivative gives you is a measure of how much the input
space gets stretched or squished in various regions. That is, if you were to zoom in around a specific input and take a look at some
evenly spaced points around it, the derivative of the function at that input is
gonna tell you how spread out or contracted those points become after the mapping.
Here, a specific example helps. Take the function 𝑥 squared. It maps one to one and two to four, three to nine, and so on. And you could also see how it acts on all of the points in between. And if you were to zoom in on a little cluster of points around the input one and
then see where they land around the relevant output, which for this function also
happens to be one, you’d notice that they tend to get stretched out. In fact, it roughly looks like stretching out by a factor of two. And the closer you zoom in, the more this local behavior looks just like multiplying
by a factor of two. This is what it means for the derivative of 𝑥 squared at the input 𝑥 equals one to
be two. It’s what that fact looks like in the context of transformations.
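To make the factor-of-two claim concrete, here is a minimal Python sketch. The helper name `stretch_factors` and the choice of five sample points are illustrative assumptions, not something from the video. It takes evenly spaced points around an input and compares the gaps between their outputs to the gaps between the inputs.

```python
def stretch_factors(f, center, spacing, count=5):
    """Ratios of output gaps to input gaps for evenly spaced points near center."""
    # Evenly spaced sample points, centered on `center`.
    start = center - spacing * (count - 1) / 2
    points = [start + i * spacing for i in range(count)]
    outputs = [f(p) for p in points]
    # Signed ratio for each adjacent pair: a value above 1 in magnitude means
    # stretching, below 1 means squishing, and a negative sign means flipping.
    return [
        (outputs[i + 1] - outputs[i]) / (points[i + 1] - points[i])
        for i in range(count - 1)
    ]


def square(x):
    return x * x


# Zoom in on the input one by shrinking the spacing between sample points.
for spacing in (0.1, 0.01, 0.001):
    print(spacing, stretch_factors(square, 1.0, spacing))
```

For 𝑥 squared around the input one, the printed ratios cluster more and more tightly around two as the spacing shrinks, which is the zooming-in behavior described above.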
If you looked at a neighborhood of points around the input three, they would get
roughly stretched out by a factor of six. This is what it means for the derivative of this function at the input three to equal
six. Around the input one-fourth, a small region actually tends to get contracted,
specifically by a factor of one-half. And that’s what it looks like for a derivative to be smaller than one. Now the input zero is interesting. Zooming in by a factor of 10, it doesn’t really look like a constant stretching or
squishing. For one thing, all of the outputs end up on the right, positive side of things. And as you zoom in closer and closer, by 100 times or by 1,000 times, it looks more and more
like a small neighborhood of points around zero just gets collapsed into zero.
And this is what it looks like for the derivative to be zero. The local behavior looks more and more like multiplying the whole number line by
zero. It doesn’t have to completely collapse everything to a point at a particular zoom
level. Instead, it’s a matter of what the limiting behavior is as you zoom in closer and
closer. It’s also instructive to take a look at the negative inputs here. Things start to feel a little cramped since they collide with where all the positive
input values go. And this is one of the downsides of thinking of functions as transformations. But for derivatives, we only really care about the local behavior anyway, what
happens in a small range around a given input.
Here, notice that the inputs in a little neighborhood around, say, negative two don’t just get stretched out. They also get flipped around. Specifically, the action on such a neighborhood looks more and more like multiplying
by negative four, the closer you zoom in. This is what it looks like for the derivative of a function to be negative. And I think you get the point. This is all well and good, but let’s see how this is actually useful in solving a
problem. A friend of mine recently asked me a pretty fun question about the infinite
fraction one plus one divided by one plus one divided by one plus one divided by
one, on and on and on and on. And clearly, you watch math videos online. So maybe you’ve seen this before.
But my friend’s question actually cuts to something that you might not have thought
about before, relevant to the view of derivatives that we’re looking at here. The typical way that you might evaluate an expression like this is to set it equal to
𝑥 and then notice that there’s a copy of the full fraction inside itself. So you can replace that copy with another 𝑥, and then just solve for 𝑥. That is, what you want is to find a fixed point of the function one plus one divided
by 𝑥. But here’s the thing. There are actually two solutions for 𝑥, two special numbers where one plus one
divided by that number gives you back the same thing. One is the golden ratio 𝜑, around 1.618. And the other is negative 0.618, which happens to be negative one divided by 𝜑. I like to call this other number 𝜑’s little brother. Since just about any property that 𝜑 has, this number also has.
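For reference, the two fixed points follow from a short calculation. The algebra below is the standard quadratic-formula derivation, written out here as a sketch rather than something spelled out in the video, and it assumes 𝑥 is not zero:

```latex
\begin{align*}
x &= 1 + \frac{1}{x} \\
x^2 &= x + 1 \\
x^2 - x - 1 &= 0 \\
x &= \frac{1 \pm \sqrt{5}}{2}
\end{align*}
```

The plus sign gives 𝜑, around 1.618, and the minus sign gives the other solution, around negative 0.618. Since the product of the two roots of this quadratic is negative one, the second root is negative one divided by 𝜑.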
And this raises the question, would it be valid to say that that infinite fraction
that we saw is somehow also equal to 𝜑’s little brother, negative 0.618? Maybe you initially say, “Obviously not! Everything on the left hand side is positive. So how could it possibly equal a negative number?” Well, first we should be clear about what we actually mean by an expression like
this. One way that you could think about it — and it’s not the only way; there’s freedom
for choice here — is to imagine starting with some constant like one and then
repeatedly applying the function one plus one divided by 𝑥, and then asking what this approaches as you keep going. And certainly, symbolically, what you get looks more and more like our infinite
fraction. So maybe if you want it to equal a number, you should ask what this sequence of numbers approaches.
And if that’s your view of things, maybe you start off with a negative number. So it’s not so crazy for the whole expression to end up negative. After all, if you start with negative one divided by 𝜑, then applying this function
one plus one over 𝑥, you get back the same number, negative one divided by 𝜑. So no matter how many times you apply it, you’re staying fixed at this value. But even then, there is one reason that you should probably view 𝜑 as the favorite
brother in this pair.
Here, try this. Pull up a calculator of some kind, then start with any random number. And then plug it into this function, one plus one divided by 𝑥. And then plug that number into one plus one over 𝑥. And then again and again and again and again and again. No matter what constant you start with, you eventually end up at 1.618. Even if you start with a negative number, even one that’s really, really close to
𝜑’s little brother, eventually it shies away from that value and jumps back over to 𝜑. So what’s going on here? Why is one of these fixed points favored above the other one? Maybe you can already see how the transformational understanding of derivatives is
gonna be helpful for understanding this set-up. But for the sake of having a point of contrast, I wanna show you how a problem like
this is often taught using graphs.
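The calculator experiment described above can be sketched in a few lines of Python. The names `iterate`, `PHI`, and `LITTLE_BROTHER`, the seed values, and the step count are all illustrative assumptions. One caveat: a seed whose iterates land exactly on zero, such as negative one, would hit a division by zero, so those seeds are avoided here.

```python
def f(x):
    return 1 + 1 / x


def iterate(x, steps=100):
    """Apply f to x repeatedly, like hitting enter on a calculator."""
    for _ in range(steps):
        x = f(x)
    return x


PHI = (1 + 5 ** 0.5) / 2             # the golden ratio, around 1.618
LITTLE_BROTHER = (1 - 5 ** 0.5) / 2  # around -0.618, equal to -1/PHI

# A few seeds, including a negative one close to the little brother.
for seed in (1.0, 7.3, -5.0, -0.62):
    print(seed, iterate(seed), "vs PHI =", PHI)
```

Each of these seeds ends up next to 𝜑, including the one that starts close to 𝜑’s little brother.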
If you were to plug in some random input to this function, the 𝑦-value tells you the
corresponding output, right? So to think about plugging that output back into the function, you might first move
horizontally until you hit the line 𝑦 equals 𝑥. And that’s gonna give you a position where the 𝑥-value corresponds to your previous
𝑦-value, right? So then from there, you can move vertically to see what output this new 𝑥-value
has. And then you repeat. You move horizontally to the line 𝑦 equals 𝑥 to find a point whose 𝑥-value is the
same as the output that you just got. And then you move vertically to apply the function again.
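If it helps to see the horizontal-then-vertical recipe written down, here is a small Python sketch that lists the corner points of that path. The function name `cobweb_path` and the starting value are illustrative assumptions; plotting the points is left out.

```python
def f(x):
    return 1 + 1 / x


def cobweb_path(x0, steps):
    """Corner points of the graphical iteration of f, starting from x0."""
    x = x0
    path = [(x, f(x))]          # start on the graph at (x0, f(x0))
    for _ in range(steps):
        y = f(x)
        path.append((y, y))     # move horizontally to the line y = x
        path.append((y, f(y)))  # move vertically back to the graph
        x = y
    return path


for point in cobweb_path(1.0, 3):
    print(point)
```

Every other point sits on the line 𝑦 equals 𝑥, and the points in between sit on the graph of the function.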
Now personally, I think this is kind of an awkward way to think about repeatedly
applying a function, don’t you? I mean it makes sense, but you kinda have to pause and think about it to remember
which way to draw the lines. And you can, if you want, think through what conditions make this spiderweb process
narrow in on a fixed point versus propagating away from it. And in fact, go ahead! Pause right now and try to think it through as an exercise. It has to do with slopes. Or if you wanna skip the exercise for something that I think gives a much more
satisfying understanding, think about how this function acts as a transformation.
So I’m gonna go ahead and start here by drawing a whole bunch of arrows to indicate
where the various sampled input points will go. And side note, don’t you think this gives a really neat emergent pattern? I wasn’t expecting this, but it was cool to see it pop up when animating. I guess the action of one divided by 𝑥 gives this nice emergent circle. And then we’re just shifting things over by one. Anyway, I want you to think about what it means to repeatedly apply some function,
like one plus one over 𝑥, in this context. Well, after letting it map all of the inputs to the outputs, you could consider those
as the new inputs. And then just apply the same process again, and then again. And do it however many times you want.
Notice, in animating this with a few dots representing the sample points, it doesn’t
take many iterations at all before all of those dots kind of clump in around
1.618. Now remember, we know that 1.618, and its little brother, negative 0.618 on and on,
stay fixed in place during each iteration of this process. But zoom in on a neighborhood around 𝜑. During the map, points in that region get contracted around 𝜑. Meaning that the function one plus one over 𝑥 has a derivative with a magnitude
that’s less than one at this input. In fact, this derivative works out to be around negative 0.38. So what that means is that each repeated application scrunches the neighborhood
around this number smaller and smaller, like a gravitational pull towards 𝜑. So now, tell me what you think happens in the neighborhood of 𝜑’s little brother.
Over there, the derivative actually has a magnitude larger than one. So points near the fixed point are repelled away from it. And when you work it out, you can see that they get stretched by more than a factor
of two in each iteration. They also get flipped around because the derivative is negative here. But the salient fact for the sake of stability is just the magnitude. Mathematicians would call this right value a stable fixed point, and the left one is
an unstable fixed point. Something is considered stable if when you perturb it just a little bit, it tends to
come back towards where it started rather than going away from it. So what we’re seeing is a very useful little fact, that the stability of a fixed
point is determined by whether the magnitude of the derivative at that point is bigger or
smaller than one.
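As a quick check of that fact for this particular function, here is a minimal Python sketch. It uses the power-rule derivative of one plus one over 𝑥, which is negative one over 𝑥 squared. The names `f_prime` and `classify` are illustrative assumptions, and the case where the magnitude is exactly one is reported as inconclusive, since the rule above does not settle it.

```python
PHI = (1 + 5 ** 0.5) / 2             # the stable candidate, around 1.618
LITTLE_BROTHER = (1 - 5 ** 0.5) / 2  # the other fixed point, around -0.618


def f_prime(x):
    # Derivative of f(x) = 1 + 1/x, by the power rule.
    return -1 / x ** 2


def classify(x):
    """Label a fixed point of f by the magnitude of the derivative there."""
    magnitude = abs(f_prime(x))
    if magnitude < 1:
        return "stable"
    if magnitude > 1:
        return "unstable"
    return "inconclusive"


for name, point in (("phi", PHI), ("little brother", LITTLE_BROTHER)):
    print(name, f_prime(point), classify(point))
```

The derivative comes out to about negative 0.38 at 𝜑 and about negative 2.62 at 𝜑’s little brother, matching the contraction and the more-than-a-factor-of-two stretch described above.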
And this explains why 𝜑 always shows up in the numerical play where you’re just
hitting enter on your calculator over and over, but 𝜑’s little brother never
does. Now as to whether or not you wanna consider 𝜑’s little brother a valid value of the
infinite fraction, well, that’s really up to you. Everything we just showed suggests that if you think of this expression as
representing a limiting process, then because every possible seed value other than 𝜑’s little brother gives you a
sequence converging to 𝜑, it does feel kinda silly to put them on equal footing with
each other. But maybe you don’t think of it as a limit. Maybe the kind of math you’re doing lends itself to treating this as a purely
algebraic object, like the solutions of a polynomial which simply has multiple
values. Anyway, that’s beside the point.
And my point here is not that viewing derivatives as this change in density is
somehow better than the graphical intuition on the whole. In fact, picturing an entire function this way can be kind of clunky and impractical
as compared to graphs. My point is that it deserves more of a mention in most of the introductory calculus
courses. Because it can help make a student’s understanding of the derivative a little bit
more flexible. Like I mentioned, the real reason that I’d recommend you carry this perspective with
you as you learn new topics is not so much for what it does with your understanding
of single-variable calculus; it’s for what comes after.
There are many topics typically taught in a college math department which — how shall
I put this lightly? — don’t exactly have a reputation of being super accessible. So in the next video, I’m gonna show you how a few ideas from these subjects with
fancy sounding names, like holomorphic functions and the Jacobian determinant, are
really just extensions of the idea shown here. They really are some beautiful ideas, which I think can be appreciated from a really
wide range of mathematical backgrounds. And they’re relevant to a surprising number of seemingly unrelated ideas. So stay tuned for that.