This guy, Grothendieck, is somewhat of a mathematical idol to me. And I just love this quote, don’t you? Too often in math, we dive into showing that a certain fact is true with a long
series of formulas before stepping back and making sure that it feels reasonable and
preferably obvious, at least at an intuitive level. In this video, I wanna talk about integrals. And the thing that I want to become almost obvious is that they are an inverse of derivatives.
Here, we’re just gonna focus on one example, which is a kind of dual to the example
of a moving car that I talked about in chapter two of the series, introducing
derivatives. Then, in the next video, we’re gonna see how this same idea generalizes to a
couple of other contexts.
Imagine that you’re sitting in a car and you can’t see out the window. All you see is the speedometer. At some point, the car starts moving, speeds up, and then slows back down to a stop,
all over the course of eight seconds. The question is, is there a nice way to figure out how far you’ve travelled during
that time, based only on your view of the speedometer? Or, better yet, can you find a distance function, 𝑠 of 𝑡, that tells you how far
you’ve travelled after a given amount of time, 𝑡, somewhere between zero and eight seconds?
Let’s say that you take note of the velocity at every second. And you make a plot over time that looks something like this. And maybe you find that a nice function to model that velocity over time, in meters
per second, is 𝑣 of 𝑡 equals 𝑡 times eight minus 𝑡. You might remember, in chapter two of this series, we were looking at the opposite
situation, where you knew what a distance function was, 𝑠 of 𝑡, and you wanted to
figure out the velocity function from that. There, I showed how the derivative of a distance-versus-time function gives you a velocity-versus-time function.
So in our current situation, where all we know is velocity, it should make sense that
finding a distance-versus-time function is gonna come down to asking what function
has a derivative of 𝑡 times eight minus 𝑡. This is often described as finding the antiderivative of a function. And indeed, that’s what we’ll end up doing. And you could even pause right now and try that. But first, I wanna spend the bulk of this video showing how this question is related
to finding the area bounded by the velocity graph. Because that helps to build an intuition for a whole class of problems, things called
integral problems in math and science.
To start off, notice that this question would be a lot easier if the car was just
moving at a constant velocity, right? In that case, you could just multiply the velocity, in meters per second, times the
amount of time that has passed, in seconds. And that would give you the number of meters travelled. And notice, you can visualize that product, that distance, as an area. And if visualizing distance as area seems kinda weird, I’m right there with you. It’s just that on this plot, where the horizontal direction has units of seconds and
the vertical direction has units of meters per second, units of area just very
naturally correspond to meters.
But what makes our situation hard is that velocity is not constant. It’s incessantly changing at every single instant. It would even be a lot easier if it only ever changed at a handful of points. Maybe staying static for the first second and then suddenly discontinuously jumping
to a constant seven meters per second for the next second and so on, with
discontinuous jumps to portions of constant velocity. That would make it uncomfortable for the driver. In fact, it’s actually physically impossible. But it would make your calculations a lot more straightforward. You could just compute the distance travelled on each interval by multiplying the
constant velocity on that interval by the change in time. And then, just add all of those up.
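As a sketch of that bookkeeping, here are hypothetical piecewise-constant speeds, one per second, borrowed from the velocity model at the start of each second (so zero for the first second and seven meters per second for the next, as in the scenario above). The names `segments` and `piecewise_distance` are illustrative. The distance is just a sum of velocity-times-duration products.

```python
# Hypothetical piecewise-constant velocity profile: each pair is
# (constant velocity in m/s, duration of that portion in s).
segments = [(0, 1), (7, 1), (12, 1), (15, 1), (16, 1), (15, 1), (12, 1), (7, 1)]

def piecewise_distance(segments):
    """Distance for a piecewise-constant velocity: sum of velocity * duration."""
    return sum(velocity * duration for velocity, duration in segments)

print(piecewise_distance(segments))  # 84 meters over the 8 seconds
```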
So what we’re gonna do is approximate the velocity function as if it was constant on
a bunch of intervals. And then, as is common in calculus, we’ll see how refining that approximation leads
us to something more precise. Here, let’s make this a little more concrete by throwing in some numbers. Chop up the time axis between zero and eight seconds into many small intervals, each
with some little width d𝑡, something like 0.25 seconds. Now, consider one of those intervals, like the one between 𝑡 equals one and
1.25. In reality, the car speeds up from seven meters per second to about 8.4 meters per
second during that time. And you could find those numbers just by plugging in 𝑡 equals one and 𝑡 equals 1.25
into the equation for velocity.
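A quick check of those two numbers, assuming nothing beyond the velocity model from the text:

```python
def v(t):
    """Velocity model from the text, in meters per second: t times (8 - t)."""
    return t * (8 - t)

print(v(1))     # 7
print(v(1.25))  # 8.4375, i.e. "about 8.4"
```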
What we wanna do is approximate the car’s motion as if its velocity was constant on
that interval. Again, the reason for doing that is we just don’t really know how to handle
situations other than constant velocity ones. You could choose this constant to be anything between seven and 8.4. It actually doesn’t matter. All that matters is that our sequence of approximations, whatever they are, gets
better and better as d𝑡 gets smaller and smaller. Treating this car’s journey as a bunch of discontinuous jumps in speed between
portions of constant velocity becomes a less wrong reflection of reality as we
decrease the time between those jumps.
So, for convenience, on an interval like this, let’s just approximate the speed with
the car’s true velocity at the start of that interval, the height of the
graph above the left side, which in this case is seven. So, in this example interval, according to our approximation, the car moves seven
meters per second times 0.25 seconds. That’s 1.75 meters, and it’s nicely visualized as the area of this thin
rectangle. In truth, that’s a little under the real distance travelled but not by much. And the same goes for every other interval. The approximated distance is 𝑣 of 𝑡 times d𝑡. It’s just that you’d be plugging in a different value for 𝑡 at each one of these,
giving a different height for each rectangle.
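The rectangle bookkeeping fits in a few lines of Python. The helper `left_riemann_sum` is a name I'm introducing; it samples the velocity at the left edge of each interval, as described above. With d𝑡 equal to 0.25 there are 32 rectangles, and their areas add up to 85.25 meters.

```python
def v(t):
    # Velocity model from the text, in m/s.
    return t * (8 - t)

def left_riemann_sum(f, a, b, n):
    """Sum of f(left endpoint) * dt over n equal-width intervals of [a, b]."""
    dt = (b - a) / n
    return sum(f(a + k * dt) * dt for k in range(n))

dt = 0.25
n = int(8 / dt)  # 32 rectangles
# One rectangle: the interval from t = 1 to t = 1.25.
print(v(1) * dt)                     # 1.75
print(left_riemann_sum(v, 0, 8, n))  # 85.25
```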
I’m gonna write out an expression for the sum of the areas of all those rectangles in
kind of a funny way. Take this symbol here, which looks like a stretched s for sum. And then, put a zero at its bottom and an eight at its top, to indicate that we’ll be
ranging over time steps between zero and eight seconds. And as I said, the amount we’re adding up at each time step is 𝑣 of 𝑡 times
d𝑡. Two things are implicit in this notation. First of all, that value d𝑡 plays two separate roles. Not only is it a factor in each quantity that we’re adding up, it also indicates the
spacing between each sample time step. So when you make d𝑡 smaller and smaller, even though it decreases the area of each
rectangle, it increases the total number of rectangles whose areas we’re adding
up. Because if they’re thinner, it takes more of them to fill that space.
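For one fixed step size, the expression is an ordinary finite sum. Writing it out, with 𝑘 indexing the rectangles, 𝑡 sub 𝑘 as the sample times, and 𝑛 counting the rectangles (labels I'm introducing here), shows d𝑡 in both of its roles: a factor in every term, and the spacing that sets how many terms there are.

```latex
\sum_{k=0}^{n-1} v(t_k)\,dt, \qquad t_k = k\,dt, \qquad n = \frac{8}{dt}
```

For example, going from d𝑡 = 0.25 to d𝑡 = 0.125 halves every rectangle's width but takes 𝑛 from 32 to 64.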
And second, the reason we don’t use the usual sigma notation to indicate a sum is
that this expression is technically not any particular sum for any particular choice
of d𝑡. It’s meant to express whatever that sum approaches as d𝑡 approaches zero. And as you can see, what that approaches is the area bounded by this curve and the
horizontal axis. Remember, smaller choices of d𝑡 indicate closer approximations for the original
question: how far does the car actually go? So this limiting value for the sum, the area under this curve, gives us the
answer to that question in full, unapproximated precision.
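A numerical sketch, reusing the velocity model and the left-endpoint rule (the helper name `left_riemann_sum` is my own), shows the sums creeping toward the area under the curve as d𝑡 shrinks. For this particular function, the left-endpoint sum with 𝑛 rectangles works out to 256/3 times (1 minus 1/𝑛²), so the gap to 256/3, about 85.33, shrinks like 1/𝑛².

```python
def v(t):
    # Velocity model from the text, in m/s.
    return t * (8 - t)

def left_riemann_sum(f, a, b, n):
    """Sum of f(left endpoint) * dt over n equal-width intervals of [a, b]."""
    dt = (b - a) / n
    return sum(f(a + k * dt) * dt for k in range(n))

# Area under the curve from 0 to 8 (the value the text later gets from an antiderivative).
exact_area = 256 / 3

# Shrink dt (more, thinner rectangles) and watch the sum close in on the area.
for n in [8, 32, 128, 512, 2048]:
    approx = left_riemann_sum(v, 0, 8, n)
    print(f"dt = {8 / n}: sum = {approx:.4f}, gap = {exact_area - approx:.6f}")
```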
Now, tell me that’s not surprising. We had this pretty complicated idea of approximations that can involve adding up a
huge number of very tiny things. And yet, the value that those approximations approach can be described so simply. It’s just the area underneath this curve. This expression is called an integral of 𝑣 of 𝑡, since it brings all of its values
together. It integrates them. Now at this point, you could say, how does this help? You’ve just reframed one hard question, finding how far the car has traveled, into an
equally hard problem, finding the area between this graph and the horizontal
axis. And you’d be right. If the velocity distance duo was the only thing that we cared about, most of this
video, with all of the area under a curve nonsense, would be a waste of time. We could just skip straight ahead to finding an antiderivative.
But finding the area between a function’s graph and the horizontal axis is somewhat
of a common language for many disparate problems that can be broken down and
approximated as the sum of a large number of small things. You’ll see more in the next video. But for now, I’ll just say in the abstract that understanding how to interpret and
how to compute the area under a graph is a very general problem-solving tool. In fact, the first video of this series already covered the basics of how this
works. But now that we have more of a background with derivatives, we can actually take this
idea to its completion.
For our velocity example, think of this right endpoint as a variable, capital 𝑇. So we’re thinking of this integral of the velocity function between zero and 𝑇, the
area under this curve between those inputs, as a function, where the upper bound is
the variable. That area represents the distance the car has traveled after 𝑇 seconds, right? So in reality, this is a distance-versus-time function, 𝑠 of 𝑇. Now, ask yourself, what is the derivative of that function? On the one hand, a tiny change in distance over a tiny change in time. That’s velocity; that is what velocity means.
But there’s another way to see this, purely in terms of this graph and this area,
which generalizes a lot better to other integral problems. A slight nudge of d𝑇 to the input causes that area to increase by some little d𝑠,
represented by the area of this sliver. The height of that sliver is the height of the graph at that point, 𝑣 of 𝑇. And its width is d𝑇. And for small enough d𝑇, we can basically consider that sliver to be a
rectangle. So this little bit of added area, d𝑠, is approximately equal to 𝑣 of 𝑇 times
d𝑇. And that approximation gets better and better for smaller d𝑇. So the derivative of that area function, d𝑠/d𝑇, at this point equals 𝑣 of 𝑇, the value of
the velocity function at whatever time we started on.
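The sliver argument, written compactly, with 𝑠 of 𝑇 as the area function defined above:

```latex
s(T) = \int_0^T v(t)\,dt, \qquad
ds = s(T + dT) - s(T) \approx v(T)\,dT
\quad\Longrightarrow\quad
\frac{ds}{dT} = \lim_{dT \to 0} \frac{s(T + dT) - s(T)}{dT} = v(T)
```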
And that right there, that’s a super general argument. The derivative of any function giving the area under a graph like this is equal to
the function for the graph itself. So if our velocity function is 𝑡 times eight minus 𝑡, what should 𝑠 be? What function of 𝑡 has a derivative of 𝑡 times eight minus 𝑡? It’s easier to see if we expand this out, writing it as eight 𝑡 minus 𝑡
squared. And then we can just take each part one at a time. What function has a derivative of eight times 𝑡? Well, we know that the derivative of 𝑡 squared is two 𝑡. So if we just scale that up by a factor of four, we can see that the derivative of
four 𝑡 squared is eight 𝑡. And for that second part, what kind of function do you think might have negative 𝑡
squared as a derivative?
Well, using the power rule again, we know that the derivative of a cubic term, 𝑡
cubed, gives us a square term, three 𝑡 squared. So if we just scale that down by a third, the derivative of one-third 𝑡 cubed is
exactly 𝑡 squared. And then making that negative, we’d see that negative one-third 𝑡 cubed has a
derivative of negative 𝑡 squared. Therefore, the antiderivative of our function, eight 𝑡 minus 𝑡 squared, is four 𝑡
squared minus one-third 𝑡 cubed. But there’s a slight issue here. We could add any constant we want to this function. And its derivative is still gonna be eight 𝑡 minus 𝑡 squared. The derivative of a constant is always zero.
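One way to double-check this antiderivative, and the claim that adding a constant changes nothing, is a numerical derivative. This is a minimal sketch assuming a central-difference approximation with step h = 1e-5; the names `s` and `numerical_derivative` are illustrative.

```python
def v(t):
    # Velocity model from the text, in m/s.
    return t * (8 - t)

def s(t, C=0.0):
    """Candidate antiderivative 4 t^2 - t^3/3 + C (C is an arbitrary constant)."""
    return 4 * t**2 - t**3 / 3 + C

def numerical_derivative(f, t, h=1e-5):
    # Central-difference approximation of f'(t).
    return (f(t + h) - f(t - h)) / (2 * h)

# Shifting s up or down by any constant leaves its slope unchanged.
for C in [0.0, 5.0, -100.0]:
    for t in [0.5, 2.0, 6.0]:
        slope = numerical_derivative(lambda x: s(x, C), t)
        assert abs(slope - v(t)) < 1e-4, (C, t, slope)
print("every shifted antiderivative has derivative t * (8 - t)")
```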
And if you were to graph 𝑠 of 𝑡, you could think of this in the sense that moving a
graph of a distance function up and down does nothing to affect its slope at every
input. So in reality, there are infinitely many different possible antiderivative
functions. And every one of them looks like four 𝑡 squared minus one-third 𝑡 cubed plus 𝐶,
for some constant 𝐶. But there is one piece of information that we haven’t used yet that’s gonna let us
zero in on which antiderivative to use, the lower bound of the integral. This integral has to be zero when we drag that right endpoint all the way to the left
endpoint, right? The distance travelled by the car between zero seconds and zero seconds is, well, zero.
So as we found, the area as a function of capital 𝑇 is an antiderivative for the
stuff inside. And to choose what constant to add to this expression, what you do is subtract off
the value of that antiderivative function at the lower bound. If you think about it for a moment, that ensures that the integral from the lower
bound to itself will indeed be zero. As it so happens, when you evaluate the function we have right here, at 𝑇 equals
zero, you get zero. So in this specific case, you actually don’t need to subtract anything off.
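Here is a small sketch of that bookkeeping with exact fractions, assuming the antiderivative found above with an arbitrary constant 𝐶 added on; the function names `antiderivative` and `area_from_zero` are my own. Subtracting the value at the lower bound wipes out 𝐶, so the area from zero to zero is always zero, and the area up to eight seconds is 256/3 for every choice of 𝐶.

```python
from fractions import Fraction

def antiderivative(T, C=0):
    """One member of the family 4 T^2 - T^3/3 + C (C is arbitrary)."""
    T = Fraction(T)
    return 4 * T**2 - T**3 / 3 + C

def area_from_zero(T, C=0):
    # Subtract the antiderivative's value at the lower bound, 0,
    # so that the area from 0 to 0 is forced to be zero.
    return antiderivative(T, C) - antiderivative(0, C)

for C in [0, 5, -12]:
    print(C, area_from_zero(0, C), area_from_zero(8, C))  # 0 and 256/3 (about 85.33) for every C
```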
For example, the total distance travelled during the full eight seconds is this
expression evaluated at 𝑇 equals eight, which is about 85.33, minus zero. So the answer as a whole is just 85.33. But a more typical example would be something like the integral between one and
seven. That’s the area pictured right here. And it represents the distance travelled between one second and seven seconds. What you do is evaluate the antiderivative we found at the top bound, seven, and then
subtract off its value at that bottom bound, one. And notice, by the way, it doesn’t matter which antiderivative we chose here. If for some reason it had a constant added to it, like five, that constant would just cancel out.
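The same recipe for the one-to-seven example, as a sketch with exact fractions (the names `antiderivative` and `definite_integral` are illustrative): evaluate at the top bound, subtract the value at the bottom bound, and a constant of five drops out.

```python
from fractions import Fraction

def antiderivative(T, C=0):
    """4 T^2 - T^3/3 + C; C is an arbitrary constant of integration."""
    T = Fraction(T)
    return 4 * T**2 - T**3 / 3 + C

def definite_integral(F, a, b):
    # Value of the antiderivative at the top bound minus its value at the bottom bound.
    return F(b) - F(a)

print(definite_integral(antiderivative, 1, 7))                    # 78
print(definite_integral(lambda T: antiderivative(T, C=5), 1, 7))  # 78, the 5 cancels
```

So the car covers 78 meters between one second and seven seconds.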
More generally, anytime you want to integrate some function, remember that you think
of that as adding up values 𝑓 of 𝑥 times d𝑥 for inputs in a certain range, and
then asking what does that sum approach as d𝑥 approaches zero. The first step to evaluating that integral is to find an antiderivative, some other
function, capital 𝐹, whose derivative is the thing inside the integral. Then the integral equals this antiderivative evaluated at the top bound minus its
value at the bottom bound. And this fact, right here that you’re staring at, is the fundamental theorem of calculus.
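Written out, with 𝑎 and 𝑏 as my own names for the bottom and top bounds, and then applied to the car's velocity between one and seven seconds:

```latex
\int_a^b f(x)\,dx = F(b) - F(a), \qquad \text{where } F'(x) = f(x)

\int_1^7 t(8 - t)\,dt
  = \left(4 \cdot 7^2 - \tfrac{7^3}{3}\right) - \left(4 \cdot 1^2 - \tfrac{1^3}{3}\right)
  = \tfrac{245}{3} - \tfrac{11}{3} = 78
```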
And I want you to appreciate something kinda crazy about this fact. The integral, the limiting value for the sum of all of these thin rectangles, takes
into account every single input on the continuum, from the lower bound to the upper
bound. That’s why we use the word integrate; it brings them all together. And yet, to actually compute it using an antiderivative, you only look at two inputs,
the top bound and the bottom bound. It almost feels like cheating. Finding the antiderivative implicitly accounts for all of the information needed to
add up the values between those two bounds. That’s just crazy to me. This idea is deep. And there’s a lot packed into this whole concept. So let’s just recap everything that just happened, shall we?
We wanted to figure out how far a car goes just by looking at the speedometer. And what makes that hard is that velocity is always changing. If you approximate velocity to be constant on multiple different intervals, you could
figure out how far the car goes on each interval, just with multiplication. And then, add all of those up. Better and better approximations for the original problem correspond to collections
of rectangles whose aggregate area is closer and closer to being the area under this
curve between the start time and the end time. So that area under the curve is actually the precise distance travelled for the true
nowhere-constant velocity function.
If you think of that area as a function itself with a variable right endpoint, you
can deduce that the derivative of that area function must equal the height of the
graph at every point. And that’s really the key right there. It means that to find a function giving this area, you ask, what function has 𝑣 of 𝑡
as a derivative? There are actually infinitely many antiderivatives of a given function, since you can
always just add some constant without affecting the derivative. So you account for that by subtracting off the value of whatever antiderivative
function you choose at the bottom bound.
By the way, one important thing to bring up before we leave is the idea of negative
area. What if the velocity function was negative at some point? Meaning, the car goes backwards. It’s still true that a tiny distance travelled, d𝑠, on a little time interval is
about equal to the velocity at that time multiplied by the tiny change in time. It’s just that the number you’d plug in for velocity would be negative. So the tiny change in distance is negative. In terms of our thin rectangles, if a rectangle goes below the horizontal axis like
this, its area represents a bit of distance travelled backwards.
So if what you want in the end is to find the distance between the car’s start point
and its endpoint, this is something you’re gonna wanna subtract. And that’s generally true of integrals. Whenever our graph dips below the horizontal axis, the area between that portion of
the graph and the horizontal axis is counted as negative. And what you’ll commonly hear is that integrals don’t measure area per se. They measure the signed area between the graph and the horizontal axis.
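To see signed area numerically, here is a sketch with a hypothetical velocity, 𝑣 of 𝑡 equals two minus 𝑡 over four seconds, which is positive for the first two seconds and negative for the next two. It samples at midpoints rather than the left endpoints used earlier, which doesn't matter in the limit. Summing the signed slivers gives the net displacement, zero here, while summing their absolute values gives the total distance driven, four meters. The names `v_back` and `signed_and_total_distance` are illustrative.

```python
def signed_and_total_distance(v, a, b, n):
    """Midpoint sums: net (signed) displacement and total distance travelled."""
    dt = (b - a) / n
    net = 0.0
    total = 0.0
    for k in range(n):
        ds = v(a + (k + 0.5) * dt) * dt  # negative while the car is going backwards
        net += ds                        # signed area: backward slivers subtract
        total += abs(ds)                 # unsigned: every sliver adds
    return net, total

def v_back(t):
    # Hypothetical velocity: forwards for 2 s, then backwards for 2 s.
    return 2 - t

net, total = signed_and_total_distance(v_back, 0, 4, 1000)
print(round(net, 6), round(total, 6))
```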