Video Transcript
Here, I wanna discuss one common type of problem where integration comes up: finding
the average of a continuous variable. This is a perfectly useful thing to know in its own right. But what’s really neat is that it can give us a completely different perspective for
why integrals and derivatives are inverses of each other.
To start, take a look at the graph of sin of 𝑥 between zero and 𝜋, which is half of
its period. What is the average height of this graph on that interval? It’s not a useless question. All sorts of cyclic phenomena in the world are modeled using sine waves. For example, the number of hours that the sun is up per day as a function of what day
of the year it is follows a sine-wave pattern. So if you wanted to predict, say, the average effectiveness of solar panels in summer
months versus winter months, you’d wanna be able to answer a question like this. What is the average value of that sine function over half of its period? Whereas a case like this is gonna have all sorts of constants mucking up the
function, you and I are just gonna focus on a pure unencumbered sin of 𝑥
function. But the substance of the approach would be totally the same in any other
application.
It’s kind of a weird question to think about though, isn’t it? The average of a continuous variable. Usually, with averages, we think of a finite number of variables, where you can add
them all up and divide that sum by how many there are. But there are infinitely many values of sin of 𝑥 between zero and 𝜋. And it’s not like we can just add up all of those numbers and divide by infinity. Now, this sensation actually comes up a lot in math, and it’s worth remembering: the vague sense that what you wanna do is add together infinitely many values associated with a continuum, even though that doesn’t really make sense.
And almost always, when you get that sense, the key is gonna be to use an integral
somehow. And to think through exactly how, a good first step is usually to just approximate
your situation with some kind of finite sum. In this case, imagine sampling a finite number of points, evenly spaced along this
range. Since it’s a finite sample, you can find the average by just adding up all of the
heights, sin of 𝑥 at each one of these, and then dividing that sum by the number of
points that you sampled, right? And presumably, if the idea of an average height among all infinitely many points is
gonna make any sense at all, the more points we sample, which would involve adding
up more and more heights, the closer the average of that sample should be to the
actual average of the continuous variable.
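If you wanted to check this numerically, here’s a minimal Python sketch of that sampling idea (the function name sample_average and the sample counts are my own illustrative choices, not anything from the video):

```python
import math

def sample_average(n):
    """Average of sin(x) at n evenly spaced sample points on [0, pi]."""
    xs = [k * math.pi / n for k in range(n)]  # n evenly spaced points
    return sum(math.sin(x) for x in xs) / n

# More and more samples should get closer to the true average, 2/pi ~ 0.6366.
for n in [10, 100, 10_000]:
    print(n, sample_average(n))
```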
And this should feel at least somewhat related to taking an integral of sin of 𝑥
between zero and 𝜋, even if it might not be exactly clear how the two ideas match
up. For that integral, remember, you also think of a sample of inputs on this
continuum. But instead of adding the height, sin of 𝑥 at each one, and dividing by how many
there are, you add up sin of 𝑥 times d𝑥, where d𝑥 is the spacing between the
samples. That is, you’re adding up little areas, not heights. And technically, the integral is not quite this sum. It’s whatever that sum approaches as d𝑥 approaches zero. But it is actually quite helpful to reason with respect to one of these finite
iterations, where we’re looking at a concrete size for d𝑥 and some specific number
of rectangles.
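To keep one of those finite iterations concrete, here’s a short sketch with a specific d𝑥 and a specific number of rectangles (again, just an illustrative sketch under my own naming choices):

```python
import math

dx = 0.1                       # a concrete spacing between samples
n = int(math.pi / dx)          # number of rectangles: pi / dx, rounded down
# Add up the little areas sin(x) * dx -- a finite stand-in for the integral.
area = sum(math.sin(k * dx) * dx for k in range(n))
print(n, area)  # shrinking dx pushes this toward the true area, 2
```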
So what you wanna do here is reframe this expression for the average, this sum of the
heights divided by the number of sampled points, in terms of d𝑥, the spacing
between samples. And now, if I tell you that the spacing between these points is, say, 0.1 and you
know that they range from zero to 𝜋, can you tell me how many there are? Well, you can take the length of that interval, 𝜋, and divide it by the length of
the space between each sample. If it doesn’t go in perfectly evenly, you would have to round down to the nearest
integer. But as an approximation, this is completely fine. So if we write that spacing between samples as d𝑥, the number of samples is 𝜋
divided by d𝑥. And when we substitute that into our expression up here, you can rearrange it,
putting that d𝑥 up top and distributing it into the sum.
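Written out symbolically (using N for the number of samples and x_k for the sample points, notation I’m adding for clarity), the rearrangement is:

```latex
\frac{1}{N}\sum_{k}\sin(x_k)
  \;=\; \frac{dx}{\pi}\sum_{k}\sin(x_k)
  \;=\; \frac{1}{\pi}\sum_{k}\sin(x_k)\,dx,
\qquad\text{since } N = \frac{\pi}{dx}.
```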
But think about what it means to distribute that d𝑥 up top. It means that the terms you’re adding up will look like sin of 𝑥 times d𝑥 for the
various inputs 𝑥 that you’re sampling. So that numerator looks exactly like an integral expression. And so for larger and larger samples of points, this average will approach the actual
integral of sin of 𝑥 between zero and 𝜋, all divided by the length of that
interval, 𝜋. In other words, the average height of this graph is this area divided by its
width. On an intuitive level, and just thinking in terms of units, that feels pretty
reasonable, doesn’t it? Area divided by width gives you an average height.
So with this expression in hand, let’s actually solve it. As we saw last video, to compute an integral, you need to find an antiderivative of
the function inside the integral, some other function whose derivative is sin of
𝑥. And if you’re comfortable with derivatives of trig functions, you know that the
derivative of cosine is negative sine. So if you just negate that, negative cosine is the function we want, the
antiderivative of sine. And to go check yourself on that, look at this graph of negative cosine. At zero, the slope is zero. And then it increases up to some maximum slope at 𝜋 halves and then goes back down
to zero at 𝜋. And in general, its slope does indeed seem to match the height of the sine graph at
every point.
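In symbols, the check is just one derivative computation:

```latex
\frac{d}{dx}\bigl(-\cos x\bigr) \;=\; -(-\sin x) \;=\; \sin x.
```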
So what do we have to do to evaluate the integral of sine between zero and 𝜋? Well, we evaluate this antiderivative at the upper bound and subtract off its value
at the lower bound. More visually, that is the difference in the height of this negative cosine graph
above 𝜋 and its height at zero. And as you can see, that change in height is exactly two. That’s kind of interesting, isn’t it? That the area under this sine graph turns out to be exactly two. So the answer to our average height problem, this integral divided by the width of
the region, evidently turns out to be two divided by 𝜋, which is around 0.64.
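Written out, that evaluation is:

```latex
\int_{0}^{\pi}\sin x\,dx
  \;=\; \Bigl[-\cos x\Bigr]_{0}^{\pi}
  \;=\; -\cos(\pi) - \bigl(-\cos(0)\bigr)
  \;=\; 1 - (-1)
  \;=\; 2,
\qquad\text{so the average is } \frac{2}{\pi} \approx 0.64.
```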
I promised at the start that this question of finding the average of a function
offers an alternate perspective on why integrals and derivatives are inverses of
each other, why the area under one graph has anything to do with the slope of
another graph. Notice how finding this average value, two divided by 𝜋, came down to looking at the
change in the antiderivative negative cos 𝑥 over the input range divided by the
length of that range. And another way to think about that fraction is as the rise-over-run slope between
the point of the antiderivative graph below zero and the point of that graph above
𝜋. And now think about why it might make sense that this slope would represent an
average value of sin of 𝑥 on that region.
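That slope, written as a rise-over-run fraction (using F for the antiderivative, the notation that comes up for the general case below):

```latex
\frac{F(\pi) - F(0)}{\pi - 0}
  \;=\; \frac{-\cos(\pi) - \bigl(-\cos(0)\bigr)}{\pi}
  \;=\; \frac{2}{\pi}.
```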
Well, by definition, sin of 𝑥 is the derivative of this antiderivative graph. It gives us the slope of negative cosine at every point. So another way to think about the average value of sin of 𝑥 is as the average slope
over all tangent lines here between zero and 𝜋. And when you view things like that, doesn’t it make a lot of sense that the average
slope of a graph over all of its points in a certain range should equal the total
slope between the start and end points?
To digest this idea, it helps to think about what it looks like for a general
function. For any function 𝑓 of 𝑥, if you wanna find its average value on some interval, say
between 𝑎 and 𝑏, what you do is take the integral of 𝑓 on that interval, divide
it by the width of that interval, 𝑏 minus 𝑎. You can think of this as the area under the graph divided by its width. Or more accurately, it is the signed area of that graph, since any area below the
𝑥-axis is counted as negative. And it’s worth taking a moment to remember what this area has to do with the usual
notion of a finite average, where you add up many numbers and divide by how many
there are.
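As a single formula, that average value is:

```latex
\text{average of } f \text{ on } [a, b]
  \;=\; \frac{1}{b - a}\int_{a}^{b} f(x)\,dx.
```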
When you take some sample of points spaced out by d𝑥, the number of samples is about
equal to the length of the interval divided by d𝑥. So if you add up the values of 𝑓 of 𝑥 at each sample and divide by the total number
of samples, it’s the same as adding up the product, 𝑓 of 𝑥 times d𝑥, and dividing
by the width of the entire interval. The only difference between that and the integral is that the integral asks what
happens as d𝑥 approaches zero. But that just corresponds with samples of more and more points that approximate the
true average increasingly well.
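Here’s that correspondence as a small Python sketch for a general function (the name average_value and the shrinking spacings are my own illustrative choices):

```python
import math

def average_value(f, a, b, dx):
    """Finite-sample average of f on [a, b] with spacing dx between samples."""
    n = int((b - a) / dx)               # number of samples, about (b - a) / dx
    total = sum(f(a + k * dx) for k in range(n))
    # Dividing the sum of heights by n is essentially the same as dividing
    # the sum of f(x) * dx by the width of the whole interval.
    return total / n

# As dx shrinks, this should approach (1 / (b - a)) times the integral of f.
for dx in [0.1, 0.01, 0.001]:
    print(dx, average_value(math.sin, 0, math.pi, dx))
```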
Now, for any integral, evaluating it comes down to finding an antiderivative of 𝑓 of
𝑥, commonly denoted capital 𝐹 of 𝑥. What we want is the change to this antiderivative between 𝑎 and 𝑏, capital 𝐹 of 𝑏
minus capital 𝐹 of 𝑎, which you can think of as the change in height of this new
graph between the two bounds. I’ve conveniently chosen an antiderivative that passes through zero at the lower
bound here. But keep in mind, you can freely shift this up and down, adding whatever constant you
want to it. And it would still be a valid antiderivative. So the solution to the average problem is the change in the height of this new graph
divided by the change to the 𝑥-value between 𝑎 and 𝑏. In other words, it is the slope of the antiderivative graph between the two
endpoints.
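In other words:

```latex
\text{average of } f \text{ on } [a, b]
  \;=\; \frac{F(b) - F(a)}{b - a},
\qquad\text{the slope of the antiderivative graph between } a \text{ and } b.
```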
And again, when you stop to think about it, that should make a lot of sense because
little 𝑓 of 𝑥 gives us the slope of the tangent line to this graph at each
point. After all, it is by definition the derivative of capital 𝐹. So, why are antiderivatives the key to solving integrals? Well, my favorite intuition is still the one that I showed last video. But a second perspective is that when you reframe the question of finding an average
of a continuous value as, instead, finding the average slope of a bunch of tangent
lines, it lets you see the answer just by comparing endpoints, rather than having to
actually tally up all of the points in between.
In the last video, I described the sensation that should bring integrals to your
mind. Namely, if you feel like the problem you’re solving could be approximated by breaking
it up somehow and adding up a large number of small things. And here, I want you to come away recognizing a second sensation that should also
bring integrals to your mind. If ever there’s some idea that you understand in a finite context and which involves
adding up multiple values, like taking the average of a bunch of numbers, and if you
wanna generalize that idea to apply to an infinite continuous range of values, try
seeing if you can phrase things in terms of an integral. It’s a feeling that comes up all the time, especially in probability. And it’s definitely worth remembering.