Video Transcript
I’d like to revisit a deceptively
simple question that I asked in the very first video of this series: what are
vectors? Is a two-dimensional vector, for
example, fundamentally an arrow on a flat plane that we can describe with
coordinates for convenience, or is it fundamentally that pair of real numbers which
is just nicely visualised as an arrow on a flat plane, or are both of these just
manifestations of something deeper?
On the one hand, defining vectors
as primarily being a list of numbers feels clear-cut and unambiguous. It makes things like
four-dimensional vectors or one-hundred-dimensional vectors sound like real,
concrete ideas that you can actually work with. Otherwise, an idea like four dimensions is just a vague geometric notion that's difficult to describe without
waving your hands a bit. But on the other hand, a common
sensation for those who actually work with linear algebra, especially as you get
more fluent with changing your basis, is that you’re dealing with a space that
exists independently from the coordinates that you give it and that coordinates are
actually somewhat arbitrary, depending on what you happen to choose as your basis
vectors.
Core topics in linear algebra, like
determinants and eigenvectors, seem indifferent to your choice of coordinate
systems. The determinant tells you how much
a transformation scales areas, and eigenvectors are the ones that stay on their own
span during a transformation, but both of these properties are inherently spatial,
and you can freely change your coordinate system without changing the underlying
values of either one. But, if vectors are not
fundamentally lists of real numbers, and if their underlying essence is something
more spatial, that just raises the question of what mathematicians mean when they use
a word like space or spatial.
To build up to where this is going,
I’d actually like to spend the bulk of this video talking about something which is
neither an arrow nor a list of numbers but also has vector-ish qualities,
functions. You see, there’s a sense in which
functions are actually just another type of vector. In the same way that you can add
two vectors together, there’s also a sensible notion for adding two functions, 𝑓
and 𝑔, to get a new function, 𝑓 plus 𝑔. It's one of those things where you
kinda already know what it’s gonna be, but actually phrasing it is a mouthful. The output of this new function at
any given input, like negative four, is the sum of the outputs of 𝑓 and 𝑔, when
you evaluate them each at that same input, negative four. Or, more generally, the value of
the sum function at any given input 𝑥 is the sum of the values 𝑓 of 𝑥 plus 𝑔 of
𝑥.
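In symbols, that rule reads:

$$(f + g)(x) = f(x) + g(x)$$

and the example above is just the case 𝑥 equals negative four.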
This is pretty similar to adding
vectors coordinate by coordinate; it’s just that there are, in a sense, infinitely
many coordinates to deal with. Similarly, there’s a sensible
notion for scaling a function by a real number, just scale all of the outputs by
that number. And again, this is analogous to
scaling a vector coordinate by coordinate; it just feels like there’s infinitely
many coordinates. Now, given that the only thing
vectors can really do is get added together or scaled, it feels like we should be
able to take the same useful constructs and problem-solving techniques of linear algebra, which were originally thought about in the context of arrows in space, and
apply them to functions as well.
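To make those two operations concrete, here is a minimal sketch in Python (my own illustration, not something from the video; the helper names are made up) of adding two functions pointwise and scaling one by a number:

```python
# Minimal sketch: functions behave like vectors under pointwise operations.

def add_functions(f, g):
    """(f + g)(x) = f(x) + g(x) for every input x."""
    return lambda x: f(x) + g(x)

def scale_function(c, f):
    """(c * f)(x) = c * f(x) for every input x."""
    return lambda x: c * f(x)

f = lambda x: x**2           # f(x) = x^2
g = lambda x: 3 * x + 5      # g(x) = 3x + 5

h = add_functions(f, g)      # h(x) = x^2 + 3x + 5
print(h(-4))                 # 16 - 12 + 5 = 9
print(scale_function(2, f)(-4))  # 2 * 16 = 32
```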
For example, there’s a perfectly
reasonable notion of a linear transformation for functions, something that takes in
one function and turns it into another. One familiar example comes from
calculus, the derivative. It’s something which transforms one
function into another function. Sometimes in this context, you’ll
hear these called operators instead of transformations, but the meaning is the
same. A natural question you might wanna
ask is what it means for a transformation of functions to be linear. The formal definition of linearity
is relatively abstract and symbolically driven compared to the way that I first
talked about it in chapter three of this series, but the reward of abstractness is
that we’ll get something general enough to apply to functions, as well as
arrows.
A transformation is linear if it
satisfies two properties, commonly called additivity and scaling. Additivity means that if you add
two vectors, 𝐕 and 𝐖, then apply a transformation to their sum, you get the same
result as if you added the transformed versions of 𝐕 and 𝐖. The scaling property is that when
you scale a vector 𝐕 by some number then apply the transformation, you get the same
ultimate vector as if you scale the transformed version of 𝐕 by that same
amount. The way you’ll often hear this
described is that linear transformations preserve the operations of vector addition
and scalar multiplication. The idea of gridlines remaining
parallel and evenly spaced that I've talked about in past videos is really just
an illustration of what these two properties mean in the specific case of points in
2D space.
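In symbols (writing 𝐿 for the transformation, a name I'm introducing here for clarity), the two properties say that for any vectors 𝐕 and 𝐖 and any scalar 𝑐:

$$L(\mathbf{v} + \mathbf{w}) = L(\mathbf{v}) + L(\mathbf{w}) \quad \text{(additivity)}$$

$$L(c\,\mathbf{v}) = c\,L(\mathbf{v}) \quad \text{(scaling)}$$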
One of the most important
consequences of these properties, which makes matrix-vector multiplication possible,
is that a linear transformation is completely described by where it takes the basis
vectors. Since any vector can be expressed
by scaling and adding the basis vectors in some way, finding the transformed version
of a vector comes down to scaling and adding the transformed versions of the basis
vectors in that same way. As you’ll see in just a moment,
this is as true for functions as it is for arrows. For example, calculus students are
always using the fact that the derivative is additive and has the scaling property,
even if they haven’t heard it phrased that way. If you add two functions then take
the derivative, it’s the same as first taking the derivative of each one separately
then adding the result. Similarly, if you scale a function,
then take the derivative, it’s the same as first taking the derivative, then scaling
the result.
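As a quick sanity check, here is a small sketch using the SymPy library (my own illustration, not part of the video) that verifies both properties of the derivative on a couple of sample functions:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 + 5 * x**2     # sample functions; any differentiable choices would do
g = 4 * x + 1
c = 7                   # an arbitrary scalar

# Additivity: the derivative of (f + g) equals df/dx + dg/dx.
assert sp.simplify(sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))) == 0

# Scaling: the derivative of (c * f) equals c times df/dx.
assert sp.simplify(sp.diff(c * f, x) - c * sp.diff(f, x)) == 0

print("The derivative is additive and respects scaling on these examples.")
```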
To really drill in the parallel,
let’s see what it might look like to describe the derivative with a matrix. This will be a little tricky since
function spaces have a tendency to be infinite-dimensional, but I think this
exercise is actually quite satisfying. Let’s limit ourselves to
polynomials, things like 𝑥 squared plus three 𝑥 plus five or four 𝑥 to the
seventh minus five 𝑥 squared. Each of the polynomials in our
space will only have finitely many terms, but the full space is going to include
polynomials with arbitrarily large degree. The first thing we need to do is
give coordinates to this space, which requires choosing a basis. Since polynomials are already
written down as the sum of scaled powers of the variable 𝑥, it’s pretty natural to
just choose pure powers of 𝑥 as the basis functions. In other words, our first basis
function will be the constant function, 𝑏 zero of 𝑥 equals one. The second basis function will be
𝑏 one of 𝑥 equals 𝑥, then 𝑏 two of 𝑥 equals 𝑥 squared, then 𝑏 three of 𝑥
equals 𝑥 cubed, and so on. The role that these basis functions
serve will be similar to the roles of 𝑖-hat, 𝑗-hat, and 𝑘-hat in the world of
vectors as arrows.
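Spelled out, any polynomial in the space is a finite linear combination of these basis functions, in just the way an arrow is a combination of 𝑖-hat, 𝑗-hat, and 𝑘-hat:

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = a_0\, b_0(x) + a_1\, b_1(x) + a_2\, b_2(x) + \cdots + a_n\, b_n(x)$$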
Since our polynomials can have
arbitrarily large degree, this set of basis functions is infinite. But that’s okay, it just means that
when we treat our polynomials as vectors, they’re going to have infinitely many
coordinates. A polynomial like 𝑥 squared plus
three 𝑥 plus five, for example, would be described with the coordinates five,
three, one, then infinitely many zeros. You’d read this as saying it’s five
times the first basis function plus three times the second basis function plus one
times the third basis function, and then none of the other basis functions should be
added from that point on. The polynomial four 𝑥 to the
seventh minus five 𝑥 squared would have the coordinates zero, zero, negative five,
zero, zero, zero, zero, four, then an infinite string of zeros. In general, since every individual
polynomial has only finitely many terms, its coordinates will be some finite string
of numbers with an infinite tail of zeros.
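Here is a tiny sketch of that bookkeeping (again my own illustration, with made-up helper names): a polynomial's coefficients, listed from the constant term upward, are exactly its coordinates in this basis, with the infinite tail of zeros left implicit.

```python
def poly_to_coords(coeffs_by_degree):
    """Coordinates of a polynomial in the basis 1, x, x^2, ...
    Input: a dict mapping degree -> coefficient.
    Output: a finite list of coordinates; the infinite tail of zeros is implicit."""
    top = max(coeffs_by_degree)
    return [coeffs_by_degree.get(k, 0) for k in range(top + 1)]

# x^2 + 3x + 5   ->  [5, 3, 1]                 (then infinitely many zeros)
print(poly_to_coords({2: 1, 1: 3, 0: 5}))

# 4x^7 - 5x^2    ->  [0, 0, -5, 0, 0, 0, 0, 4] (then infinitely many zeros)
print(poly_to_coords({7: 4, 2: -5}))
```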
In this coordinate system, the
derivative is described with an infinite matrix that's mostly full of zeros, but which has the positive integers 1, 2, 3, and so on running down the offset diagonal just above the main one. I'll talk about how you could find
this matrix in just a moment, but the best way to get a feel for it is to just watch
it in action. Take the coordinates representing
the polynomial 𝑥 cubed plus five 𝑥 squared plus four 𝑥 plus five, then put those
coordinates on the right of the matrix. The only term which contributes to
the first coordinate of the result is one times four, which means the constant term
in the result will be four. This corresponds to the fact that
the derivative of four 𝑥 is the constant four. The only term contributing to the
second coordinate of the matrix-vector product is two times five, which means the
coefficient in front of 𝑥 in the derivative is ten. That one corresponds to the
derivative of five 𝑥 squared. Similarly, the third coordinate in
the matrix-vector product comes down to taking three times one. This one corresponds to the
derivative of 𝑥 cubed being three 𝑥 squared. And after that, it’ll be nothing
but zeros. What makes this possible is that
the derivative is linear. And for those of you who like to
pause and ponder, you could construct this matrix by taking the derivative of each
basis function and putting the coordinates of the results in each column.
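For anyone who does like to pause and ponder, here is one way that construction could look in code (a sketch of my own, truncating the infinite matrix to polynomials of degree at most three): each column holds the coordinates of the derivative of one basis function, and the matrix-vector product reproduces the worked example above.

```python
import numpy as np

N = 4  # truncate to polynomials of degree < N, i.e. coordinates (a0, a1, a2, a3)

# Build the truncated derivative matrix column by column:
# column j holds the coordinates of d/dx (x^j) = j * x^(j-1).
D = np.zeros((N, N))
for j in range(1, N):
    D[j - 1, j] = j

# x^3 + 5x^2 + 4x + 5 has coordinates (5, 4, 5, 1) in the basis 1, x, x^2, x^3.
p = np.array([5, 4, 5, 1])

print(D)
print(D @ p)  # -> [ 4. 10.  3.  0.], the coordinates of 3x^2 + 10x + 4
```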
So, surprisingly, matrix-vector
multiplication and taking a derivative, which at first seem like completely
different animals, are both just really members of the same family. In fact, most of the concepts I’ve
talked about in this series with respect to vectors as arrows in space, things like
the dot product or eigenvectors, have direct analogues in the world of
functions. Though sometimes they go by
different names, things like inner product or eigenfunction. So, back to the question: what is a vector? The point I wanna make here is that
there are lots of vector-ish things in maths. As long as you’re dealing with a
set of objects where there’s a reasonable notion of scaling and adding, whether
that’s a set of arrows in space, lists of numbers, functions, or whatever other
crazy thing you choose to define, all of the tools developed in linear algebra
regarding vectors, linear transformations, and all that stuff, should be able to
apply.
Take a moment to imagine yourself
right now as a mathematician developing the theory of linear algebra. You want all of the definitions and
discoveries of your work to apply to all of the vector-ish things in full
generality, not just to one specific case. These sets of vector-ish things,
like arrows or lists of numbers or functions, are called vector spaces, and what you
as the mathematician might want to do is say, “Hey everyone! I don’t wanna think about all the
different types of crazy vector spaces that you all might come up with.” So what you do is establish a list
of rules that vector addition and scaling have to abide by. These rules are called axioms, and
in the modern theory of linear algebra, there are eight axioms that any vector space
must satisfy if all of the theory and constructs that we’ve discovered are going to
apply.
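For reference, the eight axioms, as they are standardly stated for any vectors u, v, w in the space and any scalars a and b, are:

1. u + (v + w) = (u + v) + w
2. v + w = w + v
3. There is a vector 0 such that 0 + v = v for all v.
4. For every vector v there is a vector −v such that v + (−v) = 0.
5. a(bv) = (ab)v
6. 1v = v
7. a(v + w) = av + aw
8. (a + b)v = av + bv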
I’ll leave them on the screen here
for anyone who wants to pause and ponder, but basically it’s just a checklist to
make sure that the notions of vector addition and scalar multiplication do the
things that you’d expect them to do. These axioms are not so much
fundamental rules of nature as they are an interface between you, the mathematician
discovering results, and other people who might want to apply those results to new
sorts of vector spaces. If, for example, someone defines
some crazy type of vector space, like the set of all 𝜋 creatures, with some
definition of adding and scaling 𝜋 creatures, these axioms are like a checklist of
things that they need to verify about their definitions before they can start
applying the results of linear algebra.
And you, as the mathematician,
never have to think about all the possible crazy vector spaces people might
define. You just have to prove your results
in terms of these axioms so anyone whose definitions satisfy those axioms can
happily apply your results, even if you never thought about their situation. As a consequence, you'd tend to
phrase all of your results pretty abstractly, which is to say, only in terms of
these axioms, rather than centring on a specific type of vector, like arrows in
space or functions. For example, this is why just about
every textbook you’ll find will define linear transformations in terms of additivity
and scaling, rather than talking about gridlines remaining parallel and evenly
spaced, even though the latter is more intuitive and, at least in my view, more helpful for first-time learners, even if it is specific to one situation.
So the mathematician’s answer to
“what are vectors?” is to just ignore the question. In the modern theory, the form that
vectors take doesn’t really matter, arrows, lists of numbers, functions, 𝜋
creatures, really it can be anything so long as there is some notion of adding and
scaling vectors that follows these rules. It’s like asking what the number
three really is. Whenever it comes up concretely,
it’s in the context of some triplet of things. But in maths, it’s treated as an
abstraction for all possible triplets of things and lets you reason about all
possible triplets using a single idea. Same goes with vectors, which have
many embodiments, but maths abstracts them all into a single, intangible notion of a
vector space.
But, as anyone watching this series
knows, I think it’s better to begin reasoning about vectors in a concrete
visualisable setting, like 2D space with arrows rooted at the origin. But as you learn more linear
algebra, know that these tools apply much more generally and that this is the
underlying reason why textbooks and lectures tend to be phrased, well,
abstractly. So with that, folks, I think I'll call it an end to this Essence of Linear Algebra series. If you've watched and understood
the videos, I really do believe that you have a solid foundation in the underlying
intuitions of linear algebra. This is not the same thing as
learning the full topic, of course, that’s something that can only really come from
working through problems, but the learning you do moving forward can be
substantially more efficient if you have all the right intuitions in place. So, have fun applying those
intuitions and best of luck with your future learning.