Video Transcript
Eigenvectors and eigenvalues is one
of those topics that a lot of students find particularly unintuitive. Questions like “Why are we doing
this?” and “What does this actually mean?” are too often left just floating away in
an unanswered sea of computations.
And as I’ve put out the videos of
this series, a lot of you have commented about looking forward to visualizing this
topic in particular. I suspect that the reason for this
is not so much that eigen-things are particularly complicated or poorly
explained. In fact, it’s comparatively
straightforward, and I think most books do a fine job explaining it.
The issue is that it only really
makes sense if you have a solid visual understanding of many of the topics that
precede it. Most important here is that you
know how to think about matrices as linear transformations, but you also need to be
comfortable with things like determinants, linear systems of equations, and change
of basis.
Confusion about eigen-stuffs
usually has more to do with a shaky foundation in one of these topics than it does
with eigenvectors and eigenvalues themselves.
To start, consider some linear
transformation in two dimensions like the one shown here. It moves the basis vector 𝑖-hat to
the coordinates three, zero and 𝑗-hat to one, two. So it’s represented with a matrix
whose columns are three, zero and one, two.
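Written out, that opening matrix and where it sends the two basis vectors look like this:

$$
A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}, \qquad
A\hat{\imath} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} = 3\hat{\imath}, \qquad
A\hat{\jmath} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
$$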
Focus in on what it does to one
particular vector, and think about the span of that vector, the line passing through
the origin and its tip. Most vectors are gonna get knocked
off their span during the transformation. I mean, it would seem pretty
coincidental if the place where the vector landed also happened to be somewhere on
that line. But some special vectors do remain
on their own span, meaning the effect that the matrix has on such a vector is just
to stretch it or squish it like a scalar.
For this specific example, the
basis vector 𝑖-hat is one such special vector. The span of 𝑖-hat is the
𝑥-axis. And from the first column of the
matrix, we can see that 𝑖-hat moves over to three times itself, still on that
𝑥-axis.
What’s more, because of the way
linear transformations work, any other vector on the 𝑥-axis is also just stretched
by a factor of three and, hence, remains on its own span. A slightly sneakier vector that
remains on its own span during this transformation is negative one, one. It ends up getting stretched by a
factor of two.
And again, linearity is gonna imply
that any other vector on the diagonal line spanned by this guy is just gonna get
stretched out by a factor of two. And for this transformation, those
are all the vectors with this special property of staying on their span, those on
the 𝑥-axis, getting stretched out by a factor of three, and those on this diagonal
line, getting stretched by a factor of two.
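If you want to check those two claims numerically, here's a minimal NumPy sketch, not something from the video itself, just one way to verify:

```python
import numpy as np

# The transformation from the example: columns are where i-hat and j-hat land.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

i_hat = np.array([1.0, 0.0])   # lies on the x-axis
v = np.array([-1.0, 1.0])      # lies on the sneakier diagonal line

print(A @ i_hat)   # [3. 0.] = 3 * i_hat, stretched by 3, still on the x-axis
print(A @ v)       # [-2. 2.] = 2 * v, stretched by 2, still on its own span
```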
Any other vector is gonna get
rotated somewhat during the transformation, knocked off the line that it spans. As you might have guessed by now,
these special vectors are called the eigenvectors of the transformation, and each
eigenvector has associated with it what’s called an eigenvalue, which is just the
factor by which it’s stretched or squished during the transformation.
Of course, there’s nothing special
about stretching versus squishing or the fact that these eigenvalues happen to be
positive. In another example, you could have
an eigenvector with eigenvalue negative one-half, meaning that the vector gets
flipped and squished by a factor of one-half. But the important part here is that
it stays on the line that it spans, without getting rotated off of it.
For a glimpse of why this might be
a useful thing to think about, consider some three-dimensional rotation. If you can find an eigenvector for
that rotation, a vector that remains on its own span, what you’ve found is the axis
of rotation. And it’s much easier to think about
a 3D rotation in terms of some axis of rotation and an angle by which it’s rotating
rather than thinking about the full three-by-three matrix associated with that
transformation.
In this case, by the way, the
corresponding eigenvalue would have to be one, since rotations never stretch or
squish anything. So the length of the vector would
remain the same. This pattern shows up a lot in
linear algebra.
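Before moving on, here's a sketch of pulling out a rotation axis this way. The specific rotation, 90 degrees about the z-axis, is my own made-up example, not one from the video:

```python
import numpy as np

# A 90-degree rotation about the z-axis (a hypothetical example for illustration).
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

vals, vecs = np.linalg.eig(R)

# The eigenvector whose eigenvalue is (numerically) 1 is the direction left unmoved:
# the axis of rotation.
axis = vecs[:, np.isclose(vals, 1.0)].real.ravel()
print(axis)   # proportional to [0, 0, 1], i.e. the z-axis
```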
With any linear transformation
described by a matrix, you could understand what it’s doing by reading off the
columns of this matrix as the landing spots for basis vectors. But often, a better way to get at
the heart of what the linear transformation actually does, less dependent on your
particular coordinate system, is to find the eigenvectors and eigenvalues.
I won’t cover the full details on
methods for computing eigenvectors and eigenvalues here, but I’ll try to give an
overview of the computational ideas that are most important for a conceptual
understanding.
Symbolically, here’s what the idea
of an eigenvector looks like. 𝐴 is the matrix representing some
transformation, with 𝐯 as the eigenvector and 𝜆 is a number, namely, the
corresponding eigenvalue. What this expression is saying is
that the matrix-vector product, 𝐴 times 𝐯, gives the same result as just scaling
the eigenvector 𝐯 by some value 𝜆.
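In symbols, that defining equation is

$$
A\mathbf{v} = \lambda\mathbf{v}.
$$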
So finding the eigenvectors and
their eigenvalues of the matrix 𝐴 comes down to finding the values of 𝐯 and 𝜆
that make this expression true. It’s a little awkward to work with
at first because that left-hand side represents matrix-vector multiplication, but
the right-hand side here is scalar-vector multiplication.
So let’s start by rewriting that
right-hand side as some kind of matrix-vector multiplication, using a matrix which
has the effect of scaling any vector by a factor of 𝜆. The columns of such a matrix will
represent what happens to each basis vector, and each basis vector is simply
multiplied by 𝜆, so this matrix will have the number 𝜆 down the diagonal, with
zeros everywhere else.
The common way to write this guy is
to factor that 𝜆 out and write it as 𝜆 times 𝐼, where 𝐼 is the identity matrix
with ones down the diagonal. With both sides looking like
matrix-vector multiplication, we can subtract off that right-hand side and factor
out the 𝐯.
So what we now have is a new matrix
𝐴 minus 𝜆 times the identity, and we’re looking for a vector 𝐯 such that this new
matrix times 𝐯 gives the zero vector. Now this will always be true if 𝐯
itself is the zero vector, but that’s boring. What we want is a nonzero
eigenvector. And if you watched Chapter 5 and 6,
you’ll know that the only way it’s possible for the product of a matrix with a
nonzero vector to become zero is if the transformation associated with that matrix
squishes space into a lower dimension. And that squishification
corresponds to a zero determinant for the matrix.
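Putting those steps into symbols, the chain of rearrangements is

$$
A\mathbf{v} = \lambda\mathbf{v}
\;\Longleftrightarrow\;
A\mathbf{v} - (\lambda I)\mathbf{v} = \mathbf{0}
\;\Longleftrightarrow\;
(A - \lambda I)\mathbf{v} = \mathbf{0},
$$

and a nonzero $\mathbf{v}$ solving this exists exactly when $\det(A - \lambda I) = 0$.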
To be concrete, let’s say your
matrix 𝐴 has columns two, one and two, three and think about subtracting off a
variable amount 𝜆 from each diagonal entry. Now imagine tweaking 𝜆, turning a
knob to change its value. As that value of 𝜆 changes, the
matrix itself changes, and so the determinant of the matrix changes.
The goal here is to find a value of
𝜆 that will make this determinant zero, meaning the tweaked transformation squishes
space into a lower dimension. In this case, the sweet spot comes
when 𝜆 equals one. Of course, if we had chosen some
other matrix, the eigenvalue might not necessarily be one. The sweet spot might be hit at some
other value of 𝜆.
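Here's a small NumPy sketch of that knob-turning for the matrix with columns two, one and two, three; the sample values of 𝜆 are my own choice:

```python
import numpy as np

# Matrix whose columns are (2, 1) and (2, 3).
A = np.array([[2.0, 2.0],
              [1.0, 3.0]])

# Turn the lambda knob and watch the determinant of A - lambda * I change.
for lam in [0.0, 0.5, 1.0, 1.5]:
    print(lam, np.linalg.det(A - lam * np.eye(2)))

# The determinant hits zero at lambda = 1 (and, for this particular matrix, again at lambda = 4).
```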
So this is kind of a lot, but let’s
unravel what this is saying. When 𝜆 equals one, the matrix 𝐴
minus 𝜆 times the identity squishes space onto a line. That means there’s a nonzero vector
𝐯 such that 𝐴 minus 𝜆 times the identity times 𝐯 equals the zero vector. And remember, the reason we care
about that is because it means 𝐴 times 𝐯 equals 𝜆 times 𝐯, which you can read
off as saying that the vector 𝐯 is an eigenvector of 𝐴 staying on its own span
during the transformation 𝐴.
In this example, the corresponding
eigenvalue is one. So 𝐯 would actually just stay
fixed in place. Pause and ponder if you need to
make sure that that line of reasoning feels good.
This is the kind of thing I
mentioned in the introduction. If you didn’t have a solid grasp of
determinants and why they relate to linear systems of equations having nonzero
solutions, an expression like this would feel completely out of the blue. To see this in action, let’s
revisit the example from the start.
With the matrix whose columns are
three, zero and one, two, to find if a value 𝜆 is an eigenvalue, subtract it from
the diagonals of this matrix and compute the determinant. Doing this, we get a certain
quadratic polynomial in 𝜆: (three minus 𝜆) times (two minus 𝜆).
Since 𝜆 can only be an eigenvalue
if this determinant happens to be zero, you can conclude that the only possible
eigenvalues are 𝜆 equals two and 𝜆 equals three. To figure out what the eigenvectors
are that actually have one of these eigenvalues, say 𝜆 equals two, plug in that
value of 𝜆 to the matrix and then solve for which vectors this diagonally altered
matrix sends to zero. If you computed this the way you
would any other linear system, you’d see that the solutions are all the vectors on
the diagonal line spanned by negative one, one.
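In NumPy terms, that plug-in-and-solve step could look like this sketch; the code phrasing is mine, not the video's:

```python
import numpy as np

# The matrix from the start: columns (3, 0) and (1, 2).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Plug in lambda = 2 and look at the diagonally altered matrix.
M = A - 2.0 * np.eye(2)      # [[1, 1], [0, 0]], which squishes the plane onto a line
print(np.linalg.det(M))      # 0, so nonzero solutions to M v = 0 exist

# M v = 0 forces v_x + v_y = 0, so every solution is a multiple of (-1, 1).
v = np.array([-1.0, 1.0])
print(M @ v)                 # [0. 0.]
print(A @ v)                 # [-2. 2.], which is 2 * v
```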
This corresponds to the fact that
the unaltered matrix three, zero, one, two has the effect of stretching all those
vectors by a factor of two. Now a 2D transformation doesn’t
have to have eigenvectors. For example, consider a rotation by
90 degrees. This doesn’t have any eigenvectors
since it rotates every vector off of its own span.
If you actually try computing the
eigenvalues of a rotation like this, notice what happens. Its matrix has columns zero, one
and negative one, zero. Subtract off 𝜆 from the diagonal
elements and look for when the determinant is zero.
In this case, you get the
polynomial 𝜆 squared plus one. The only roots of that polynomial
are the imaginary numbers 𝑖 and negative 𝑖. The fact that there are no real
number solutions indicates that there are no eigenvectors.
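You can see the same thing numerically; NumPy reports the two imaginary roots. A quick sketch:

```python
import numpy as np

# 90-degree rotation: columns (0, 1) and (-1, 0).
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

vals, _ = np.linalg.eig(R)
print(vals)   # the two roots i and -i, purely imaginary, so no real eigenvectors
```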
Another pretty interesting example,
worth holding in the back of your mind, is a shear. This fixes 𝑖-hat in place and
moves 𝑗-hat one over. So its matrix has columns one, zero
and one, one.
All of the vectors on the 𝑥-axis
are eigenvectors with eigenvalue one since they remain fixed in place. In fact, these are the only
eigenvectors. When you subtract off 𝜆 from the
diagonals and compute the determinant, what you get is (one minus 𝜆) squared, and the
only root of this expression is 𝜆 equals one.
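A quick sketch of the same calculation for the shear, again in NumPy rather than the video's notation:

```python
import numpy as np

# Shear: columns (1, 0) and (1, 1).
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Subtract the only eigenvalue, lambda = 1, from the diagonal.
M = S - 1.0 * np.eye(2)           # [[0, 1], [0, 0]]

# M v = 0 forces v_y = 0, so the eigenvectors are exactly the vectors along the x-axis.
print(M @ np.array([5.0, 0.0]))   # [0. 0.]
```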
This lines up with what we see
geometrically, that all of the eigenvectors have eigenvalue one. Keep in mind, though, it’s also
possible to have just one eigenvalue, but with more than just a line full of
eigenvectors. A simple example is a matrix that
scales everything by two. The only eigenvalue is two. But every vector in the plane gets
to be an eigenvector with that eigenvalue.
Now is another good time to pause
and ponder some of this before I move on to the last topic.
I wanna finish off here with the
idea of an eigenbasis, which relies heavily on ideas from the last video. Take a look at what happens if our basis vectors just so happen to be eigenvectors.
For example, maybe 𝑖-hat is scaled
by negative one and 𝑗-hat is scaled by two. Writing their new coordinates as
the columns of a matrix, notice that those scalar multiples, negative one and two,
which are the eigenvalues of 𝑖-hat and 𝑗-hat, sit on the diagonal of our matrix,
and every other entry is a zero.
Anytime a matrix has zeros
everywhere other than the diagonal, it’s called, reasonably enough, a diagonal
matrix. And the way to interpret this is
that all the basis vectors are eigenvectors, with the diagonal entries of this
matrix being their eigenvalues.
There are a lot of things that make
diagonal matrices much nicer to work with. One big one is that it’s easier to
compute what will happen if you multiply this matrix by itself a whole bunch of
times. Since all one of these matrices
does is scale each basis vector by some eigenvalue, applying that matrix many times,
say 100 times, is just gonna correspond to scaling each basis vector by the 100th
power of the corresponding eigenvalue.
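For example, here's that shortcut in NumPy for a hypothetical diagonal matrix with eigenvalues negative one and two; the specific entries are just for illustration:

```python
import numpy as np

# A diagonal matrix scales each basis vector by its eigenvalue...
D = np.array([[-1.0, 0.0],
              [ 0.0, 2.0]])

# ...so its 100th power just raises each diagonal entry to the 100th power.
print(np.linalg.matrix_power(D, 100))
print(np.diag([(-1.0) ** 100, 2.0 ** 100]))   # the same matrix, computed entry by entry
```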
In contrast, try computing the
100th power of a nondiagonal matrix. Really, try it for a moment. It’s a nightmare. Of course, you’ll rarely be so
lucky as to have your basis vectors also be eigenvectors. But if your transformation has a
lot of eigenvectors, like the one from the start of this video, enough so that you
can choose a set that spans the full space, then you could change your coordinate
system so that these eigenvectors are your basis vectors.
I talked about change of basis last
video, but I’ll go through a superquick reminder here of how to express the
transformation currently written in our coordinate system in a different
system.
Take the coordinates of the vectors
that you want to use as a new basis, which, in this case, means our two
eigenvectors. Then make those coordinates the
columns of a matrix, known as the change of basis matrix. When you sandwich the original
transformation, putting the change of basis matrix on its right and the inverse of
the change of basis matrix on its left, the result will be a matrix representing
that same transformation, but from the perspective of the new basis vectors’
coordinate system.
The whole point of doing this with
eigenvectors is that this new matrix is guaranteed to be diagonal, with its
corresponding eigenvalues down that diagonal. This is because it represents
working in a coordinate system where what happens to the basis vectors is that they
get scaled during the transformation.
A set of basis vectors, which are
also eigenvectors, is called, again reasonably enough, an eigenbasis. So if, for example, you needed to
compute the 100th power of this matrix, it would be much easier to change to an
eigenbasis, compute the 100th power in that system, then convert back to our
standard system.
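Here's a sketch of that whole sandwich for the matrix from the start of the video, using the eigenvectors 𝑖-hat and negative one, one as the new basis; the NumPy phrasing is mine, not the video's:

```python
import numpy as np

# The transformation from the start: columns (3, 0) and (1, 2).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Change of basis matrix: its columns are the eigenvectors (1, 0) and (-1, 1).
P = np.array([[1.0, -1.0],
              [0.0,  1.0]])

# Sandwich A between P on its right and P inverse on its left.
D = np.linalg.inv(P) @ A @ P
print(D)   # diagonal, with the eigenvalues 3 and 2 down the diagonal

# 100th power: hop into the eigenbasis, raise the diagonal entries, hop back.
A_100 = P @ np.linalg.matrix_power(D, 100) @ np.linalg.inv(P)
print(np.allclose(A_100, np.linalg.matrix_power(A, 100)))   # True, up to floating-point error
```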
You can’t do this with all
transformations. A shear, for example, doesn’t have
enough eigenvectors to span the full space. But if you can find an eigenbasis,
it makes matrix operations really lovely.
For those of you willing to work
through a pretty neat puzzle to see what this looks like in action and how it can be
used to produce some surprising results, I’ll leave up a prompt here on the
screen. It takes a bit of work, but I think
you’ll enjoy it.
The next and final video of this
series is gonna be on abstract vector spaces. See you then.