### Video Transcript

Hey everyone! Last time, I showed what linear transformations look like and how to
represent them using matrices. This is worth a quick recap because it's really important. But of course, if this feels like more than just a recap, go back and watch the full
video.

Technically speaking, linear transformations are functions, with vectors as inputs
and vectors as outputs. But I showed last time how we can think about them visually as smooshing around space
in such a way that grid lines stay parallel and evenly spaced, and so that the origin
remains fixed.

The key takeaway was that a linear transformation is completely determined by where
it takes the basis vectors of the space, which, for two dimensions, means i-hat and
j-hat. This is because any other vector can be described as a linear combination of those
basis vectors. A vector with coordinates x, y is x times i-hat plus y times j-hat.

The property that grid lines remain parallel and evenly spaced has a wonderful
consequence: after the transformation, the place where your vector lands will be x times the transformed version of i-hat
plus y times the transformed version of j-hat. This means that if you keep a record of the coordinates where i-hat lands and the
coordinates where j-hat lands, you can compute that a vector which starts at x, y
must land on x times the new coordinates of i-hat plus y times the new
coordinates of j-hat.
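The recipe above can be sketched in a few lines of Python. This is a minimal illustration, assuming vectors are plain `(x, y)` tuples; the function name `apply_transformation` is a hypothetical helper, not anything from the video. The example uses the 90-degree counterclockwise rotation, which sends i-hat to (0, 1) and j-hat to (-1, 0).

```python
def apply_transformation(i_lands, j_lands, v):
    """Return x * (where i-hat lands) + y * (where j-hat lands) for v = (x, y).

    Hypothetical helper: a 2D linear transformation is described here
    only by the landing spots of the two basis vectors.
    """
    x, y = v
    return (x * i_lands[0] + y * j_lands[0],
            x * i_lands[1] + y * j_lands[1])


# 90-degree counterclockwise rotation: i-hat -> (0, 1), j-hat -> (-1, 0).
i_lands = (0, 1)
j_lands = (-1, 0)
print(apply_transformation(i_lands, j_lands, (2, 3)))  # (-3, 2)
```

Nothing else about the transformation is needed: the two landing spots are enough to move any vector.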

The convention is to record the coordinates of where i-hat and j-hat land as the
columns of a matrix, and to define this sum of those columns, scaled
by x and y, to be matrix-vector multiplication. In this way, a matrix represents a specific linear transformation, and multiplying a matrix by a vector is, computationally, what it means to apply that
transformation to that vector. Alright, recap over. On to the new stuff.
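For a 2×2 matrix, that convention can be written out as a worked equation. The entry names a, b, c, d are placeholders chosen for illustration; the first column (a, c) is where i-hat lands and the second column (b, d) is where j-hat lands.

```latex
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= x \begin{bmatrix} a \\ c \end{bmatrix}
+ y \begin{bmatrix} b \\ d \end{bmatrix}
= \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}
```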

Oftentimes, you find yourself wanting to describe the effect of applying one
transformation and then another. For example, maybe you want to describe what happens when you first rotate the plane
90 degrees counterclockwise, then apply a shear. The overall effect here, from start to finish, is another linear transformation,
distinct from the rotation and the shear. This new linear transformation is commonly called the "composition" of the two
separate transformations we applied. And like any linear transformation, it can be described with a matrix all of its own,
by following i-hat and j-hat.

In this example, the ultimate landing spot for i-hat after both transformations is
one, one. So let's make that the first column of a matrix. Likewise, j-hat ultimately ends up at the location negative one, zero, so we make
that the second column of the matrix. This new matrix captures the overall effect of applying a rotation then a shear, but
as one single action, rather than two successive ones.
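Here is a small sketch of that bookkeeping, following i-hat and j-hat through the rotation and then the shear. Matrices are stored as lists of rows, and `mat_vec` and `from_columns` are hypothetical helpers written for this illustration. The transcript does not spell out the shear's entries; the matrix used here (i-hat fixed, j-hat sent to one, one) is an assumption inferred from the landing spots quoted above.

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector (tuple)."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def from_columns(col1, col2):
    """Build a 2x2 matrix (list of rows) whose columns are col1 and col2."""
    return [[col1[0], col2[0]],
            [col1[1], col2[1]]]


rotation = from_columns((0, 1), (-1, 0))  # 90 degrees counterclockwise
shear = from_columns((1, 0), (1, 1))      # assumed: i-hat fixed, j-hat -> (1, 1)

# Follow each basis vector: rotate first, then shear.
i_final = mat_vec(shear, mat_vec(rotation, (1, 0)))
j_final = mat_vec(shear, mat_vec(rotation, (0, 1)))

composition = from_columns(i_final, j_final)
print(composition)  # [[1, -1], [1, 0]]
```

The two columns of `composition` are (1, 1) and (-1, 0), the landing spots described above.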

Here's one way to think about that new matrix: if you were to take some vector and
pump it through the rotation then the shear, the long way to compute where it ends
up is to first multiply it on the left by the rotation matrix, then take whatever
you get and multiply that on the left by the shear matrix. This is, numerically speaking, what it means to apply a rotation then a shear to a
given vector. But whatever you get should be the same as just multiplying this new composition matrix
that we just found by that same vector, no matter what vector you choose, since this
new matrix is supposed to capture the same overall effect as the rotation-then-shear
action.
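A quick numerical check of that claim, as a sketch: push a few sample vectors through the rotation and then the shear, and compare against a single multiplication by the composition matrix. The shear entries are the same assumption as before (i-hat fixed, j-hat sent to one, one), and `mat_vec` is a hypothetical helper. Checking a handful of vectors is an illustration, not a proof; linearity is what guarantees agreement for every vector.

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector (tuple)."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


rotation = [[0, -1], [1, 0]]     # 90 degrees counterclockwise
shear = [[1, 1], [0, 1]]         # assumed: i-hat fixed, j-hat -> (1, 1)
composition = [[1, -1], [1, 0]]  # columns (1, 1) and (-1, 0)

samples = [(1, 0), (0, 1), (2, 3), (-4, 5), (7, -1)]
for v in samples:
    the_long_way = mat_vec(shear, mat_vec(rotation, v))  # rotate, then shear
    one_step = mat_vec(composition, v)                   # one combined matrix
    assert the_long_way == one_step, v
print("all samples agree")
```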

Based on how things are written down here, I think it's reasonable to call this new
matrix the "product" of the original two matrices. Don't you? We can think about how to compute that product more generally in just a moment, but
it's way too easy to get lost in the forest of numbers. Always remember that multiplying two matrices like this has the geometric meaning of
applying one transformation then another. One thing that's kinda weird here is that this has you reading from right to left. You first apply the transformation represented by the matrix on the right. Then you apply the transformation represented by the matrix on the left. This stems from function notation, since we write functions on the left of
variables, so every time you compose two functions, you always have to read it right to
left. Good news for the Hebrew readers, bad news for the rest of us.
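In the running example, the notation lines up like this. The rotation is written on the right because it is applied first, just as g is applied first in f(g(x)). As before, the shear's entries (i-hat fixed, j-hat sent to one, one) are inferred from the landing spots quoted earlier rather than stated in the transcript.

```latex
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{\text{shear}}
\left(
\underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}}_{\text{rotation}}
\vec{v}
\right)
=
\underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{\text{composition}}
\vec{v}
\qquad \text{compare } f(g(x))
```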

Let's look at another example. Take the matrix with columns one, one and negative two, zero, whose transformation
looks like this, and let's call it M one. Next, take the matrix with columns zero, one and two, zero, whose transformation
looks like this, and let's call that guy M two. The total effect of applying M one then M two gives us a new transformation. So let's find its matrix. But this time, let's see if we can do it without watching the animations and instead
just using the numerical entries in each matrix.

First, we need to figure out where i-hat goes. After applying M one, the new coordinates of i-hat, by definition, are given by
that first column of M one, namely, one, one. To see what happens after applying M two, multiply the matrix for M two by that
vector one, one. Working it out the way that I described last video, you'll get the vector two,
one. This will be the first column of the composition matrix. Likewise, to follow j-hat, the second column of M one tells us that it first lands
on negative two, zero. Then, when we apply M two to that vector, you can work out the matrix-vector product
to get zero, negative two, which becomes the second column of our composition
matrix.
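The same steps can be written as a general column-by-column product. This is a sketch for illustration, not an optimized routine: `mat_vec`, `column`, and `mat_mul` are hypothetical helpers, matrices are lists of rows, and `M1` and `M2` stand for M one and M two from the example.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v (list)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def column(m, j):
    """Return column j of matrix m as a list."""
    return [row[j] for row in m]


def mat_mul(left, right):
    """Product left * right: apply `right` first, then `left`.

    Column j of the result is `left` applied to column j of `right`,
    i.e. where the j-th basis vector lands after both transformations.
    """
    cols = [mat_vec(left, column(right, j)) for j in range(len(right[0]))]
    # Transpose the list of columns back into a list of rows.
    return [list(r) for r in zip(*cols)]


M1 = [[1, -2],
      [1, 0]]   # columns (1, 1) and (-2, 0)
M2 = [[0, 2],
      [1, 0]]   # columns (0, 1) and (2, 0)

print(mat_mul(M2, M1))  # [[2, 0], [1, -2]]
```

Reading off the columns of the result gives (2, 1) and (0, -2), matching the hand computation. Note that `M2` is passed first because it is the transformation applied second.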

Let me talk through that same process again, but this time I'll show variable entries in
each matrix, just to show that the same line of reasoning works for any
matrices. This is more symbol-heavy and will require some more room, but it should be pretty
satisfying for anyone who has previously been taught matrix multiplication the more
rote way. To follow where i-hat goes, start by looking at the first column of the matrix on
the right, since this is where i-hat initially lands. Multiplying that column by the matrix on the left is how you can tell where the
intermediate version of i-hat ends up after applying the second transformation. So the first column of the composition matrix will always equal the left matrix
times the first column of the right matrix. Likewise, j-hat will always initially land on the second column of the right
matrix, so multiplying the left matrix by this second column will give its final
location. And hence, that's the second column of the composition matrix.
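Written out with placeholder entries (the letters a through h are labels chosen here for illustration, not taken from the video), the column-by-column reasoning gives the familiar formula:

```latex
\begin{aligned}
\text{first column:}\quad
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} e \\ g \end{bmatrix}
&= e \begin{bmatrix} a \\ c \end{bmatrix} + g \begin{bmatrix} b \\ d \end{bmatrix}
 = \begin{bmatrix} ae + bg \\ ce + dg \end{bmatrix} \\
\text{second column:}\quad
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} f \\ h \end{bmatrix}
&= f \begin{bmatrix} a \\ c \end{bmatrix} + h \begin{bmatrix} b \\ d \end{bmatrix}
 = \begin{bmatrix} af + bh \\ cf + dh \end{bmatrix} \\[4pt]
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} e & f \\ g & h \end{bmatrix}
&= \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
\end{aligned}
```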

Notice, there are a lot of symbols here, and it's common to be taught this formula as something to memorize, along with a
certain algorithmic process to help remember it. But I really do think that before memorizing that process, you should get in the
habit of thinking about what matrix multiplication really represents: applying one
transformation after another. Trust me, this will give you a much better conceptual framework, one that makes the
properties of matrix multiplication much easier to understand.

For example, here's a question: does it matter what order we put the two matrices in
when we multiply them? Well, let's think through a simple example like the one from earlier. Take a shear which fixes i-hat and smooshes j-hat over to the right, and a 90-degree
rotation. If you first do the shear, then rotate, we can see that i-hat ends up at zero, one
and j-hat ends up at negative one, one. Both are pointing generally close together. If you first rotate, then do the shear, i-hat ends up over at one, one and j-hat is
off in a different direction at negative one, zero. And they're pointing, you know, farther apart. The overall effect here is clearly different. So, evidently, order totally does matter.
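The same comparison can be checked numerically. This sketch multiplies the rotation and the shear in both orders using a small hand-written helper, `mat_mul`, which is hypothetical. The shear's entries (i-hat fixed, j-hat sent to one, one) are an assumption consistent with the landing spots quoted in this example.

```python
def mat_mul(left, right):
    """2x2 product left * right (apply `right` first, then `left`)."""
    return [[sum(left[r][k] * right[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


rotation = [[0, -1], [1, 0]]  # 90 degrees counterclockwise
shear = [[1, 1], [0, 1]]      # assumed: i-hat fixed, j-hat -> (1, 1)

shear_then_rotate = mat_mul(rotation, shear)  # shear applied first
rotate_then_shear = mat_mul(shear, rotation)  # rotation applied first

print(shear_then_rotate)   # [[0, -1], [1, 1]]: columns (0, 1) and (-1, 1)
print(rotate_then_shear)   # [[1, -1], [1, 0]]: columns (1, 1) and (-1, 0)
print(shear_then_rotate == rotate_then_shear)  # False
```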

Notice that by thinking in terms of transformations, that's the kind of thing that you
can do in your head by visualizing. No matrix multiplication necessary. I remember when I first took linear algebra, there was this one homework problem that
asked us to prove that matrix multiplication is associative. This means that if you have three matrices A, B and C and you multiply them
all together, it shouldn't matter if you first compute A times B then multiply the
result by C, or if you first multiply B times C then multiply that result by A on
the left. In other words, it doesn't matter where you put the parentheses.

Now if you try to work through this numerically, like I did back then, it's horrible,
just horrible, and unenlightening for that matter. But when you think about matrix multiplication as applying one transformation after
another, this property is just trivial. Can you see why? What it's saying is that if you first apply C, then B, then A, it's the same as
applying C, then B, then A. I mean, there's nothing to prove; you're just applying the same three things one after
the other, all in the same order. This might feel like cheating. But it's not! This is an honest-to-goodness proof that matrix multiplication is associative, and,
even better than that, it's a good explanation for why that property should be
true.
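If you want numerical reassurance anyway, a spot check is easy to write. It is not a proof (the argument above is); it just multiplies randomly chosen small integer matrices with both groupings and compares. The helper names, the 2×2 size, the seed, and the entry range are arbitrary choices for illustration. Integer entries keep the comparison exact, with no floating-point rounding.

```python
import random


def mat_mul(left, right):
    """n x n product left * right using the row-by-column formula."""
    n = len(left)
    return [[sum(left[r][k] * right[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def random_matrix(n, rng):
    """n x n matrix with small random integer entries."""
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(100):
    A, B, C = (random_matrix(2, rng) for _ in range(3))
    assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
print("(AB)C == A(BC) for every sample")
```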

I really do encourage you to play around more with this idea: imagining two different
transformations, thinking about what happens when you apply one after the other, and
then working out the matrix product numerically. Trust me, this is the kind of playtime that really makes the idea sink in. In the next video, I'll start talking about extending these ideas beyond just two
dimensions. See you then!