### Video Transcript

Traditionally, dot products are
something that’s introduced really early on in a linear algebra course, typically
right at the start. So it might seem strange that I’ve
pushed them back this far in the series. I did this because there’s a
standard way to introduce the topic which requires nothing more than a basic
understanding of vectors. But a fuller understanding of the
role that dot products play in math can only really be found under the light of
linear transformations. Before that though, let me just
briefly cover the standard way that dot products are introduced, which I’m assuming is
at least partially review for a number of viewers.

Numerically, if you have two
vectors of the same dimension, two lists of numbers with the same length, taking
their dot product means pairing up all of the coordinates, multiplying those pairs
together, and adding the result. So the vector one, two dotted with
three, four would be one times three plus two times four. The vector six, two, eight, three
dotted with one, eight, five, three would be six times one plus two times eight plus
eight times five plus three times three.
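
As a minimal sketch, the pair-multiply-add recipe can be written in a few lines of Python. The function name `dot` is an illustrative choice, not something from the video; the two calls reproduce the examples just described.

```python
def dot(v, w):
    """Pair up coordinates, multiply each pair, and add the results."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(v, w))


print(dot([1, 2], [3, 4]))              # 1*3 + 2*4 = 11
print(dot([6, 2, 8, 3], [1, 8, 5, 3]))  # 6*1 + 2*8 + 8*5 + 3*3 = 71
```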

Luckily, this computation has a
really nice geometric interpretation. To think about the dot product
between two vectors 𝐕 and 𝐖, imagine projecting 𝐖 onto the line that passes
through the origin and the tip of 𝐕. Multiplying the length of this
projection by the length of 𝐕, you have the dot product 𝐕 dot 𝐖. Except when this projection of 𝐖
is pointing in the opposite direction from 𝐕, that dot product will actually be
negative.

So when two vectors are generally
pointing in the same direction, their dot product is positive. When they’re perpendicular, meaning
the projection of one onto the other is the zero vector, their dot product is
zero. And if they’re pointing generally
the opposite direction, their dot product is negative.
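
The projection recipe and the sign rule can be checked numerically. The sketch below is restricted to 2D and measures the projection from the angle between the vectors, so the geometric version never uses the coordinate formula. The vectors and the helper name `geometric_dot` are assumptions made for illustration.

```python
import math


def dot(v, w):
    """Numerical recipe: multiply matching coordinates and add."""
    return sum(a * b for a, b in zip(v, w))


def geometric_dot(v, w):
    """Geometric recipe for 2D vectors: (signed length of w projected
    onto the line through v) times (length of v)."""
    angle_between = math.atan2(w[1], w[0]) - math.atan2(v[1], v[0])
    signed_projection = math.hypot(w[0], w[1]) * math.cos(angle_between)
    return signed_projection * math.hypot(v[0], v[1])


v = (4, 1)
# One w pointing generally with v, one perpendicular to v, one generally against v.
for w in [(2, 3), (-1, 4), (-3, -2)]:
    print(w, dot(v, w), geometric_dot(v, w))
```

Up to floating-point rounding, the two columns agree: positive for the first `w`, zero for the second, negative for the third.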

Now, this interpretation is weirdly
asymmetric; it treats the two vectors very differently. So when I first learned this, I was
surprised that order doesn’t matter. You could instead project 𝐕 onto
𝐖; multiply the length of the projected 𝐕 by the length of 𝐖 and get the same
result. I mean, doesn’t that feel like a
really different process? Here’s the intuition for why order
doesn’t matter: if 𝐕 and 𝐖 happened to have the same length, we could leverage
some symmetry, since projecting 𝐖 onto 𝐕 then multiplying the length of that
projection by the length of 𝐕 is a complete mirror image of projecting 𝐕 onto 𝐖
then multiplying the length of that projection by the length of 𝐖.

Now, if you scale one of them, say
𝐕 by some constant like two, so that they don’t have equal length, the symmetry is
broken. But let’s think through how to
interpret the dot product between this new vector two times 𝐕 and 𝐖. If you think of 𝐖 as getting
projected onto 𝐕, then the dot product two 𝐕 dot 𝐖 will be exactly twice the dot
product 𝐕 dot 𝐖. This is because when you scale 𝐕
by two, it doesn’t change the length of the projection of 𝐖, but it doubles the
length of the vector that you’re projecting onto.

But, on the other hand, let’s say
you’re thinking about 𝐕 getting projected onto 𝐖. Well, in that case, the length of
the projection is the thing that gets scaled when we multiply 𝐕 by two. The length of the vector that
you’re projecting onto stays constant. So the overall effect is still to
just double the dot product. So, even though symmetry is broken
in this case, the effect that this scaling has on the value of the dot product is
the same under both interpretations.
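
Here is a small numerical sketch of that scaling argument, again in 2D with projections measured from angles. The vectors `v` and `w` and the helper names are assumptions chosen for illustration.

```python
import math


def length(a):
    return math.hypot(a[0], a[1])


def signed_projection_length(a, onto):
    """Signed length of 2D vector a projected onto the line through `onto`."""
    angle_between = math.atan2(a[1], a[0]) - math.atan2(onto[1], onto[0])
    return length(a) * math.cos(angle_between)


v, w = (3, 1), (1, 2)
two_v = (2 * v[0], 2 * v[1])

# Interpretation 1: project w onto v (or 2v), times the length of v (or 2v).
# The projection of w is unchanged; the length being multiplied doubles.
before_1 = signed_projection_length(w, v) * length(v)
after_1 = signed_projection_length(w, two_v) * length(two_v)

# Interpretation 2: project v (or 2v) onto w, times the length of w.
# The projection doubles; the length of w is unchanged.
before_2 = signed_projection_length(v, w) * length(w)
after_2 = signed_projection_length(two_v, w) * length(w)

print(before_1, after_1)
print(before_2, after_2)
```

In both interpretations the value doubles, from about 5 to about 10 for these particular vectors.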

There’s also one other big question
that confused me when I first learned this stuff. Why on earth does this numerical
process of matching coordinates, multiplying pairs, and adding them together have
anything to do with projection? Well, to give a satisfactory answer
and also to do full justice to the significance of the dot product, we need to
unearth something a little bit deeper going on here, which often goes by the name
“duality.” But before getting into that, I
need to spend some time talking about linear transformations from multiple
dimensions to one dimension, which is just the number line. These are functions that take in a
2D vector and spit out some number. But linear transformations are, of
course, much more restricted than your run-of-the-mill function with a 2D input and
a 1D output. As with transformations in higher
dimensions, like the ones I talked about in chapter 3, there are some formal
properties that make these functions linear. But I’m going to purposely ignore
those here so as to not distract from our end goal and instead focus on a certain
visual property that’s equivalent to all the formal stuff.

If you take a line of evenly spaced
dots and apply a transformation, a linear transformation will keep those dots evenly
spaced, once they land in the output space, which is the number line. Otherwise, if there’s some line of
dots that gets unevenly spaced, then your transformation is not linear. As with the cases we’ve seen
before, one of these linear transformations is completely determined by where it
takes 𝑖-hat and 𝑗-hat. But this time, each one of those
basis vectors just lands on a number. So when we record where they land
as the columns of a matrix, each of those columns just has a single number. This is a one-by-two matrix.

Let’s walk through an example of
what it means to apply one of these transformations to a vector. Let’s say you have a linear
transformation that takes 𝑖-hat to one and 𝑗-hat to negative two. To follow where a vector with
coordinates, say, four, three, ends up, think of breaking up this vector as four
times 𝑖-hat plus three times 𝑗-hat. A consequence of linearity is that
after the transformation, the vector will be four times the place where 𝑖-hat
lands, one, plus three times the place where 𝑗-hat lands, negative two, which in
this case implies that it lands on negative two. When you do this calculation purely
numerically, it’s matrix-vector multiplication.
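
As a sketch, the same walk-through in Python, with the one-by-two matrix stored as a pair of landing spots. The name `apply_1x2` is hypothetical.

```python
def apply_1x2(matrix, vector):
    """Apply a one-by-two matrix (where i-hat and j-hat land) to a 2D vector."""
    i_lands, j_lands = matrix
    x, y = vector
    # Linearity: x copies of where i-hat lands plus y copies of where j-hat lands.
    return x * i_lands + y * j_lands


transform = (1, -2)                  # i-hat lands on 1, j-hat lands on -2
print(apply_1x2(transform, (4, 3)))  # 4*1 + 3*(-2) = -2
```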

Now, this numerical operation of
multiplying a one-by-two matrix by a vector feels just like taking the dot product
of two vectors. Doesn’t that one-by-two matrix just
look like a vector that we tipped on its side? In fact, we could say right now
that there’s a nice association between one-by-two matrices and 2D vectors, defined
by tilting the numerical representation of a vector on its side to get the
associated matrix or to tip the matrix back up to get the associated vector.
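
Written out symbolically, with the entries $a, b$ and the coordinates $x, y$ as placeholder names that are not used in the video, the association looks like this:

```latex
\begin{bmatrix} a & b \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= ax + by
= \begin{bmatrix} a \\ b \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix}
```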

Since we’re just looking at
numerical expressions right now, going back and forth between vectors and one-by-two
matrices might feel like a silly thing to do. But this suggests something that’s
truly awesome from the geometric view. There’s some kind of connection
between linear transformations that take vectors to numbers and vectors
themselves.

Let me show an example that
clarifies the significance and which just so happens to also answer the dot product
puzzle from earlier. Unlearn what you have learned and
imagine that you don’t already know that the dot product relates to projection. What I’m gonna do here is take a
copy of the number line and place it diagonally in space somehow, with the number
zero sitting at the origin. Now think of the two-dimensional
unit vector whose tip sits where the number one on the number line is. I want to give that guy a name,
𝐮-hat. This little guy plays an important
role in what’s about to happen, so just keep him in the back of your mind. If we project 2D vectors straight
onto this diagonal number line, in effect, we’ve just defined a function that takes
2D vectors to numbers. What’s more, this function is
actually linear since it passes our visual test that any line of evenly spaced dots
remains evenly spaced once it lands on the number line.
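
The evenly-spaced-dots test can be tried numerically. This sketch assumes the number line is tilted at 30 degrees, an arbitrary choice, and computes the projection purely from angles and lengths, with no dot product involved. The starting point and step of the row of dots are also made up for illustration.

```python
import math

U_ANGLE = math.radians(30)  # assumed tilt of the diagonal number line


def project(p):
    """Number on which the 2D point p lands on the tilted number line."""
    angle_to_line = math.atan2(p[1], p[0]) - U_ANGLE
    return math.hypot(p[0], p[1]) * math.cos(angle_to_line)


# Five evenly spaced dots along some line in the plane.
start, step = (1.0, 2.0), (0.5, -0.75)
dots = [(start[0] + k * step[0], start[1] + k * step[1]) for k in range(5)]

landings = [project(p) for p in dots]
gaps = [b - a for a, b in zip(landings, landings[1:])]
print(gaps)  # the gaps should all match, up to floating-point rounding
```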

Just to be clear, even though I’ve
embedded the number line in 2D space like this, the outputs of the function are
numbers, not 2D vectors. You should think of a function that
takes in two coordinates and outputs a single coordinate. But that vector 𝐮-hat is a
two-dimensional vector living in the input space. It’s just situated in such a way
that overlaps with the embedding of the number line.

With this projection, we just
defined a linear transformation from 2D vectors to numbers, so we’re gonna be able
to find some kind of one-by-two matrix that describes that transformation. To find that one-by-two matrix,
let’s zoom in on this diagonal number line setup and think about where 𝑖-hat and
𝑗-hat each land, since those landing spots are gonna be the columns of the
matrix.

This part is super cool; we can
reason through it with a really elegant piece of symmetry. Since 𝑖-hat and 𝐮-hat are both
unit vectors, projecting 𝑖-hat onto the line passing through 𝐮-hat looks totally
symmetric to projecting 𝐮-hat onto the 𝑥-axis. So when we ask: what number does
𝑖-hat land on when it gets projected? The answer is gonna be the same as
whatever 𝐮-hat lands on when it’s projected onto the 𝑥-axis. But projecting 𝐮-hat onto the
𝑥-axis just means taking the 𝑥-coordinate of 𝐮-hat. So, by symmetry, the number where
𝑖-hat lands when it’s projected onto that diagonal number line is gonna be the
𝑥-coordinate of 𝐮-hat. Isn’t that cool?

The reasoning is almost identical
for the 𝑗-hat case. Think about it for a moment. For all the same reasons, the
𝑦-coordinate of 𝐮-hat gives us the number where 𝑗-hat lands when it’s projected
onto the number line copy.

Pause and ponder that for a moment;
I just think that’s really cool. So the entries of the one-by-two
matrix describing the projection transformation are going to be the coordinates of
𝐮-hat. And computing this projection
transformation for arbitrary vectors in space, which requires multiplying that
matrix by those vectors, is computationally identical to taking a dot product with
𝐮-hat. This is why taking the dot product
with a unit vector can be interpreted as projecting a vector onto the span of that
unit vector and taking the length.
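
A sketch that checks both claims for one assumed unit vector, `u_hat = (0.6, 0.8)`: the basis vectors land on the coordinates of 𝐮-hat, and for other vectors the matrix built from those coordinates agrees with the geometric projection. The function names and test vectors are illustrative assumptions.

```python
import math

u_hat = (0.6, 0.8)  # assumed unit vector marking "1" on the tilted number line
u_angle = math.atan2(u_hat[1], u_hat[0])


def project(p):
    """Geometric projection: where p lands on the number line through u_hat."""
    angle_to_line = math.atan2(p[1], p[0]) - u_angle
    return math.hypot(p[0], p[1]) * math.cos(angle_to_line)


def apply_1x2(matrix, p):
    """One-by-two matrix times a 2D vector, i.e. a dot product."""
    return matrix[0] * p[0] + matrix[1] * p[1]


# Where the basis vectors land: approximately the coordinates of u_hat.
i_lands, j_lands = project((1, 0)), project((0, 1))
print(i_lands, j_lands)

# The matrix with those entries reproduces the projection of other vectors.
projection_matrix = u_hat
for p in [(2, 1), (-3, 4), (5, -2)]:
    print(p, apply_1x2(projection_matrix, p), project(p))
```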

So what about non-unit vectors? For example, let’s say we take that
unit vector 𝐮-hat, but we scale it up by a factor of three. Numerically, each of its components
gets multiplied by three. So looking at the matrix associated
with that vector, it takes 𝑖-hat and 𝑗-hat to three times the values where they
landed before. Since this is all linear, it
implies, more generally, that the new matrix can be interpreted as projecting any
vector onto the number line copy and multiplying where it lands by three. This is why the dot product with a
non-unit vector can be interpreted as first projecting onto that vector then scaling
up the length of that projection by the length of the vector.
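
To see the factor of three numerically, here is a short sketch using an assumed unit vector `u_hat = (0.6, 0.8)` and an arbitrary test vector `p`.

```python
import math


def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


u_hat = (0.6, 0.8)                       # assumed unit vector
scaled = (3 * u_hat[0], 3 * u_hat[1])    # same direction, three times as long
p = (2, 1)                               # arbitrary vector to project

projection = dot(u_hat, p)               # where p lands on the number line
print(math.hypot(scaled[0], scaled[1]))  # length of the scaled vector, about 3
print(projection, dot(scaled, p))        # the second is about 3 times the first
```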

Take a moment to think about what
happened here. We had a linear transformation from
2D space to the number line, which was not defined in terms of numerical vectors or
numerical dot products; it was just defined by projecting space onto a diagonal copy
of the number line. But because the transformation is
linear, it was necessarily described by some one-by-two matrix. And since multiplying a one-by-two
matrix by a 2D vector is the same as turning that matrix on its side and taking a
dot product, this transformation was inescapably related to some 2D vector.

The lesson here is that anytime you
have one of these linear transformations, whose output space is the number line, no
matter how it was defined there’s gonna be some unique vector 𝐕 corresponding to
that transformation, in the sense that applying the transformation is the same thing
as taking a dot product with that vector.
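
One way to sketch this in code: given any linear function from 2D vectors to numbers, read off the corresponding vector by checking where the two basis vectors land. The function `example_transform` is a made-up linear transformation used only for illustration, and `dual_vector` is a hypothetical helper name.

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


def dual_vector(transform):
    """Vector whose dot product reproduces a linear 2D-to-number transform."""
    return (transform((1, 0)), transform((0, 1)))


def example_transform(p):
    """A made-up linear transformation from 2D vectors to numbers."""
    x, y = p
    return 2 * x - 5 * y


v = dual_vector(example_transform)
print(v)  # (2, -5)

# Applying the transformation matches taking a dot product with v.
for p in [(4, 3), (-1, 7), (0, 0)]:
    print(p, example_transform(p), dot(v, p))
```

This only works because the function is linear; for a non-linear function, the two basis landing spots would not determine the rest of its values.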

To me, this is utterly
beautiful. It’s an example of something in
math called “duality.” Duality shows up in many different
ways and forms throughout math, and it’s super tricky to actually define. Loosely speaking, it refers to
situations where you have a natural, but surprising correspondence between two types
of mathematical thing. For the linear algebra case that
you just learned about, you’d say that the dual of a vector is the linear
transformation that it encodes. And the dual of a linear
transformation from some space to one dimension is a certain vector in that
space.

So, to sum up, on the surface, the
dot product is a very useful geometric tool for understanding projections and for
testing whether or not vectors tend to point in the same direction. And that’s probably the most
important thing for you to remember about the dot product. But at a deeper level, dotting two
vectors together is a way to translate one of them into the world of
transformations. Again, numerically, this might feel
like a silly point to emphasize; it’s just two computations that happen to look
similar. But the reason I find this so
important is that throughout math, when you’re dealing with a vector, once you
really get to know its personality sometimes you realize that it’s easier to
understand it not as an arrow in space, but as the physical embodiment of a linear
transformation. It’s as if the vector is really
just a conceptual shorthand for a certain transformation, since it’s easier for us to
think about arrows in space rather than moving all of that space to the number
line.

In the next video, you’ll see
another really cool example of this duality in action as I talk about the cross
product.