Video Transcript
In this video, we will learn how to
use the normal distribution to calculate probabilities and find unknown variables
and parameters. The normal distribution is one of
the most important probability distributions because it can be used to model several
types of naturally occurring phenomena, such as the heights of adults. And it can be a good approximation
for other distributions when the number of data points is large.
Let's look at this normal
distribution more closely. A normal random variable is a type
of continuous random variable. And if we look at a graph of its
probability density function, it has a very distinctive shape. All normal distributions can be
represented using a bell-shaped curve, which we call the normal curve or sometimes
the Gaussian curve after the mathematician Carl Friedrich Gauss who was instrumental
in developing the theory associated with the normal distribution. We use a capital letter, in this
case X, to represent the normally distributed random variable. And the distribution can be
described completely by two parameters: firstly the mean or expectation μ, and
secondly the variance σ squared. Let's now consider some key
features of this probability distribution.
First, the normal distribution is
completely symmetrical about its mean value μ. As with any probability
distribution, the area under the entire curve is one, which means that the area
either side of this vertical axis of symmetry is 0.5. The area to the left of any
particular x-value on the horizontal axis gives the proportion of points from the
distribution that are less than or equal to this value. We call this the probability that the
random variable capital X is less than or equal to the observation lowercase
x. Now it's worth mentioning here that
as we're working with a continuous distribution, it doesn't make any practical
difference whether we talk about strictly less than or less than or equal to because
the probability that our random variable is equal to any particular value is
zero.
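The idea that a probability is an area under the density curve can be checked numerically. The sketch below is a minimal illustration and assumes Python 3.8+ for `statistics.NormalDist`. The heights distribution (mean 170 cm, standard deviation 7 cm) is hypothetical and is not taken from this video. The code approximates the area to the left of x with the trapezoidal rule and compares it with the exact cumulative probability. It also confirms that the whole curve encloses an area of one.

```python
from statistics import NormalDist


def area_to_left(dist: NormalDist, x: float, steps: int = 10_000) -> float:
    """Approximate P(X <= x) as the area under the density curve, via the trapezoidal rule."""
    start = dist.mean - 10 * dist.stdev  # the curve is negligibly small this far below the mean
    width = (x - start) / steps
    total = 0.5 * (dist.pdf(start) + dist.pdf(x))  # endpoints get half weight
    for i in range(1, steps):
        total += dist.pdf(start + i * width)
    return total * width


# Hypothetical adult heights in cm, chosen only for illustration.
heights = NormalDist(mu=170, sigma=7)

print(round(area_to_left(heights, 178), 4), round(heights.cdf(178), 4))  # the two values agree
print(round(area_to_left(heights, heights.mean + 10 * heights.stdev), 4))  # whole curve: 1.0
```

Because a single point has zero width, it adds no area. This is why "less than" and "less than or equal to" give the same result here.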
So we've seen that half the area
lies either side of the vertical axis. But, in fact, the area below the
curve can be approximately divided up further into the proportion of points that lie
within certain key regions. This is called the empirical
rule. If we consider one standard
deviation either side of the mean first of all, this accounts for approximately 68.3
percent of the total area, which means that approximately this proportion of values
from the distribution lie within one standard deviation of the mean.
In the same way, the region two
standard deviations either side of the mean accounts for approximately 95 percent of
the total area. So 95 percent of values from this
distribution lie within two standard deviations of the mean. And if we go even further out to
three standard deviations either side of the mean, this accounts for approximately
99.7 percent of the total area.
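The three percentages in the empirical rule do not depend on μ or σ, so they can be recovered from the standard normal distribution alone. Here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist


def within_k_sd(k: float) -> float:
    """Proportion of any normal distribution lying within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```

The two-standard-deviation figure is about 95.4 percent, which the empirical rule rounds to 95 percent.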
It is therefore very rare for
values taken from a normal distribution to be more than three standard deviations
from the mean, which has an important application in statistical process
control. If a process is assumed to be
normally distributed, any values which are more than three standard deviations from
the mean are usually assumed to be outlying values and may indicate that an unusual
event has occurred, which needs to be investigated. It's helpful to remember the three
key percentages associated with these distances from the mean, as we'll see in our
first example.
For a normally distributed data
set with mean 32.1 and standard deviation 2.8, between which two values would
you expect 95 percent of the data set to lie?
We recall firstly that for a
normally distributed random variable, approximately 95 percent of the data
points lie within two standard deviations of the mean. We therefore need to calculate
the values two standard deviations below and two standard deviations above the
mean for this particular normal distribution.
We're given in the question
that the mean is 32.1 and the standard deviation is 2.8, so we can calculate
these values fairly easily. The lower value, μ minus two σ,
is 32.1 minus two multiplied by 2.8, which is 26.5. The upper value, μ plus two σ,
is 32.1 plus two times 2.8, which is 37.7. And so by recalling part of the
empirical rule for a normally distributed random variable, which tells us that
approximately 95 percent of the data set lies within two standard deviations of
the mean, we find that for this distribution, 95 percent of the data set will
lie between 26.5 and 37.7.
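The calculation in this example is simply μ ± 2σ. Below is a small sketch; `two_sd_interval` is a hypothetical helper name, not a library function.

```python
def two_sd_interval(mean: float, sd: float):
    """Return (mean - 2*sd, mean + 2*sd), which holds roughly 95% of a normal distribution."""
    return mean - 2 * sd, mean + 2 * sd


low, high = two_sd_interval(32.1, 2.8)
print(f"{low:.1f} to {high:.1f}")  # 26.5 to 37.7
```

Formatting to one decimal place hides the tiny floating-point error in the subtraction. An exact central 95 percent interval would use a multiplier of about 1.96 rather than 2. The empirical rule deliberately uses the rounder figure.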
More generally, we may want to
find the proportion of points that lie in other regions under the curve. To do this, we need to consider
one special case of the normal distribution, which is what we call the standard
normal distribution. We usually denote this using
the letter Z. And it represents the normal
distribution which has a mean of zero and a standard deviation, and hence
variance, of one.
Values from this distribution
are known as z-scores, and they represent the number of standard deviations
a particular value lies above the mean. For example, a z-score of 1.4
would mean a value 1.4 standard deviations above the mean, whereas a z-score of
negative 2.1 would mean a value 2.1 standard deviations below the mean. These z-scores for a standard
normal distribution are really useful because they allow us to view values from
a normal distribution on a standardized scale.
We have a set of statistical
tables, which we'll look at in detail later, in which we can look up the areas
and hence the probabilities associated with particular z-scores. The type of tables we're going
to use are tables which give the probability that our random variable capital Z
is between zero and an observation lowercase z, that is, the proportion of
points or the area between zero and a positive z-score. If we then wanted to work out
the proportion of points that lie completely to the left, that is, that are
less than a particular positive z-score, we would need to add on 0.5
to the value from our tables to account for the area to the left of the axis of
symmetry. That's the area shaded in
pink.
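The link between these "zero to z" tables and the full cumulative probability can be written as two small functions. This sketch assumes the printed tables list Φ(z) − 0.5, where Φ is the standard normal cumulative distribution function. `table_value` and `prob_less_than` are hypothetical names used for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def table_value(z: float) -> float:
    """Area between 0 and a non-negative z-score: the quantity these tables list."""
    if z < 0:
        raise ValueError("these tables only cover non-negative z-scores")
    return STANDARD_NORMAL.cdf(z) - 0.5


def prob_less_than(z: float) -> float:
    """P(Z <= z) for z >= 0: the 0.5 to the left of the mean plus the table value."""
    return 0.5 + table_value(z)


print(round(table_value(1.4), 4), round(prob_less_than(1.4), 4))  # 0.4192 0.9192
```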
Let's consider a detailed example
of how we can use our tables to find such a probability.
Use tables to find the normal
probability corresponding to a z-score of 2.13.
We are asked to find the normal
probability corresponding to a z-score of 2.13, which means the proportion of
points or the area that lies to the left of this value of 2.13 under the
standard normal distribution curve. So here are our statistical
tables for the standard normal distribution. Now, these tables give the
proportion of points or the area that lies between zero and a positive
z-score. That's only the part of the
area now shaded in pink on our figure. That's okay, though, because we
know that the normal distribution is completely symmetrical about its mean. And so the orange part of the
area is exactly 0.5. We therefore need to add 0.5 to
whatever value we find in our table.
Now, looking at our tables, we
can see that they have z-scores ranging from zero to three in the first
column. These values increase by 0.1
each time. And then in the top row of the
table, we have options for the second decimal place of our z-score. The z-score we want to look up
is 2.13, so we look up 2.1 in the first column and then 0.03 in the top row because 2.1 plus
0.03 gives 2.13. We then find the value in the
cell of the table where this row and this column intersect, and it is
0.4834. This tells us that the area
between zero and 2.13 is 0.4834. The total area to the left of
2.13 is 0.5 plus this value, which is 0.9834. This is the normal probability
corresponding to a z-score of 2.13. And it represents the total
area to the left of 2.13 under the standard normal curve.
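The row-and-column lookup can be mimicked by generating one row of the table. This sketch assumes each table entry is Φ(z) − 0.5 rounded to four decimal places. `table_row` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist


def table_row(row_start: float) -> dict:
    """One row of a 'zero to z' table.

    Keys are the hundredths digit (0-9) from the top row; values are the area
    between 0 and row_start + digit/100, rounded to 4 d.p. as in printed tables.
    """
    phi = NormalDist().cdf
    row = {}
    for digit in range(10):
        z = round(row_start + digit / 100, 2)  # e.g. 2.1 + 0.03 -> 2.13
        row[digit] = round(phi(z) - 0.5, 4)
    return row


row = table_row(2.1)
area_0_to_z = row[3]  # column 0.03
print(area_0_to_z, round(0.5 + area_0_to_z, 4))  # 0.4834 0.9834
```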
In the previous example, we saw how
to use tables to find the area between zero and a positive z-score. We can also use these tables to
work out the proportion of points that lie in other regions, and the symmetry of the
normal distribution plays an important role. Firstly, because the curve is
symmetrical about its mean, the area between zero and a positive z-score is the
same as the area between the negative of that z-score and zero. We can also work out the area to
the right of a particular z-score by using the fact that the total area under the
curve is one. So the probability that our random
variable Z is greater than or equal to a value lowercase z is one minus the
probability that it's less than or equal to that value.
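Both properties can be checked numerically with `statistics.NormalDist`. This is a sketch only, and the z-score of 1.4 is an arbitrary choice.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # P(Z <= z) for the standard normal
z = 1.4

# Symmetry: the area from -z to 0 equals the area from 0 to z.
area_left = phi(0) - phi(-z)
area_right = phi(z) - phi(0)
print(round(area_left, 4), round(area_right, 4))  # 0.4192 0.4192

# Complement: P(Z >= z) = 1 - P(Z <= z), because the total area is one.
upper_tail = 1 - phi(z)
print(round(upper_tail, 4))  # 0.0808
```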
We can also work out the proportion
of points that lie between two particular z-scores by subtracting one area from the
other. We'll see the
different types of problem we might encounter in our remaining examples. So this is great if the
distribution we're using is already the standard normal. But what's even more useful is that
we can use z-scores to convert values from any normal distribution, with any mean
and any standard deviation, to a standard normal variable and therefore view them on
the standard scale. We do this using the formula z
equals x minus μ over σ. We take an observation x, subtract
the mean μ of the distribution it's from, and then divide by the standard deviation
σ. The z-score will then be an
observation from the standard normal distribution.
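The standardizing formula translates directly into code. In this sketch, `z_score` is a hypothetical helper name. The mean of 68, standard deviation of 3, and observation of 74 are illustrative values. The last line checks that standardizing leaves probabilities unchanged.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd


mean, sd = 68, 3
z = z_score(74, mean, sd)
print(z)  # 2.0

# Standardizing preserves probabilities: P(X <= 74) equals P(Z <= 2).
print(abs(NormalDist(mean, sd).cdf(74) - NormalDist().cdf(z)) < 1e-12)  # True
```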
The probability that our original
random variable capital X lies between its mean μ and lowercase x is therefore the same
as the probability that our new random variable Z lies between zero and lowercase z,
the z-score. And so we can use our standard
normal tables to look up this probability. Let's see an example of this.
Let X be a random variable
which is normally distributed with mean 68 and standard deviation three. Determine the probability that
X is greater than or equal to 61.7.
So we have this normally
distributed random variable X, and we want to determine the probability that
its value is greater than or equal to 61.7. We know that 61.7 will be in
the lower half of the distribution as it's less than the mean of 68. And so the probability we're
looking for corresponds to the area shaded in orange under our normal
distribution curve. First, we need to calculate the
z-score associated with this particular value using the formula z equals x
minus μ over σ. We have z equals 61.7 minus 68
over three, which is negative 2.1, which tells us that this value of 61.7 is 2.1
standard deviations below the mean of 68.
Now, we can't look up a negative
z-score in our standard normal tables, so we need to consider instead the
symmetry of the normal distribution curve. On our standardized scale, the
area above a z-score of negative 2.1 will be the same as the area below a
z-score of 2.1. We can look up the probability
associated with a z-score of 2.1 in our standard normal tables, which will give
us the area between the mean and 2.1. And then we can add 0.5 to
account for the area to the left of the mean. Using our tables, we see that
the probability associated with a z-score of 2.1 is 0.4821. So the probability that Z is
less than or equal to 2.1, which is the same as the probability that Z is greater
than or equal to negative 2.1, which for our unstandardized random variable is
the probability that X is greater than or equal to 61.7, is 0.5 plus 0.4821,
which is 0.9821.
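This example can be reproduced in a few lines. The sketch follows the table route described above and then checks it against a direct calculation with `statistics.NormalDist`. Rounding the table entry to four decimal places is an assumption made to mirror the printed tables.

```python
from statistics import NormalDist

mean, sd, x = 68, 3, 61.7
z = (x - mean) / sd  # about -2.1: 2.1 standard deviations below the mean
phi = NormalDist().cdf

# Table route: by symmetry, P(Z >= -2.1) = P(Z <= 2.1) = 0.5 + (area from 0 to 2.1).
table_entry = round(phi(abs(z)) - 0.5, 4)  # what the printed table shows
p_tables = 0.5 + table_entry

# Direct check without tables: P(X >= x) = 1 - P(X <= x).
p_direct = 1 - NormalDist(mu=mean, sigma=sd).cdf(x)

print(round(z, 2), table_entry, round(p_tables, 4), round(p_direct, 4))
# -2.1 0.4821 0.9821 0.9821
```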
Let's now consider an example in
which we calculate the probability between two values.
Let X be a random variable which
is normally distributed with mean 63 and variance 144. Determine the probability X is
greater than or equal to 37.56 and less than or equal to 57.36.
So we have a normally distributed
variable X with a mean of 63 and a variance of 144, which is 12 squared, so the standard deviation is 12. We want to determine the
probability that X is between these two values, which are both in the lower half of
the distribution. We begin by calculating the
z-score for each value using the formula z equals x minus μ over σ. For our first x-value, we have
37.56 minus 63 over 12, which is negative 2.12. And for our second value, 57.36 minus 63 over 12, the
z-score is negative 0.47.
Now, we can't look up either of these
values in our standard normal tables as they're both negative. So instead we use the symmetry of
the normal distribution curve. On our standardized scale now, the
probability that Z is greater than or equal to negative 2.12 but less than or equal
to negative 0.47 is the same as the probability that Z is greater than or equal to
positive 0.47 and less than or equal to 2.12, both of which we can look up in our
standard normal tables.
Remember the tables give us the
probability that Z is between zero and a positive z-score. So we can subtract the probability
for 0.47 from the probability for 2.12. From our tables, the probabilities
are 0.4830 and 0.1808. And then we find the difference,
which is 0.3022. So, using standardized z-scores
and the symmetry of the normal distribution, we found the probability that X is
greater than or equal to 37.56 and less than or equal to 57.36 is 0.3022.
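The same two routes, tables and direct calculation, can be sketched for this example. This assumes table entries are Φ(z) − 0.5 rounded to four decimal places. `table_entry` is a hypothetical helper. Note that the variance has to be converted to a standard deviation before standardizing.

```python
from statistics import NormalDist

mean, variance = 63, 144
sd = variance ** 0.5  # 12.0: standardize with the standard deviation, not the variance
phi = NormalDist().cdf


def table_entry(z: float) -> float:
    """Area between 0 and a non-negative z, rounded to 4 d.p. like the printed tables."""
    return round(phi(z) - 0.5, 4)


z_low = (37.56 - mean) / sd   # about -2.12
z_high = (57.36 - mean) / sd  # about -0.47

# By symmetry, P(-2.12 <= Z <= -0.47) = P(0.47 <= Z <= 2.12): subtract the table entries.
p_tables = table_entry(abs(z_low)) - table_entry(abs(z_high))

# Direct check: P(a <= X <= b) = P(X <= b) - P(X <= a).
dist = NormalDist(mu=mean, sigma=sd)
p_direct = dist.cdf(57.36) - dist.cdf(37.56)

print(round(p_tables, 4), round(p_direct, 4))  # 0.3022 0.3022
```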
Let's now summarize the key points
from this video. First, we saw the percentages
associated with the regions within one, two, and three standard deviations of the mean using the
empirical rule. To calculate the standardized
z-score of an observation x, we subtract the mean μ and then divide by the
standard deviation σ. This will convert an observation
from a normal distribution with mean μ and standard deviation σ to an observation
from the standard normal distribution with mean zero and standard deviation one. We can use standard normal
distribution tables to look up the area between zero and a positive z-score z. We can then use these values from
the tables together with the symmetry of the normal distribution curve to calculate
probabilities in a number of different formats.