Video Transcript
For a normally distributed data
set with mean 32.1 and standard deviation 2.8, between which two values would
you expect 95 percent of the data set to lie?
We recall firstly that for a
normally distributed random variable, approximately 95 percent of the data
points lie within two standard deviations of the mean. We therefore need to calculate
the values two standard deviations below and two standard deviations above the
mean for this particular normal distribution.
We’re given in the question
that the mean is 32.1 and the standard deviation is 2.8, so we can calculate
these values fairly easily. The lower value 𝜇 minus two 𝜎
is 32.1 minus two multiplied by 2.8, which is 26.5. The upper value 𝜇 plus two 𝜎
is 32.1 plus two times 2.8, which is 37.7. And so by recalling part of the
empirical rule for a normally distributed random variable, which tells us that
approximately 95 percent of the data set lies within two standard deviations of
the mean, we find that for this distribution, approximately 95 percent of the data set will
lie between 26.5 and 37.7.
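As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes the standard library's statistics.NormalDist class; the variable names are our own, chosen for illustration. It computes the two bounds and then the proportion of the distribution that actually lies between them, which shows why the 95 percent figure is only approximate.

```python
from statistics import NormalDist

# Values given in the question.
mu, sigma = 32.1, 2.8

# Two standard deviations below and above the mean.
lower = mu - 2 * sigma
upper = mu + 2 * sigma

# Proportion of a normal distribution lying between the two bounds.
dist = NormalDist(mu, sigma)
coverage = dist.cdf(upper) - dist.cdf(lower)

print(round(lower, 1), round(upper, 1))  # 26.5 37.7
print(round(coverage, 4))                # about 0.9545, close to 95 percent
```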
More generally, we may want to
find the proportion of points that lie in other regions under the curve. To do this, we need to consider
one special case of the normal distribution, which is what we call the standard
normal distribution. We usually denote this using
the letter 𝑧. And it represents the normal
distribution which has a mean of zero and a standard deviation, and hence
variance, of one.
Values from this distribution
are known as 𝑧-scores, and they represent the number of standard deviations
a particular value lies above or below the mean. For example, a 𝑧-score of 1.4
would mean a value 1.4 standard deviations above the mean, whereas a 𝑧-score of
negative 2.1 would mean a value 2.1 standard deviations below the mean. These 𝑧-scores for a standard
normal distribution are really useful because they allow us to view values from
a normal distribution on a standardized scale.
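The idea of a 𝑧-score can be sketched in a few lines of Python. The function name z_score and the two sample values 36.02 and 26.22 are assumptions made for illustration; they were picked so that, with the mean 32.1 and standard deviation 2.8 from the earlier question, they reproduce the 𝑧-scores of 1.4 and negative 2.1 mentioned above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Mean and standard deviation from the earlier question.
mu, sigma = 32.1, 2.8

# Hypothetical observations, chosen only to illustrate the two cases.
z_above = z_score(36.02, mu, sigma)  # value above the mean
z_below = z_score(26.22, mu, sigma)  # value below the mean

print(round(z_above, 2), round(z_below, 2))  # 1.4 -2.1
```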
We have a set of statistical
tables which we’ll look at in detail later, in which we can look up the areas
and hence the probabilities associated with particular 𝑧-scores. The tables we’re going
to use give the probability that our random variable capital 𝑍
is between zero and an observation lowercase 𝑧. That is the proportion of
points or the area between zero and a positive 𝑧-score. If we then wanted to work out
the proportion of points that lie to the left of a particular positive 𝑧-score,
that is, the proportion that are less than it, we would need to add 0.5
to the value from our tables to account for the area to the left of the axis of
symmetry. That’s the area shaded in
pink.
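The relationship between the table value and the area to the left can be sketched as follows. This is an illustration only: it assumes Python's statistics.NormalDist in place of printed tables, and the helper names table_area and area_to_left are our own.

```python
from statistics import NormalDist

# The standard normal distribution: mean zero, standard deviation one.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def table_area(z):
    """P(0 < Z < z) for a non-negative z, as this type of table would list it."""
    if z < 0:
        raise ValueError("this type of table covers non-negative z only")
    return standard_normal.cdf(z) - 0.5

def area_to_left(z):
    """P(Z < z) for a non-negative z: the table value plus the 0.5 left of the axis of symmetry."""
    return 0.5 + table_area(z)

print(round(table_area(1.4), 4))    # about 0.4192
print(round(area_to_left(1.4), 4))  # about 0.9192
```

For a 𝑧-score of 1.4, the table value is about 0.4192, so the proportion of points to the left of it is about 0.9192.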