Lesson Explainer: Normal Distribution

In this explainer, we will learn how to use the normal distribution to calculate probabilities and find unknown variables and parameters.

For real world variables like weights of newborn babies or salaries of employees in a large company, we expect them to have a distribution that is symmetric and concentrated near the mean. For example, the histogram below shows a data set that is symmetric and concentrated near the mean.

When a data set is symmetric and concentrated near the mean, we say that it is normally distributed. For normally distributed data sets, the empirical rule (also known as the 68-95-99.7 rule) gives useful estimates.

Theorem: Empirical Rule

If a data set is normally distributed with mean $\mu$ and standard deviation $\sigma$, then

approximately 68% of the data lies within $\sigma$ from $\mu$,
approximately 95% of the data lies within $2\sigma$ from $\mu$,
approximately 99.7% of the data lies within $3\sigma$ from $\mu$.
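
As a quick numerical check of the empirical rule, the sketch below computes the area within 1, 2, and 3 standard deviations of the mean. It uses Python's standard-library `statistics.NormalDist`, which is our choice of tool here and not something the explainer itself relies on.

```python
from statistics import NormalDist

# Standard normal distribution; the rule holds for any mean and standard deviation.
Z = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The computed areas are close to 0.683, 0.954, and 0.997, which the rule rounds to 68%, 95%, and 99.7%.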

Like data sets, a continuous random variable that is normally distributed will have a probability distribution graph that is symmetric and concentrated near the mean. If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$, we denote $X \sim N(\mu, \sigma^2)$. We note that the second parameter provided in the notation represents the variance rather than the standard deviation.

The shape of the probability distribution graph of a normal random variable is known as a bell curve. The total area under the bell curve is equal to 1 or 100%, and the 68-95-99.7 rule also applies to the area under the curve as portrayed below.

Let us examine a few examples on the empirical rule.

Example 1: Estimating Areas Under a Normal Distribution Curve

For the normal distribution shown, approximately what percentage of data points lie in the shaded region?

Answer

Recall from the 68-95-99.7 rule that

approximately 68% of the data lies within $\sigma$ from $\mu$,
approximately 95% of the data lies within $2\sigma$ from $\mu$,
approximately 99.7% of the data lies within $3\sigma$ from $\mu$.

The shaded region lies between and , so it takes exactly half the area within from . Since we know that approximately of the data should lie within from , approximately half of this data should be between and .

Approximately of the data lies in the shaded region.

Example 2: Estimating Areas Under a Normal Distribution Curve

For a normally distributed data set with mean 32.1 and standard deviation 2.8, between which two values would you expect 95% of the data set to lie?

Answer

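Recall from the 68-95-99.7 rule that

approximately 68% of the data lies within $\sigma$ from $\mu$,
approximately 95% of the data lies within $2\sigma$ from $\mu$,
approximately 99.7% of the data lies within $3\sigma$ from $\mu$.

95% of the data in a normal distribution lies within $2\sigma$ from $\mu$.

So, the lower endpoint of $2\sigma$ from $\mu$ is $\mu - 2\sigma = 32.1 - 2 \times 2.8 = 26.5$.

The upper endpoint of $2\sigma$ from $\mu$ is $\mu + 2\sigma = 32.1 + 2 \times 2.8 = 37.7$.

So, we expect 95% of the data to lie between 26.5 and 37.7.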
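
The endpoint arithmetic in this example can be sketched in a few lines of Python. The multiplier `k = 2` is inferred from the stated answer (26.5 and 37.7 are each 5.6, or two standard deviations, away from the mean), and the variable names are ours.

```python
mean = 32.1
std_dev = 2.8
k = 2  # number of standard deviations on each side of the mean

# Endpoints of the interval mean ± k standard deviations,
# rounded to one decimal place to match the data's precision.
lower = round(mean - k * std_dev, 1)
upper = round(mean + k * std_dev, 1)

print(lower, upper)
```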

For normal random variables, we can use the standard normal table to calculate the probability of a given event. This approach allows us to compute probabilities of more general events compared to the empirical rule.

The standard normal table gives probability values for the standard normal random variable. The standard normal variable is a continuous random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$. We will denote this variable as $Z$. So, we denote $Z \sim N(0, 1^2)$.

Standard normal tables (also called $z$-tables) are used to obtain probabilities involving the standard normal variable $Z$. We first need to understand what type of probability a given $z$-table provides. A table may provide probabilities in the form $P(0 \le Z \le z)$, or it may provide probabilities in the form $P(Z \le z)$. If the probability value provided for 0.00 is equal to 0.5, then this is the latter type (i.e., $P(Z \le z)$), since $P(Z \le 0) = 0.5$. If the probability value for 0.00 is equal to 0, then it is the former type (i.e., $P(0 \le Z \le z)$). The difference between these two tables is given in the pictures below.

Say that we want to compute $P(Z \le 0.54)$ using the $P(Z \le z)$ table. We need to locate the right endpoint 0.54 by finding 0.5 on the left column and 0.04 on the top row.

This leads to the value 0.7054, which represents the probability $P(Z \le 0.54)$.

On the other hand, when using the $P(0 \le Z \le z)$ table, we need to first split the region $Z \le 0.54$ into $Z \le 0$ and $0 \le Z \le 0.54$, as seen in the pictures below.

Then, we remember that $P(Z \le 0) = 0.5$, while $P(0 \le Z \le 0.54)$ can be located on the table as shown below.

So, $P(0 \le Z \le 0.54) = 0.2054$. Together, we get $P(Z \le 0.54) = 0.5 + 0.2054 = 0.7054$, which is the same value we obtained using the other $z$-table. In general, we can use either $z$-table to identify probabilities involving the standard normal variable $Z$. For the remainder of this explainer, we will use the $z$-table representing probabilities in the format $P(0 \le Z \le z)$.
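
To make the relationship between the two table formats concrete, here is a minimal sketch that imitates both kinds of table with Python's `statistics.NormalDist`. The helper names `cumulative_table` and `from_zero_table` are hypothetical labels of ours, not standard terminology.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def cumulative_table(z):
    """Entry of a table of the form P(Z <= z)."""
    return Z.cdf(z)

def from_zero_table(z):
    """Entry of a table of the form P(0 <= Z <= z), for z >= 0."""
    return Z.cdf(z) - 0.5

direct = cumulative_table(0.54)
table_entry = from_zero_table(0.54)
via_split = 0.5 + table_entry  # P(Z <= 0) + P(0 <= Z <= 0.54)

print(round(direct, 4), round(table_entry, 4), round(via_split, 4))
```

Both routes give the same probability, and the entries at 0.00 are 0.5 and 0 respectively, which is how the two table types can be told apart.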

Sometimes, the upper endpoint contains an unknown, in which case the probability would be provided. For example, to find $z$ that satisfies $P(0 \le Z \le z) = 0.2673$, we start by locating 0.2673 inside the standard normal table and proceed to locate the edge values leading to 0.2673.

We use this method to identify that $z = 0.73$. This means that $P(0 \le Z \le z) = 0.2673$, where $z = 0.73$.
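
Reading the table "inside out" corresponds to inverting the cumulative distribution function. The sketch below assumes the table is of the form $P(0 \le Z \le z)$ and uses `NormalDist.inv_cdf` from Python's standard library; the helper name `z_from_zero_table` is ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_from_zero_table(p):
    """Return z >= 0 with P(0 <= Z <= z) = p, for 0 <= p < 0.5."""
    # P(Z <= z) = 0.5 + P(0 <= Z <= z), so invert the cumulative probability.
    return Z.inv_cdf(0.5 + p)

z = z_from_zero_table(0.2673)
print(round(z, 2))
```

Rounded to two decimal places, as in a printed table, the result should agree with the table lookup for 0.2673.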

Since $Z$ is a continuous random variable, we remember that $P(Z = z) = 0$ for any value of $z$. So, the weak inequality ($\le$) and the strict inequality ($<$) are interchangeable. For example, $P(Z \le 0.54) = P(Z < 0.54)$.

When dealing with the probabilities of a normal distribution, we typically default to weak inequality notations. We should keep in mind that they are equivalent to strict inequalities.

We have observed above that $P(Z \le 0) = 0.5$. By the symmetry of the bell curve, we also have $P(Z \ge 0) = 0.5$.

Symmetry of the normal distribution plays an important role when computing probabilities involving negative values. If an event includes negative values, then we first split the event into the positive and the negative parts. Then, using the bell curve, we can identify a positive event that has the same probability as the negative part. This operation is represented by the progression below.

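For an event of the form $-a \le Z \le b$ with positive $a$ and $b$, the pictures above give us the equations $$P(-a \le Z \le b) = P(-a \le Z \le 0) + P(0 \le Z \le b) = P(0 \le Z \le a) + P(0 \le Z \le b),$$ where both probabilities on the last line can be obtained using the standard normal table. It is often helpful to think through the pictures before writing down the corresponding equations.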
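
The split-and-reflect idea can be checked numerically. In this sketch the endpoints -1.2 and 0.8 are hypothetical values chosen for illustration, not taken from the text, and `from_zero_table` is our own stand-in for a table of the form $P(0 \le Z \le z)$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def from_zero_table(z):
    """Stand-in for a table entry P(0 <= Z <= z), for z >= 0."""
    return Z.cdf(z) - 0.5

a, b = 1.2, 0.8  # hypothetical endpoints: the event is -1.2 <= Z <= 0.8

# Reflect the negative part [-a, 0] onto [0, a], then add the positive part [0, b].
by_symmetry = from_zero_table(a) + from_zero_table(b)

# Direct computation for comparison.
direct = Z.cdf(b) - Z.cdf(-a)

print(round(by_symmetry, 4), round(direct, 4))
```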

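Set differences come in handy when computing probabilities. To compute $P(Z \ge a)$ for a positive value $a$, we observe the following graphs.

So, the area over the interval $[a, \infty)$ can be obtained by subtracting the area over the interval $[0, a]$ from the area over $[0, \infty)$. This leads to $$P(Z \ge a) = P(Z \ge 0) - P(0 \le Z \le a).$$

We know $P(Z \ge 0) = 0.5$, and $P(0 \le Z \le a)$ can be found from the standard normal table. So, we get $P(Z \ge a) = 0.5 - P(0 \le Z \le a)$.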
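
The subtraction of areas can be sketched for an upper-tail probability. The endpoint 1.3 below is a hypothetical value chosen for illustration, and `from_zero_table` is our own stand-in for a table of the form $P(0 \le Z \le z)$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def from_zero_table(z):
    """Stand-in for a table entry P(0 <= Z <= z), for z >= 0."""
    return Z.cdf(z) - 0.5

a = 1.3  # hypothetical positive endpoint

# Area over [a, infinity) = area over [0, infinity) minus area over [0, a].
upper_tail = 0.5 - from_zero_table(a)

# Direct computation for comparison.
direct = 1 - Z.cdf(a)

print(round(upper_tail, 4), round(direct, 4))
```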

Let us look at an example dealing with the probabilities of standard normal distribution.

Example 3: Calculating Probability for an Interval of Finite Length for Standard Normal Random Variables

Let $Z$ be a standard normal random variable. Calculate .

Answer

We note that the given interval for $Z$ includes negative values. Let us begin by thinking through the process using pictures.

In equations, this is

Now, both probabilities in the last line can be found using the standard normal table.

From the picture above, we get and . Finding the sum of the probabilities we get

So, the probability is 0.6955.

To compute probabilities for general normal variables (in other words, those that are not yet in the form $N(0, 1^2)$), we need to first relate them to the standard normal variable $Z$. This process is known as standardizing the normal distribution. If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$, then $Z = \frac{X - \mu}{\sigma}$ is the standard normal random variable with mean 0 and standard deviation 1.

How To: Standardizing the Normal Distribution to Compute the Probability

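Let $X$ be a normal random variable with mean $\mu$ and standard deviation $\sigma$. To compute the probability given by $P(a \le X \le b)$, we need to

subtract $\mu$ from all sides: $P(a - \mu \le X - \mu \le b - \mu)$,
divide all sides by $\sigma$: $P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right)$,
replace the middle expression with $Z$,
use the standard normal table to obtain the probability involving $Z$.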
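
The steps above can be sketched as a small Python function. The function name `interval_probability`, the endpoint names `a` and `b`, and the sample parameters (mean 50, standard deviation 10, interval from 40 to 65) are all hypothetical choices for illustration; `NormalDist().cdf` plays the role of the table lookup.

```python
from statistics import NormalDist

def interval_probability(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing."""
    # Subtract the mean, then divide by the standard deviation.
    z_low = (a - mu) / sigma
    z_high = (b - mu) / sigma
    # The middle expression is now the standard normal variable Z.
    Z = NormalDist()
    return Z.cdf(z_high) - Z.cdf(z_low)

# Hypothetical example: X ~ N(50, 10^2), event 40 <= X <= 65.
p = interval_probability(40, 65, mu=50, sigma=10)

# Cross-check against the unstandardized distribution.
X = NormalDist(mu=50, sigma=10)
check = X.cdf(65) - X.cdf(40)

print(round(p, 4), round(check, 4))
```

Standardizing first and working with $X$ directly give the same probability; the standardized route is the one that works with a printed table.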

Below, we provide a brief justification for this transformation.

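If $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then $$E(Z) = E\left(\frac{X - \mu}{\sigma}\right).$$

Then, by linearity of expected value, the right side is equal to $\frac{E(X) - \mu}{\sigma}$, which is equal to zero since $E(X) = \mu$. So, as we stated earlier, $E(Z) = 0$. Now let us examine the variance $$\text{Var}(Z) = \text{Var}\left(\frac{X - \mu}{\sigma}\right) = \frac{\text{Var}(X)}{\sigma^2},$$ which equals 1 since $\text{Var}(X) = \sigma^2$. So, the standard deviation of $Z$ is equal to $\sqrt{1} = 1$. Finally, we remark that the normality of $Z$ is inherited from the normality of $X$ since the transformation is linear. In conclusion, the variable $Z$ is normally distributed with mean 0 and standard deviation 1.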
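
As an informal check of this justification, Python's `statistics.NormalDist` supports shifting and scaling by constants, so the standardized variable can be built directly. The parameters 10 and 4 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical normal variable with mean 10 and standard deviation 4.
X = NormalDist(mu=10, sigma=4)

# Subtracting a constant shifts the mean; dividing by a constant
# rescales both the mean and the standard deviation.
Z = (X - X.mean) / X.stdev

print(Z.mean, Z.stdev)
```

The resulting distribution has mean 0 and standard deviation 1, in line with the argument above.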

Let us examine a few examples to familiarize ourselves with different contexts.

Example 4: Determining Probabilities for Normal Distribution given the Mean and the Variance

Let $X$ be a random variable that is normally distributed with mean 63 and variance 144. Determine .

Answer

We begin this problem by standardizing the normal distribution. We remember that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma}$ is the standard normal variable $Z$.

We are given that $\mu = 63$ and $\sigma^2 = 144$. We recall that the standard deviation is equal to the positive square root of the variance, so $\sigma = \sqrt{144} = 12$.

Subtracting $\mu = 63$ from each side of the inequality we get

Next, we divide each side by $\sigma = 12$ and replace $\frac{X - 63}{12}$ by $Z$, which gives us

Since involves negative values, we use the symmetry of the bell curve to identify an equivalent positive region.

So, we need to compute . Using the bell curve, we note that

Using the standard normal table, we get and . Then,

So, .

Example 5: Determining Probabilities for Normal Distribution given the Mean and the Standard Deviation

Let $X$ be a random variable that is normally distributed with mean 68 and standard deviation 3. Determine .

Answer

We begin by standardizing the normal distribution. We remember that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma}$ is the standard normal variable. We are given $\mu = 68$ and $\sigma = 3$.

We first subtract $\mu = 68$ from each side of the inequality, then we divide each side by $\sigma = 3$, which gives us

Recall that the standardized random variable $\frac{X - 68}{3}$ can be replaced by the standard normal variable $Z$. Then, the probability above is the same as .

We draw pictures of the bell curve to think through this probability.

This leads to

Using the standard normal table, we get , so

So, is 0.9821.

In our final two examples, we will demonstrate how to use the probability to calculate missing values.

Example 6: Using Probabilities from Normal Distribution to Evaluate an Unknown

Let be a random variable that is normally distributed with mean and standard deviation . Given that , find .

Answer

We are given that , and we are also given that . If we standardize the normal distribution, we get

If we denote , then we need to find that satisfies . Since 0.9938 is larger than 0.5, it must be the case that is a negative value.

Using the illustration above, we obtain the following equation:

Since we know that and , we get

From the standard normal table, we get . So, or, equivalently, .

Recall that we defined , so

Solving this equation for , we get .
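
The table lookup in this example can be cross-checked numerically. The sketch assumes the given probability 0.9938 is an upper-tail probability of the standardized variable, which is what the sign argument in the answer implies; the name `k` for the standardized value is ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Assumption: P(Z >= k) = 0.9938, which is equivalent to P(Z <= k) = 1 - 0.9938.
k = Z.inv_cdf(1 - 0.9938)

# By symmetry, -k is the positive value with P(0 <= Z <= -k) = 0.9938 - 0.5.
table_probability = Z.cdf(-k) - 0.5

print(round(k, 2), round(table_probability, 4))
```

The standardized value comes out negative, as the answer argues it must, and its reflection is the positive value a table of the form $P(0 \le Z \le z)$ would return for 0.4938.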

Example 7: Calculating the Mean of Normally Distributed Random Variables

Suppose $X$ is normally distributed with mean $\mu$ and variance 196. Given that , find the value of $\mu$.

Answer

We note that the mean $\mu$ is an unknown parameter. Since we are given that $X$ is normally distributed with mean $\mu$ and variance 196, we can write $X \sim N(\mu, 196)$. We recall that standard deviation is the square root of variance, so $\sigma = \sqrt{196} = 14$. Standardizing the normal distribution with these values, we find

First, we define . Then, satisfies . Since 0.0668 is smaller than 0.5, should be a negative value. Using the symmetry of the bell curve, we deduce the following equation:

We know that and . So,

From the standard normal table, we find , so . Since we defined therefore Solving this equation for , we get .
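
Solving for an unknown mean can be sketched as follows. The variance 196 and the probability 0.0668 come from the example; the threshold `x = 40` is a hypothetical placeholder, and the sketch assumes the given probability is a lower-tail probability $P(X \le x)$, consistent with the standardized value being negative. The function name `mean_from_lower_tail` is ours.

```python
from math import sqrt
from statistics import NormalDist

def mean_from_lower_tail(x, p, sigma):
    """Solve P(X <= x) = p for the mean of X ~ N(mu, sigma^2)."""
    # Standardized value k = (x - mu) / sigma satisfies P(Z <= k) = p.
    k = NormalDist().inv_cdf(p)
    # Rearranging k = (x - mu) / sigma gives mu = x - sigma * k.
    return x - sigma * k

sigma = sqrt(196)  # standard deviation is the square root of the variance
x = 40             # hypothetical threshold, not taken from the example
mu = mean_from_lower_tail(x, 0.0668, sigma)

print(round(mu, 1))
```

Substituting the recovered mean back into `NormalDist(mu, sigma).cdf(x)` returns the original probability, which is a convenient way to verify the algebra.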

Key Points

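The empirical rule states that if $X \sim N(\mu, \sigma^2)$, then
- approximately 68% of the data lies within $\sigma$ from $\mu$,
- approximately 95% of the data lies within $2\sigma$ from $\mu$,
- approximately 99.7% of the data lies within $3\sigma$ from $\mu$.
The standard normal variable is denoted as $Z \sim N(0, 1^2)$.
By symmetry, $P(Z \le 0) = 0.5$ and $P(Z \ge 0) = 0.5$.
A standard normal table may contain probabilities in the form $P(Z \le z)$ or in the form $P(0 \le Z \le z)$.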
If the range of values includes negative and positive values, then we first split the event into the negative and the positive parts.
Draw bell curves and use their symmetry to compute probabilities that are not in the form $P(0 \le Z \le z)$.
If $X \sim N(\mu, \sigma^2)$, then we first standardize the distribution by defining $Z = \frac{X - \mu}{\sigma}$.