Lesson Explainer: Stratified Random Sampling | Nagwa

# Lesson Explainer: Stratified Random Sampling

In this explainer, we will learn how to take a stratified, or layered, random sample.

In general, a data set consists of observations, or measurements, from members of a population, or a sample of the population, in relation to a variable or variables of interest.

Our aim in collecting data is to gain information about the population, and we have various statistical methods at our disposal to do this. However, for our results and conclusions to be as accurate and representative as possible, how we collect the data is itself an important part of statistical methodology.

In some instances, it may be possible to collect data on an entire population. For example, if we want to analyze the tennis form of the top 100 tennis players in the world in a particular year, we could actually collect data on all professional matches played by the top 100 players for that year.

Suppose, however, that we would like to analyze certain characteristics, such as mass, diameter, and bounce height, of the tennis balls used in professional tournaments in a particular year.

It would be neither sensible nor feasible to try and collect data on the whole population of tennis balls used that year. Instead, we might take a sample or samples and collect measurements on the balls in those samples. From the sample data, using statistical methods, we may draw conclusions about the population of tennis balls.

When sampling data, our aim is to always try and take a representative sample, that is, a sample that accurately represents or reflects the population from which it is taken. Another term for this is unbiased sample, where no part of a population is over- or underrepresented.

There are a number of sampling methods we can use to collect data, one of which is called random sampling.

### Definition: Random Sampling and Simple Random Sampling

A random sample is a subset of elements selected from a population such that each member of the population has some chance of being selected.

A simple random sample is one in which each member of the population has an equal chance of selection.
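As a minimal sketch of simple random sampling, the Python snippet below uses the standard library's `random.sample`, which draws without replacement so that every member is equally likely to be chosen. The population of 100 numbered players, the function name, and the seed are illustrative assumptions, not part of the lesson.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(list(population), k)

# Hypothetical population: players numbered 1 to 100.
players = range(1, 101)
sample = simple_random_sample(players, 10, seed=1)
print(len(sample))  # 10
```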

Now, it is often the case that a population contains natural, nonoverlapping subdivisions, or strata. In such cases, we might use random sampling to collect data within each stratum and collate the data into a sample representing the whole population.

For example, the population of professional tennis players consists of both male and female players. If the proportions of male and female players are not equal, this difference should be reflected within any sample we take. If it is not, and the sample is taken directly from the population as a whole, then the groups, male and female, may not be represented proportionately within the sample. We can remedy this by taking random samples of a proportionate number of male and female players, which we then combine to form the overall sample.

This process is called stratified, or layered, random sampling and is defined as follows.

### Definition: Stratified or Layered Random Sampling

Stratified or layered random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata.

Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample from each stratum reflects the size of that stratum within the population. Hence, the strata are represented in the final sample in the same proportions as they are within the population.

For a population of $N$ elements and an overall sample size of $n$, we use the following formula to calculate the sample size, $n_s$, for a single stratum containing $N_s$ elements:

$$n_s = \frac{N_s}{N} \times n.$$

Alternatively, if we know the percentage of the total population, $p\%$, that belongs to a single stratum, the sample size for that stratum is given by $\frac{p}{100} \times n$.

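As an example, suppose that $60\%$ of registered professional tennis players are male and $40\%$ are female. If we wish to take a small representative sample of, say, 10 from the population of professional tennis players, our sample should consist of

$$\frac{60}{100} \times 10 = 6 \text{ male players and } \frac{40}{100} \times 10 = 4 \text{ female players.}$$

If we are instead given that out of a population of 80 professional tennis players, 48 are male and 32 are female, then using the formula for sample size for the two strata, with the same overall sample size of 10, we have

$$\frac{48}{80} \times 10 = 6 \text{ male players and } \frac{32}{80} \times 10 = 4 \text{ female players.}$$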
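The two ways of sizing a stratum's sample can be sketched in Python as follows. The function names are hypothetical, and `fractions.Fraction` is used only to keep the arithmetic exact. The figures are those of the tennis example: 48 male and 32 female players out of 80, with an overall sample of 10.

```python
from fractions import Fraction

def stratum_sample_size(stratum_size, population_size, sample_size):
    """Proportional share of the overall sample for one stratum."""
    return Fraction(stratum_size, population_size) * sample_size

def stratum_sample_size_from_percent(percent, sample_size):
    """Same calculation when the stratum's percentage of the population is known."""
    return Fraction(percent, 100) * sample_size

# Tennis example: 48 male and 32 female players out of 80, overall sample of 10.
male = stratum_sample_size(48, 80, 10)
female = stratum_sample_size(32, 80, 10)
print(male, female)  # 6 4
```

If a share does not come out as a whole number, some rounding rule has to be chosen. The examples in this explainer all give whole numbers, so the sketch leaves that choice open.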

Let’s look at some examples where we examine our understanding of the definition of stratified random sampling.

### Example 1: Determining If a Sampling Scenario Is Stratified Random Sampling

In a certain survey about the colleges that some high school students wish to join, a sample of 2 000 students was randomly selected out of a population of 40 000. Is that considered to be stratified sampling?

Stratified, or layered, random sampling is used when a population naturally subdivides into groups, or strata. Such a sample reflects the proportions of each stratum within the population. This is achieved by taking random samples from each stratum proportional to the size of the individual stratum within the population as a whole.

In this example, the population consists of 40 000 students. We do not have any information on whether or not the population was subdivided into strata, so we must assume that the random sample of 2 000 students was selected directly from the population. Therefore, this is not considered to be stratified sampling.

The result in the example above is useful in the context of our next question, where we examine the definition of stratified random sampling.

### Example 2: Stratified Random Sampling

Which of the following is not true about stratified sampling?

1. Stratified random sampling is also called proportional random sampling.
2. Stratified random sampling allows researchers to obtain a sample population that best represents the entire population being studied.
3. Stratified sampling is the random selection of data from an entire population.
4. Stratified random sampling is a method of sampling that involves the division of a population into smaller subgroups known as strata.
5. The stratified random sample is a statistical measurement tool.

We recall that stratified, or layered, random sampling is a sampling method used when a population may be naturally subdivided into distinct, nonoverlapping smaller groups, or strata.

Random samples are taken from each individual stratum and combined to form an overall sample. The size of the random sample taken from each stratum reflects the size of that stratum within the population.

Let’s now see whether each of the given options fits with this definition.

1. Stratified random sampling is also called proportional random sampling. (True or False?)
In stratified random sampling, the population of interest is split into groups or strata. The size of the sample taken from each stratum reflects the proportion of the population represented by that stratum. Therefore, it would not be incorrect to give stratified random sampling an alternate name such as proportional random sampling. Hence, statement 1 is true about stratified random sampling.
2. Stratified random sampling allows researchers to obtain a sample population that best represents the entire population being studied. (True or False?)
We use stratified random sampling when the population can be split into nonoverlapping groups or strata. The proportions of these groups within the population are calculated, and the same proportions are applied to the random samples taken from each group. This means that the different groups are represented proportionally within the final combined sample. Hence, no group should be either over- or underrepresented, and the sample reflects the proportional makeup of the whole population. Such a sample will best represent the entire population being studied. Hence, statement 2 is true about stratified random sampling.
3. Stratified sampling is the random selection of data from an entire population. (True or False?)
By definition, a stratified random sample is one that combines a number of individual samples taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group, or stratum, within the population. The data is, therefore, not randomly selected from an entire population. Hence, this statement about stratified sampling is false.
4. Stratified random sampling is a method of sampling that involves the division of a population into smaller subgroups known as strata. (True or False?)
By definition, stratified random sampling involves the population being divided into smaller subgroups. These smaller groups are known as strata and the size of the sample from each group reflects the size of that group within the population. Hence, this statement is true about stratified sampling.
5. The stratified random sample is a statistical measurement tool. (True or False?)
A stratified random sample reflects the proportions of the distinct subgroups, or strata, within a population. Measuring the population, and hence, a sample, in this way, we are maintaining the proportions inherent within the population so that statistical results and predictions gained from the sample data reflect the true makeup of the population. By this token, the stratified random sample is a statistical measurement tool. Hence, this statement is true about stratified sampling.

Hence, we find that only statement 3 is not true about stratified sampling.

In our next example, we calculate the sample size for a stratum within a population.

### Example 3: Calculating the Sample Size of a Stratum given the Proportion of Sample Needed

In an HR study about the salaries in a certain company with 1 000 employees, the employees were divided into males and females. If the total percentage of females in the company was 60 percent and a sample of 40 people was selected, what was the number of males in the sample?

Since the population, that is, the employees in the company, naturally subdivides into two strata, male and female, we use stratified, or layered, random sampling as the sampling method. This means that the sample reflects the proportions of male and female employees within the company.

Since 60 percent of the employees were female, 60 percent of the sample must also have been female. This means that the remainder, that is, $100 - 60 = 40$ percent, of the sample must have been male. We are told that the sample consisted of 40 people. Hence, 40 percent of those 40 people must have been male. That is,

$$\frac{40}{100} \times 40 = 16.$$

Hence, there were 16 males in the sample.

### Example 4: Calculating the Sample Size of a Stratum given the Stratum and Population Sizes

Adel needs to conduct a study to determine whether the students in his school like playing football. He decides to divide the students into two groups, boys and girls, knowing that the school has a total of 200 students, 80 of whom are girls.

If Adel decides that his sample size will be 50, how many girls should he select for the study?

Since the population of students is split into 2 distinct strata, that is, boys and girls, the appropriate sampling method is stratified, or layered, random sampling.

A stratified random sample is one that combines a number of separate random samples taken from distinct groups within the population. The sample size for each group reflects the proportion of that group, or stratum, within the population.

Applying this to our population of students, 80 out of 200 students are girls. Therefore, the proportion of girls is $\frac{80}{200} = \frac{2}{5}$, which as a percentage is $40\%$.

This means that to reflect the proportions of boys and girls in the population, $40\%$ of Adel’s sample should be girls. Adel’s sample size is 50 students and $40\%$ of 50 is

$$\frac{40}{100} \times 50 = 20.$$

Hence, Adel should select 20 girls for the study.

Note that we could have reached this conclusion in a slightly different way, using a formula for strata sample size. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $n_s$, for a single stratum containing $N_s$ elements is

$$n_s = \frac{N_s}{N} \times n.$$

In our case, $N = 200$, $n = 50$, and $N_s = 80$, and so

$$n_s = \frac{80}{200} \times 50 = 20.$$
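To show how the separate random samples are actually drawn and combined, here is a rough Python sketch for Adel's school. The student IDs, the function name, and the seed are made up for illustration. The sketch also assumes that each stratum's proportional share is a whole number, as it is here.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a proportional simple random sample from each stratum and combine them.

    strata maps a stratum name to the list of its members.
    Assumes every proportional share is a whole number, as in the lesson's examples.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    combined = []
    for name, members in strata.items():
        # Share of the sample in proportion to the stratum's size.
        share = len(members) * sample_size // population_size
        combined.extend((name, m) for m in rng.sample(members, share))
    return combined

# Hypothetical student IDs: 80 girls and 120 boys, 200 students in total.
school = {
    "girls": [f"G{i}" for i in range(80)],
    "boys": [f"B{i}" for i in range(120)],
}
sample = stratified_sample(school, 50, seed=0)
girls_chosen = sum(1 for name, _ in sample if name == "girls")
print(girls_chosen, len(sample))  # 20 50
```

Which particular students are chosen depends on the seed, but the split of 20 girls and 30 boys is fixed by the proportions.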

In our next example, we apply stratified, or layered, random sampling to a population that has been divided into 3 groups.

### Example 5: Sample Size of a Stratum given the Sample Size of Other Strata and the Population Size

A scientist decides to conduct a survey on the effects of a certain medicine in a city of 100 000 people. He divides them into three groups based on their region: city center, outer city, and suburbs. There are 10 000 people in the suburbs and 30 000 people in the outer city. If the scientist decides to take a sample of 1 000 people, how many people from the suburbs should be included?

Since the city is divided into three distinct groups, or strata, an appropriate sampling method is stratified, or layered, random sampling.

We recall that a stratified random sample is one that combines a number of separate random samples, taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group, or stratum, within the population.

In our case, we know the total population and the number of people in the suburbs and outer city, but not the city center: the total population is 100 000, with 10 000 people in the suburbs and 30 000 in the outer city.

Although we do not need to know the number of people in the city center in order to answer the question, we note that there must be $100\,000 - 10\,000 - 30\,000 = 60\,000$ people in the city center.

The scientist wishes to take a representative sample of 1 000 people from the population, and we are asked how many of these should be selected from the suburbs. Applying stratified random sampling, the proportion of people from the suburbs in the sample must be the same as the proportion of people from the suburbs in the whole population. There are 10 000 people in the suburbs, and as a proportion of the total population, this is

$$\frac{10\,000}{100\,000} = \frac{1}{10}.$$

As a percentage, that is $10\%$. Hence, $10\%$ of the sample should be people from the suburbs. If the sample size is 1 000 people, then $10\%$ of this is

$$\frac{10}{100} \times 1\,000 = 100.$$

Therefore, 100 people from the suburbs should be included in the sample.

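Note that we could have reached this conclusion in a slightly different way, using a formula for strata sample size. That is, for a population of $N$ elements and an overall sample size of $n$, the sample size, $n_s$, for a single stratum containing $N_s$ elements is

$$n_s = \frac{N_s}{N} \times n.$$

In our case, $N = 100\,000$, $n = 1\,000$, and $N_s = 10\,000$, and so

$$n_s = \frac{10\,000}{100\,000} \times 1\,000 = 100.$$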
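Although the question asks only about the suburbs, the same calculation can be run for all three regions as a check that the strata samples add up to the full sample. The Python sketch below is one way to do this. The variable names are illustrative, and the city center size is found by subtraction from the figures in the question.

```python
from fractions import Fraction

population = 100_000
sample_size = 1_000

# Known strata sizes from the question; the city center is found by subtraction.
strata = {"suburbs": 10_000, "outer city": 30_000}
strata["city center"] = population - sum(strata.values())

# Proportional share of the sample for each stratum.
allocation = {name: Fraction(size, population) * sample_size
              for name, size in strata.items()}

for name, share in allocation.items():
    print(name, share)
```

The three shares come to 100, 300, and 600, which together make up the sample of 1 000.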

Related to stratified, or layered, random sampling is a method of random sampling used in estimating population sizes known as the capture–recapture method. Let’s look at an example.

Suppose that as part of a large rehousing project, a cat rescue center wishes to estimate the population of stray cats within a particular urban area.

On one day, 20 stray cats are captured, tagged, and released. The next day, 12 cats are captured, 4 of which are found to be tagged. As a proportion, $\frac{4}{12} = \frac{1}{3}$; that is, one-third, or approximately $33.3\%$, of the cats captured on day 2 had tags.

We can assume that the same proportion of cats was tagged in the whole population. Hence, we estimate that $\frac{1}{3}$ of the population comprised the 20 tagged cats. If this is one-third of the population, then the total population is three times this. That is, $3 \times 20 = 60$ cats.

### Definition: The Capture–Recapture Method for Estimating Population Size

Equating capture with random selection from a population to estimate the population size, $N$, let $M$ be the number of members of the population that are initially captured, tagged, and then released.

If $C$ is the number of members of the population that are subsequently captured and $R$ is the number of those that are found to be tagged, then the overall population size is estimated by

$$N = \frac{M \times C}{R}.$$

In our stray cats example above, we have $M = 20$, $C = 12$, and $R = 4$. Hence,

$$N = \frac{20 \times 12}{4} = 60.$$
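The capture–recapture estimate can be sketched as a small Python function. The function and parameter names are hypothetical. The guard against zero recaptured tags is an added assumption, since the estimate is undefined in that case.

```python
from fractions import Fraction

def capture_recapture_estimate(tagged_first, caught_second, tagged_in_second):
    """Estimate the population size from a capture-recapture experiment."""
    if tagged_in_second == 0:
        raise ValueError("no tagged members recaptured; the estimate is undefined")
    return Fraction(tagged_first * caught_second, tagged_in_second)

# Stray cats: 20 tagged on day 1; 12 caught on day 2, 4 of them tagged.
estimate = capture_recapture_estimate(20, 12, 4)
print(estimate)  # 60
```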

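We can define this method alternatively as an equality of proportions: the fraction of the whole population that was initially tagged is assumed to equal the fraction of the second capture that is found to be tagged. That is,

$$\frac{\text{number initially tagged}}{\text{population size}} = \frac{\text{number found tagged in the second capture}}{\text{number in the second capture}}.$$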

### Example 6: Using the Capture–Recapture Method to Estimate the Size of a Population

In an HR study about the salaries in a certain company, the employees are divided into males and females. The total percentage of females in the company is 60 percent. A sample of 10 employees is selected from the company. The males in that sample represent 5 percent of the males in the company. What is the total number of employees in that company?

To begin with, we note that 60 percent of employees in the company are female and that the employees are divided into males and females. This means that $100 - 60 = 40$ percent of employees must be male. If we let $N$ be the total number of employees in the company, then the number of male employees is 40 percent of $N$, that is, $\frac{40}{100} \times N$, or $0.4N$.

To find the total number of employees, $N$, we use the capture–recapture formula. This tells us that the population size is

$$N = \frac{M \times C}{R},$$

where $M$ is the number initially captured, tagged, and then released; $C$ is the number subsequently captured; and $R$ is the number of those found to be tagged.

In our case, identifying “all male employees” as those “captured, tagged, and released,” we have $M = 0.4N$.

From the question, we know that our sample size, that is, the number subsequently “captured,” $C$, is equal to 10. Further, the males in this subsequent sample represent 5 percent of the males in the company. This means that

$$R = \frac{5}{100} \times 0.4N = 0.02N.$$

Hence, we have

$$M = 0.4N, \quad C = 10, \quad R = 0.02N.$$

Substituting these values into the capture–recapture formula for population size then gives us

$$N = \frac{0.4N \times 10}{0.02N} = \frac{4}{0.02} = 200.$$

Hence, the total number of employees in the company is 200.
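As a cross-check, the same answer can be reached by working directly with the proportions, assuming that the sample of 10 mirrors the company's male and female split. The Python sketch below follows that alternative route rather than the capture–recapture formula itself. The variable names are illustrative.

```python
from fractions import Fraction

female_share = Fraction(60, 100)
male_share = 1 - female_share  # 40 percent of employees are male
sample_size = 10

# Assumption: the sample mirrors the company's proportions.
males_in_sample = male_share * sample_size  # 4 males in the sample

# Those males are 5 percent of all males in the company.
males_in_company = males_in_sample / Fraction(5, 100)  # 80 males

# Males make up 40 percent of all employees.
total_employees = males_in_company / male_share
print(total_employees)  # 200
```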

We complete this explainer by summarizing some of the key points.

### Key Points

• A random sample is a subset of elements selected from a population, such that each member of the population has some chance of selection. A simple random sample is a sample in which each member of the population has an equal chance of selection.
• Stratified or layered random sampling is a sampling method used when a population may be subdivided into smaller distinct groups or strata. A random sample is taken from each stratum, with the sample sizes in the same proportions as the strata within the population. These smaller samples are then combined to form a representative sample of the whole population.
• For a population of $N$ elements and an overall sample size of $n$, we use the following formula to calculate the sample size, $n_s$, for a single stratum containing $N_s$ elements: $n_s = \frac{N_s}{N} \times n$. Alternatively, if we know the percentage of the total population, $p\%$, that belongs to a single stratum, the sample size for that stratum is given by $\frac{p}{100} \times n$.
• The capture–recapture method is a proportional sampling method used to estimate overall population size, $N$, such that $N = \frac{M \times C}{R}$. Here, $M$ is the number of population members initially captured, tagged, and released; $C$ is the number of population members subsequently captured; and $R$ is the number of those found to have been tagged.