In this lesson, we’ll learn how to take a stratified, sometimes called a layered, random sample. Remember that when we collect data, our aim is to use it to find things out about, or infer things about, the population. For our results to be accurate and representative of the population, we need to be very careful that our data collection is also accurate and representative.
Now, more often than not, it isn’t practical, or even possible, to collect data on a whole population. For example, suppose we’re studying the fish in a particular lake. To take measurements from all of the fish, that is, the whole population, we’d have to catch them all first. That’s unlikely to be feasible for us, and it certainly isn’t a good idea for the fish. What we can do instead is select a sample of fish. We then take measurements or note characteristics of the fish in our sample and use statistical methods to infer results about the population from the sample data.
Stratified random sampling is a method of selecting a representative sample from a population that’s subdivided into distinct, nonoverlapping groups called strata. We take a random sample from each stratum and then combine these into a single sample for the population. The size of the random sample for each stratum reflects the size of that stratum within the population, so the strata are represented in the final sample in the same proportions as they are within the population. So, given a population size of uppercase 𝑁, an individual stratum size of uppercase 𝑆, and a required overall sample size of lowercase 𝑛, the sample size for that individual stratum, lowercase 𝑠, is the stratum size 𝑆 over the population size 𝑁 multiplied by the required sample size 𝑛. That is, 𝑠 = (𝑆 / 𝑁) × 𝑛.
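As a minimal sketch, the stratum formula can be written as a small Python function. The name stratum_sample_size is our own choice, not a standard library routine, and we use Python’s fractions.Fraction so the result is exact: a non-whole answer shows up as a fraction instead of being silently rounded.

```python
from fractions import Fraction


def stratum_sample_size(stratum_size: int, population_size: int, sample_size: int) -> Fraction:
    """Return s = (S / N) * n, the sample size for one stratum.

    stratum_size    -- S, the number of members in the stratum
    population_size -- N, the number of members in the whole population
    sample_size     -- n, the required overall sample size
    """
    if not 0 <= stratum_size <= population_size:
        raise ValueError("stratum size must lie between 0 and the population size")
    # Exact arithmetic: (S / N) * n with no floating-point rounding.
    return Fraction(stratum_size, population_size) * sample_size


# Hypothetical stratum: 50 members out of a population of 200, overall sample of 40.
print(stratum_sample_size(50, 200, 40))  # 10
```

Because a quarter of the population is in this stratum, a quarter of the sample of 40, that is, 10, comes from it.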
Alternatively, if we know that a stratum makes up a certain percentage of the population, let’s call that 𝜌 percent, then the sample size for that stratum is 𝜌 percent of 𝑛, the required overall sample size. That is, 𝑠 = (𝜌 / 100) × 𝑛. The important thing is that the sample sizes for the strata are in the same proportions as the strata are within the population. Let’s apply this in an example.
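The percentage version of the formula can be sketched the same way. The function name stratum_sample_size_from_percent is hypothetical; converting the percentage through its string form keeps decimal percentages such as 12.5 exact.

```python
from fractions import Fraction


def stratum_sample_size_from_percent(percent: float, sample_size: int) -> Fraction:
    """Return s = (rho / 100) * n for a stratum that is rho percent of the population."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    # Fraction(str(...)) turns 60 into 60 and 12.5 into 25/2 exactly.
    return Fraction(str(percent)) / 100 * sample_size


# A stratum making up 60 percent of the population, with an overall sample of 10:
print(stratum_sample_size_from_percent(60, 10))  # 6
```

Both forms describe the same proportion: a stratum of 48 out of 80 is 60 percent of the population, so either formula gives a stratum sample size of six when 𝑛 is 10.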
Suppose there are 48 male and 32 female tennis players registered at the local club. In a representative sample of 10, how many would be male and how many female?
Our population of tennis players is split into two strata, male and female, with 48 male players and 32 female players, so our population size 𝑁 is 48 plus 32, which is 80. We’re going to use stratified random sampling, where lowercase 𝑛, the overall sample size, is 10 and uppercase 𝑁, the population size, is 80. The sample size for each stratum, lowercase 𝑠, is the size of that stratum, uppercase 𝑆, divided by the population size 𝑁, multiplied by the sample size 𝑛. So the sample size for the male players is 48 divided by 80 multiplied by 10. Dividing both the numerator and the denominator by 10, and then dividing both by eight, gives us six male players in the sample.
Performing the same calculation for the female players, we have 32 female players divided by 80 in the whole population, multiplied by 10 in the sample. Again dividing the numerator and denominator by 10 and then by eight, we have four female players in our sample. This means that we take a random sample of six male players from the 48 male players and a random sample of four female players from the 32 female players. Together, these make up our stratified random sample of 10 from the population.
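The two steps of the tennis example, working out each stratum’s sample size and then drawing that many players at random from the stratum, can be sketched in Python. The player labels M1 to M48 and F1 to F32 are hypothetical, and the function name stratified_sample is our own; this sketch assumes every stratum’s sample size comes out as a whole number, as it does here, and simply refuses otherwise.

```python
import random


def stratified_sample(strata: dict[str, list[str]], sample_size: int,
                      rng: random.Random) -> dict[str, list[str]]:
    """Draw a simple random sample of size s = S / N * n from each stratum."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        numerator = len(members) * sample_size
        # This sketch only handles strata whose sample size is a whole number.
        if numerator % population_size != 0:
            raise ValueError(f"stratum {name!r} does not get a whole-number sample size")
        s = numerator // population_size
        # random.Random.sample picks s distinct members without replacement.
        sample[name] = rng.sample(members, s)
    return sample


# Hypothetical labels for the 48 male and 32 female players.
players = {
    "male": [f"M{i}" for i in range(1, 49)],
    "female": [f"F{i}" for i in range(1, 33)],
}
chosen = stratified_sample(players, 10, random.Random(1))
print({name: len(group) for name, group in chosen.items()})  # {'male': 6, 'female': 4}
```

Which individual players are chosen depends on the random seed, but the number taken from each stratum is always six and four.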
Now let’s look at how this works if we’re given percentages.
Suppose that 60 percent of tennis club players are male and 40 percent are female. In a random sample of 10 players, how many male and how many female players would we select?
Our population of tennis club members is split into two strata, male and female. We know that 60 percent of the population are male, 40 percent are female, and our required sample size, lowercase 𝑛, is 10. To work out how many of these should be male and how many female, we use the formula 𝑠 = (𝜌 / 100) × 𝑛, where 𝑠 is the sample size for a stratum and 𝜌 is the percentage of the population in that stratum.
Beginning with the number of male players we need for our sample, 𝑠 subscript 𝑀, this is 60 percent of 10, that is, 60 divided by 100 multiplied by 10. Dividing both the numerator and the denominator by 10, and then by 10 once more, gives 𝑠 subscript 𝑀 equal to six. So there are six male players in our sample.
We can do the same calculation for the female players, who make up 40 percent of the club. The number of female players in our sample is 40 divided by 100 multiplied by 10, which is four. So a representative stratified sample of 10 tennis club members consists of six male and four female players.
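Here is the same percentage calculation as a short Python check. The dictionary layout is simply our choice for holding the two strata; each entry applies 𝑠 = (𝜌 / 100) × 𝑛.

```python
from fractions import Fraction

# Stratum percentages from the example; the overall sample size n is 10.
percentages = {"male": 60, "female": 40}
n = 10

# s = (rho / 100) * n for each stratum, computed exactly.
sizes = {name: Fraction(rho, 100) * n for name, rho in percentages.items()}
print({name: int(s) for name, s in sizes.items()})  # {'male': 6, 'female': 4}

# Because the percentages add up to 100, the stratum sample sizes add up to n.
assert sum(sizes.values()) == n
```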
Now let’s look at an example where we examine our understanding of the definition of stratified random sampling.
In a certain survey about the colleges that some high-school students wish to join, a sample of 2,000 students was randomly selected out of a population of 40,000. Is that considered to be stratified sampling?
To answer this question, let’s remind ourselves of what we mean by stratified or layered random sampling. This is a sampling method we use when the population consists of nonoverlapping subdivisions or strata. To select a stratified random sample from a population, we take random samples from each stratum proportional to the size of that stratum within the population. In this example, we’re told that our population size is 40,000 and that a sample of 2,000 students was randomly selected. We have no information on whether the population was subdivided into strata, so we must assume that the random sample of 2,000 students was selected directly from the whole population, with no strata involved. We therefore cannot say that this is stratified sampling, and our answer is no.
Now let’s look at an example where we calculate the sample size for a stratum.
In an HR study about the salaries in a certain company with 1,000 employees, the employees were divided into males and females. If the total percentage of females in the company was 60 percent and a sample of 40 people was selected, what was the number of males in the sample?
Since the population, that is, the employees in the company, naturally subdivides into two strata, males and females, we use stratified or layered random sampling. This means that the sample should reflect the proportions of the strata within the population. We’re told that 60 percent of the employees were female and that a sample of 40 people was selected, so 60 percent of those 40 must be female.
Since the employees are split into two distinct strata, male and female, if 60 percent are female, then 100 minus 60, that is, 40 percent, must be male. This in turn means that 40 percent of the sample must also be male. To calculate the number of males in our sample, we use the formula 𝑠 = (𝜌 / 100) × 𝑛, where lowercase 𝑠 is the stratum sample size, 𝜌 is the percentage of the population in that stratum, and 𝑛 is the overall sample size. In our case, the sample size for the male stratum is 40 percent of 40, that is, 40 divided by 100 times 40.
Dividing both the numerator and the denominator by 10, and then by 10 once more, leaves four multiplied by four in the numerator, which is 16. Therefore, the number of males in the sample is 16.
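The HR calculation can be sketched in a few lines of Python, including the complement step. The variable names are our own. Since the percentages are given, the company size of 1,000 isn’t needed; the last line checks that working from counts instead, 400 males out of 1,000 employees, gives the same answer.

```python
from fractions import Fraction

female_percent = 60
n = 40  # overall sample size

# With only two strata, the male percentage is the complement of the female one.
male_percent = 100 - female_percent  # 40

# s = (rho / 100) * n for each stratum.
male_sample = Fraction(male_percent, 100) * n
female_sample = Fraction(female_percent, 100) * n
print(male_sample, female_sample)  # 16 24

# Same answer from counts: 40 percent of 1,000 employees is 400 males.
assert Fraction(400, 1000) * n == male_sample
```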
Let’s look at another example.
Ethan needs to conduct a study to determine whether the students in his school like playing football. He decides to divide the students into two groups, boys and girls, knowing that the school has a total of 200 students, 80 of whom are girls. If Ethan decides that his sample size will be 50, how many girls should he select for the study?
Since the population of students is split into two distinct strata, that is, boys and girls, the appropriate sampling method is stratified or layered random sampling. A stratified random sample is a sample consisting of random samples selected from distinct groups or strata within the population. The sample size for each stratum reflects the stratum proportion of the population.
To calculate the sample size for a particular stratum, lowercase 𝑠, we use the formula 𝑠 = (𝑆 / 𝑁) × 𝑛, where uppercase 𝑆 is the number in the stratum, uppercase 𝑁 is the number in the population, and lowercase 𝑛 is the overall sample size. In our case, there are 200 students in total, so 𝑁 is 200. There are 80 girls, so 𝑆 is 80, and Ethan’s sample size is 50, so 𝑛 is 50. The sample size for girls is therefore 80 over 200 multiplied by 50. Dividing the numerator and denominator by 50, and then by four, gives us 20. Therefore, Ethan should select 20 girls for his study.
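A quick Python check of Ethan’s numbers, also working out the boys’ stratum by subtraction, since girls and boys are the only two groups. The variable names are illustrative.

```python
from fractions import Fraction

N = 200           # students in the school
girls = 80        # S for the girls' stratum
boys = N - girls  # the only other stratum: 120 boys
n = 50            # Ethan's overall sample size

# s = (S / N) * n for each stratum.
girls_sample = Fraction(girls, N) * n
boys_sample = Fraction(boys, N) * n
print(girls_sample, boys_sample)  # 20 30
```

The two stratum samples add up to 50, as they must when the strata cover the whole population.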
In our next example, we’re going to apply stratified or layered random sampling to a population that’s been divided into three subgroups.
A scientist decides to conduct a survey on the effect of a certain medicine in a city of 100,000 people. He divides them into three groups based on their region: city center, outer city, and suburbs. There are 10,000 people in the suburbs and 30,000 people in the outer city. If the scientist decides to take a sample of 1,000 people, how many people from the suburbs should be included?
Since the city is divided into three distinct groups or strata, an appropriate sampling method is stratified or layered random sampling. Recall that a stratified random sample is one which combines a number of separate random samples taken from distinct groups within the population. The size of the sample from each group reflects the proportion of that group or stratum within the population.
To calculate the sample size for each stratum, we use the formula 𝑠 = (𝑆 / 𝑁) × 𝑛, where lowercase 𝑠 is the stratum sample size, uppercase 𝑆 is the stratum size, uppercase 𝑁 is the population size, and lowercase 𝑛 is the overall sample size. In our case, the population size 𝑁 is 100,000. We’re interested in how many people from the suburbs should be in our sample, and there are 10,000 people in the suburbs, so 𝑆 is 10,000. Our overall sample size is 1,000 people, so 𝑛 is 1,000. Substituting into our formula, the sample size for the suburbs is 10,000 divided by 100,000 multiplied by 1,000.
Dividing the numerator and the denominator by 1,000 and then again by 100 gives a sample size of 100 people from the suburbs. So in a sample of 1,000 people, 100 should be from the suburbs.
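For a sketch covering all three strata, note that the city center’s size isn’t given directly, but it is whatever remains of the population: 100,000 minus 10,000 minus 30,000, or 60,000. The Python below computes that remainder and then applies 𝑠 = (𝑆 / 𝑁) × 𝑛 to every region; the dictionary of regions is our own arrangement.

```python
from fractions import Fraction

N = 100_000  # population of the city
n = 1_000    # overall sample size

strata = {"suburbs": 10_000, "outer city": 30_000}
# The city center is the rest of the population: 60,000 people.
strata["city center"] = N - sum(strata.values())

# s = (S / N) * n for each region.
sample_sizes = {region: Fraction(S, N) * n for region, S in strata.items()}
for region, s in sample_sizes.items():
    print(f"{region}: {s}")
# suburbs: 100
# outer city: 300
# city center: 600
```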
Let’s complete this lesson by reminding ourselves of some key points about stratified random sampling. Stratified random sampling is a sampling method used when the population can be divided into distinct groups or strata. A representative sample combines random samples, one from each stratum, where each sample size reflects the proportion of that stratum within the population. For a population of 𝑁 elements and an overall sample size of 𝑛, the sample size for each stratum is 𝑠 = (𝑆 / 𝑁) × 𝑛, where uppercase 𝑆 is the number of elements in that stratum. Alternatively, if we’re given that 𝜌 percent of the population is in a particular stratum, the sample size for that stratum is 𝑠 = (𝜌 / 100) × 𝑛, that is, 𝜌 percent of the overall sample size.