In this lesson, we’ll learn how to
take a stratified random sample, sometimes called a layered random sample. Remember that when we collect data,
our aim is to use the data to find things out or infer things about the
population. For our results to be accurate and
representative of the population, we need to be very careful that our data
collection is also accurate and representative.
Now, more often than not, it’s
neither feasible nor possible to collect data on a whole population. For example, suppose we’re studying
the fish in a particular lake. To take measurements from all of
the fish, that is, the whole population, we’d have to catch them all first. And that’s unlikely to be feasible
for us, and it certainly isn’t a good idea for the fish. What we can do instead is to select
a sample of fish. We then take measurements or note
characteristics of the fish in our sample and use statistical methods to
gain information and infer results about the population from the sample data.
Stratified random sampling is a
method of selecting a representative sample from a population that’s subdivided into
distinct groups or strata. When our population can be
subdivided into nonoverlapping groups or strata, we take a random sample from
each stratum and then combine these into a
sample for the population. The size of the random sample for
each stratum reflects the size of that stratum within the population. This means that the strata are
represented in the final sample in the same proportions as they are within the
population. So given a population size of
uppercase 𝑁 and an individual stratum size of uppercase 𝑆, if our required
population sample size is lowercase 𝑛, then our individual stratum sample size is
lowercase 𝑠. And that’s given by the stratum
size 𝑆 over the population size 𝑁 multiplied by the required sample size lowercase 𝑛.
Alternatively, if we know that a
stratum is a certain percentage of the population — let’s call that 𝜌 — then the
sample size for the individual stratum is 𝜌 percent times 𝑛, which is the required
sample size. The important thing is that the
sample sizes for the strata are in the same proportions as the strata are within the
population. Let’s apply this in an example.
Suppose there are 48 male and 32
female tennis players registered at the local club. In a representative sample of 10,
how many would be male and how many female?
Since our population of tennis
players is split into two strata, that’s male and female, and we have 48 male
players and 32 female players, our population size 𝑁 is 48 plus 32, which is equal
to 80. We’re going to use stratified
random sampling, where lowercase 𝑛 is the overall sample size, which in our case is
10. Our population size, uppercase 𝑁,
is 80. And the sample size for each
stratum, lowercase 𝑠, is given by the size of the individual stratum, uppercase 𝑆,
divided by the population size, uppercase 𝑁, multiplied by the sample size,
lowercase 𝑛. This means that the sample size for
the male players is 48 divided by 80 multiplied by 10. Dividing both the numerator and the
denominator by 10 and then the numerator and denominator by eight, this gives us six
male players in the sample.
Now, performing the same
calculation for the female players, we have 32 female players divided by 80 in the
whole population multiplied by 10 in the sample. And again, dividing numerator and
denominator by 10 and the numerator and denominator by eight, we have four female
players in our sample. What this means is that we take a
random sample of six male players from our total of 48 male players and a random sample
of four female players from our total of 32 female players. Together, these make up our stratified
sample of 10 from the population.
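To see the two steps together, here is a minimal Python sketch: first we work out how many players to take from each stratum, and then we draw a random sample within each stratum and combine them. The function name `stratum_sample_size` and the player labels are assumptions made for illustration; the example itself gives only the counts 48 and 32.

```python
import random

def stratum_sample_size(stratum_size, population_size, sample_size):
    """Return s = (S / N) * n, using exact integer arithmetic."""
    numerator = stratum_size * sample_size
    if numerator % population_size != 0:
        # The lesson's examples always divide exactly; anything else
        # would need a rounding rule, which is not covered here.
        raise ValueError("allocation is not a whole number")
    return numerator // population_size

# Hypothetical labels standing in for the 48 male and 32 female players.
males = [f"M{i}" for i in range(1, 49)]
females = [f"F{i}" for i in range(1, 33)]
population_size = len(males) + len(females)  # N = 80
n = 10                                       # overall sample size

s_male = stratum_sample_size(len(males), population_size, n)
s_female = stratum_sample_size(len(females), population_size, n)

# Draw a random sample (without replacement) inside each stratum, then combine.
sample = random.sample(males, s_male) + random.sample(females, s_female)
print(s_male, s_female, len(sample))  # 6 4 10
```

The players chosen will differ from run to run, but the sample always contains six male and four female players.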
Now let’s look at how this works if
we’re given percentages.
Suppose that 60 percent of tennis
club players are male and 40 percent are female. In a random sample of 10 players,
how many male and how many female players would we select?
Our population of tennis club
members is split into strata. The strata are female and male. We know that 60 percent of the
population are male and 40 percent of the population are female and that our
required sample size is 10. That’s lowercase 𝑛. To work out how many of these
should be male and how many female, we use the formula lowercase 𝑠, the
sample size for each stratum, is 𝜌 percent times 𝑛, where 𝜌 is the percentage
of the population in that stratum.
If we begin with the number of male
players that we need for our sample, that is, 𝑠 subscript 𝑀, this is 60 percent of
10, that is, 60 divided by 100 multiplied by 10. Dividing both numerator and
denominator by 10, and then by 10 once more, gives us 𝑠 subscript 𝑀, the number
of male players in our sample, which is six.
So now we can do the same
calculation for the female players, who make up 40 percent of the club. The number of female players in
our sample should be 40 divided by 100 multiplied by 10, which gives us four female players
within our sample of 10. And so in our stratified sample of
10 tennis club members, a representative sample consists of six male and four female players.
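The percentage version of the formula can be sketched in the same way. The function name `stratum_sample_size_from_percent` is an assumption for illustration, and `Fraction` is used only so that the arithmetic stays exact.

```python
from fractions import Fraction

def stratum_sample_size_from_percent(percent, sample_size):
    """Return s = (rho / 100) * n as an exact fraction."""
    return Fraction(percent, 100) * sample_size

n = 10  # overall sample size
s_male = stratum_sample_size_from_percent(60, n)
s_female = stratum_sample_size_from_percent(40, n)
print(s_male, s_female)  # 6 4
```

Because the percentages add up to 100, the two stratum sample sizes add up to the overall sample size of 10.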
Now let’s look at an example where
we examine our understanding of the definition of stratified random sampling.
In a certain survey about the
colleges that some high-school students wish to join, a sample of 2,000 students was
randomly selected out of a population of 40,000. Is that considered to be stratified sampling?
To answer this question, let’s
remind ourselves of what we mean by stratified or layered random sampling. This is a sampling method we use
when the population consists of nonoverlapping subdivisions or strata. To select a stratified random
sample from a population, we take random samples from each stratum proportional to
the size of that stratum within the population. In this example, we’re told that
our population size is 40,000 and that a sample of 2,000 students was randomly
selected. We have no information on whether
or not the population was subdivided into strata. And so we must assume that the
random sample of 2,000 students was selected directly from the population where
there were no strata involved. We cannot then say that this
involves stratified sampling. Our answer must then be no, this is
not stratified sampling.
Now let’s look at an example where
we calculate the sample size for a stratum.
In an HR study about the salaries
in a certain company with 1,000 employees, the employees were divided into males and
females. If the total percentage of females
in the company was 60 percent and a sample of 40 people was selected, what was the
number of males in the sample?
Since the population, that is, the
employees in the company, naturally subdivides into two strata, that is, males and
females, we use stratified or layered random sampling as our sampling method. This means that any sample should
reflect the proportions of the strata within the population. We’re told that 60 percent of the
employees were female. A sample of 40 people was
selected. And so 60 percent of those 40 must be female.
Since the employees are split into
two distinct strata, male and female, if 60 percent are female, then 100 percent minus 60
percent must be male. That is, 40 percent of the
employees must be male. This in turn means that 40 percent
of the sample must also be male. And in order to calculate the
number of males in our sample, we use the formula 𝑠 is equal to 𝜌 percent times
𝑛, where lowercase 𝑠 is the stratum sample size, 𝜌 is the stratum’s percentage of the
population, and 𝑛 is the overall sample
size. In our case then, the stratum sample size
for males is 40 percent times 40, which is the overall sample size, that is, 40 divided by
100 times 40.
Now we can divide both the
numerator and the denominator by 10. And once more, dividing numerator
and denominator by 10, we have in our numerator four multiplied by four, which is
16. Therefore, the number of males in
the sample is 16.
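Written out as a worked equation, the calculation looks like this. The subscripts "male" and "female" are notation added here for clarity, and the female figure is included only as a check that the two strata add up to the sample of 40.

```latex
\[
\rho_{\text{male}} = 100\% - 60\% = 40\%, \qquad
s_{\text{male}} = \frac{40}{100} \times 40 = 16, \qquad
s_{\text{female}} = \frac{60}{100} \times 40 = 24 .
\]
```

Since 16 plus 24 is 40, the two stratum samples account for the whole sample.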
Let’s look at another example.
Ethan needs to conduct a study to
determine whether the students in his school like playing football. He decides to divide the students
into two groups, boys and girls, knowing that the school has a total of 200
students, 80 of whom are girls. If Ethan decides that his sample
size will be 50, how many girls should he select for the study?
Since the population of students is
split into two distinct strata, that is, boys and girls, the appropriate sampling
method is stratified or layered random sampling. A stratified random sample is a
sample consisting of random samples selected from distinct groups or strata within
the population. The sample size for each stratum
reflects the stratum proportion of the population.
In order to calculate the sample
size for a particular stratum, that is, lowercase 𝑠, we use the formula lowercase
𝑠 is equal to uppercase 𝑆, which is the number in the stratum, divided by the
number in the population, uppercase 𝑁, multiplied by lowercase 𝑛, which is the
overall sample size. In our case, we have a total of 200
students so that uppercase 𝑁 is 200. We know that we have 80 girls so
that uppercase 𝑆 is equal to 80 and that Ethan’s sample size is 50. That is, lowercase 𝑛 is equal to
50. And our sample size for girls is 80
over 200 multiplied by 50, that is, 80 girls divided by a population of 200
multiplied by the sample size 50. We can divide numerator and
denominator by 50 and again numerator and denominator by four, which gives us
20. Therefore, Ethan should select 20
girls for his study.
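Another way to look at the same calculation is through the sampling fraction, 𝑛 over 𝑁, which is the same for every stratum in a proportional stratified sample. The sketch below assumes that the number of boys is 200 minus 80, which is 120; the question itself states only the number of girls.

```python
from fractions import Fraction

population_size = 200            # N, total students
sample_size = 50                 # n, Ethan's overall sample size
girls = 80                       # S for the girls stratum
boys = population_size - girls   # 120, inferred from the totals

# Every stratum is sampled at the same rate n / N.
sampling_fraction = Fraction(sample_size, population_size)

girls_in_sample = girls * sampling_fraction
boys_in_sample = boys * sampling_fraction
print(sampling_fraction, girls_in_sample, boys_in_sample)  # 1/4 20 30
```

Ethan samples one student in four from each group, which gives 20 girls and, by the same reasoning, 30 boys.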
In our next example, we’re going to
apply stratified or layered random sampling to a population that’s been divided into three groups.
A scientist decides to conduct a
survey on the effect of a certain medicine in a city of 100,000 people. He divides them into three groups
based on their region: city center, outer city, and suburbs. There are 10,000 people in the
suburbs and 30,000 people in the outer city. If the scientist decides to take a
sample of 1,000 people, how many people from the suburbs should be included?
Since the city is divided into
three distinct groups or strata, an appropriate sampling method is stratified or
layered random sampling. Recall that a stratified random
sample is one which combines a number of separate random samples taken from distinct
groups within the population. The size of the sample from each
group reflects the proportion of that group or stratum within the population.
In order to calculate the sample
size for each stratum, we use the formula lowercase 𝑠, which is the individual
stratum sample size, is equal to uppercase 𝑆, which is the stratum size, divided by
uppercase 𝑁, which is the population size, multiplied by lowercase 𝑛, which is the
overall sample size. In our case, our population size is
100,000. That’s uppercase 𝑁. We’re interested in how many people
from the suburbs should be in our sample. And we’re told that there are
10,000 people in the suburbs. So uppercase 𝑆 is equal to
10,000. Our overall sample size from the
population is 1,000 people so that lowercase 𝑛 is 1,000. Substituting into our formula, the sample
size for the suburbs is 10,000 divided by 100,000 multiplied by 1,000, that is, the
stratum size divided by the population size multiplied by the overall sample size.
We can divide the numerator and the
denominator by 1,000 and then again by 100. This gives a sample size of
100 people from the suburbs. So for a sample of 1,000 people,
100 of those should be from the suburbs.
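With three strata, it can help to compute all the stratum sample sizes at once. The following is a minimal sketch, assuming that the city center holds the remaining 60,000 people (100,000 minus 10,000 minus 30,000); the question does not state this figure directly. The function name `proportional_allocation` is also an assumption for illustration.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Return the sample size s = (S / N) * n for every stratum."""
    population_size = sum(strata_sizes.values())  # N
    allocation = {}
    for name, size in strata_sizes.items():
        numerator = size * sample_size
        if numerator % population_size != 0:
            # Non-whole allocations would need a rounding rule,
            # which this lesson does not cover.
            raise ValueError(f"allocation for {name} is not a whole number")
        allocation[name] = numerator // population_size
    return allocation

strata = {
    "city center": 60_000,  # inferred: 100,000 - 30,000 - 10,000
    "outer city": 30_000,
    "suburbs": 10_000,
}
allocation = proportional_allocation(strata, 1_000)
print(allocation)
# {'city center': 600, 'outer city': 300, 'suburbs': 100}
```

The three stratum samples add up to the overall sample of 1,000 people, with 100 of them from the suburbs.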
Let’s complete this lesson by
reminding ourselves of some of the key points about stratified random sampling. Stratified random sampling is a
sampling method used when the population can be divided into distinct groups or
strata. A representative sample combines
random samples, one from each stratum, where the sample size reflects the proportion
of the stratum within the population. For a population of 𝑁 elements and an
overall sample size of lowercase 𝑛, the sample size for each stratum is given by lowercase
𝑠 is equal to uppercase 𝑆 over uppercase 𝑁 multiplied by lowercase 𝑛, where
uppercase 𝑆 is the number of elements in the individual stratum. Alternatively, if we’re given the
percentage of a population in an individual stratum, that is, 𝜌 percent, the sample
size for that individual stratum, which is lowercase 𝑠, is 𝜌 percent multiplied by
𝑛, which is the overall sample size.
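For reference, the two formulas from this lesson can be written side by side. The relationship between 𝜌 and 𝑆 over 𝑁 shown on the right follows from 𝜌 being the percentage of the population in the stratum.

```latex
\[
s = \frac{S}{N} \times n
\qquad \text{or equivalently} \qquad
s = \frac{\rho}{100} \times n, \quad \text{where } \rho = \frac{S}{N} \times 100 .
\]
```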