In this explainer, we will learn how to choose a simple random sample from a population.
In statistics, we are often interested in analyzing information. For example, we may be interested in knowing how many students prefer salad to soup to help determine a lunch menu, or we may want to know the spending habits of customers at a store to determine the price of items.
In each of these cases, it is very difficult to collect information from everyone. For example, it would be difficult to ask every student at a large school about their preferred meals. We can get around this problem by asking only a fraction of the students and using this smaller group to gain an idea of the preferences of the larger group; this smaller group is called a sample.
In general, the entire set of people, elements, or objects we are analyzing is called the population, and any smaller subset of this is known as a sample of the population.
Definition: Population and Sample
The entire set of objects we are analyzing is called the population.
A smaller subset or selection of the population is called a sample of the population, and we call the size of this set the sample size.
Samples are a great way of obtaining information quickly. For example, it is much easier to ask 20 students for their lunch preferences than the whole school. However, sampling does have drawbacks.
For one, we may come to the wrong conclusion from only looking at the sample. Consider the lunch example; if all 20 students we asked preferred soup, we may conclude that soup is the correct choice; however, it could also be possible that we asked the only 20 students with this opinion.
Another similar problem could be caused by bias in the choice of the sample group. An example of this would be questioning the first 20 students in the lunch queue. At first, it may seem like this is a fair way to choose the students, but it could lead to bias. For instance, suppose the soup is made at the start of the lunch break and gets colder over time, while the salad is always fresh. In this case, the students who prefer soup may want to queue up earlier so they can get it while it is still hot, while students who prefer salad have no reason to queue early. Therefore, the sample is likely to contain a higher proportion of students who prefer soup than the overall population does.
These two problems are very similar since bias in the sampling can cause the incorrect conclusion to be drawn. These problems both stem from the same source, which is the choice of the sample of the population. We want to choose a sample large enough to get a fair assessment from the population, and we also want to choose the sample in such a way as to minimize bias.
We will focus on how we choose the members of the sample; however, it is worth noting that the larger the sample size we choose, the better the assessment we get, at the cost of needing to collect more data.
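To illustrate the effect of sample size, here is a minimal sketch. The school of 200 students, of whom 120 prefer soup, is a made-up population used only for illustration, and the function name `estimate_soup_share` is hypothetical.

```python
import random

def estimate_soup_share(population, sample_size, rng):
    """Estimate the share of soup-lovers from a random sample."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical school: 1 = prefers soup, 0 = prefers salad.
population = [1] * 120 + [0] * 80  # true share of soup-lovers is 0.6

for n in (5, 20, 50, 200):
    # Small samples can land far from 0.6; n = 200 asks everyone.
    print(n, estimate_soup_share(population, n, rng))
```

When the sample size equals the population size, the estimate matches the true share exactly, but by then we have asked every student, which is the work sampling was meant to avoid.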
There are many different ways we could choose a sample. Let’s say we want to choose a sample of 50 students out of 200 students. One way we could choose the students would be to list them alphabetically by surname and then choose every fourth student. This gives us a sample size of 50. This is called a systematic sample; however, this sampling method has a small problem with bias. Members of a small family who share the same surname cannot all be chosen, since they sit next to each other in the list and the method skips over the students between each selection.
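The systematic method can be sketched as follows. The labels 1 to 200 stand in for students already sorted by surname, and the function name `systematic_sample` is a hypothetical helper, not part of any library.

```python
def systematic_sample(ordered_population, step):
    """Return every `step`-th member of an already ordered list."""
    # Start at index step - 1 so the 4th, 8th, 12th, ... members are chosen.
    return ordered_population[step - 1::step]

# Hypothetical roster: labels 1..200 stand in for students sorted by surname.
students = list(range(1, 201))
sample = systematic_sample(students, 4)

print(len(sample))  # 50
print(sample[:5])   # [4, 8, 12, 16, 20]

# Check whether any two neighbours in the ordered list are both chosen.
chosen = set(sample)
adjacent_pairs = [(s, s + 1) for s in students[:-1]
                  if s in chosen and s + 1 in chosen]
print(adjacent_pairs)  # []
```

The empty list at the end shows the bias described above: two students who are next to each other in the list can never both appear in the sample.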
We can improve on this sample method by instead choosing the 50 students randomly. We call a sample where all of the members have an equal chance of being chosen a simple random sample or even just a random sample. One way of constructing a simple random sample would be to label the students from 1–200 and then use a random number generator to choose 50 numbers. Since every member of the population has an equal chance of being chosen, a random sample is the best at removing possible bias.
Definition: Simple Random Sample
A sample where every member has an equal probability of being chosen is called a simple random sample.
Another way of looking at this is that any two members must have an equal probability of being chosen for the sample and it must be possible for any group (not bigger than the sample size) to all be chosen for the sample.
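As a minimal sketch of the random number generator approach, the following uses `random.sample` from Python's standard library to pick 50 distinct labels from 1 to 200. The function name `simple_random_sample` and the seed value are assumptions made for illustration.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Choose `sample_size` distinct labels from 1..population_size."""
    rng = random.Random(seed)  # a seed only makes the run repeatable
    labels = range(1, population_size + 1)
    # random.sample draws without replacement, so no label repeats.
    return rng.sample(labels, sample_size)

sample = simple_random_sample(200, 50, seed=1)
print(len(sample))       # 50
print(len(set(sample)))  # 50, so every chosen label is distinct
```

Because the labels are drawn without replacement and every label is equally likely at each draw, any group of 50 students is as likely to be chosen as any other.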
Let’s now see an example of determining why a given sample method is not a simple random sample.
Example 1: Identifying Why a Given Statement Does Not Describe a Simple Random Sample
Why does the statement “All the clothing produced by a factory to measure the quality of that factory” not describe a simple random sample?
- Because a sample is always larger than the parent population
- Because a sample has to be part of the whole population and not the population itself
- Because this is a simple but not a random sample
We begin by recalling that a simple random sample is a sample in which every member of the population has an equal probability of being chosen. At first, it might seem like choosing all of the clothes produced by a factory would be a simple random sample, since every item of clothing has probability 1 of being chosen.
However, we need to note that it must be a sample, and we can recall that a sample must be a smaller subset of the population.
Hence, choosing the entire population cannot be a simple random sample because a sample has to be part of the whole population and not the population itself. This is option B.
In our next example, we will determine if a given sampling method gives a simple random sample.
Example 2: Determining If a Given Sample Is a Simple Random Sample
Suppose your school has 500 students and you need to conduct a short survey on the quality of the food served in the cafeteria. You decide that a sample of 10 students should be sufficient for your purposes. So, you choose 10 students by assigning them each a number and then using the random button on your calculator to choose 10 students randomly out of the 500 and conduct the survey on them.
Is that considered a simple random sample?
We start by recalling that a simple random sample is a strict nonempty subset of the population such that every member of the population has an equal chance of being in the subset.
We can see that we are choosing 10 students from 500, so the subset will be strict. We need to determine if this selection is random. We can use the random number button on a calculator to give us a random number between 0 and 1. If we multiply this number by 1 000 and keep only the whole-number part, we get a random whole number between 0 and 999. We can ignore 0, any number above 500, and any number we have already chosen, repeating until we have chosen 10 students at random. In this method, any two students will have the same probability of being chosen.
Hence, yes, this is a simple random sample.
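The calculator method can be imitated in code. This is only a sketch under stated assumptions: `rng.random()` plays the role of the calculator's random button, students are numbered 1 to 500, and the function name `calculator_sample` is hypothetical. It also assumes the population has at most 999 members, since the scaled number never exceeds 999.

```python
import random

def calculator_sample(population_size, sample_size, seed=None):
    """Imitate drawing numbers with a calculator's random button."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        r = rng.random()        # random number with 0 <= r < 1
        number = int(r * 1000)  # whole number from 0 to 999
        # Ignore 0, numbers above the population size, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

sample = calculator_sample(500, 10, seed=0)
print(len(sample))  # 10
```

Discarding the unusable numbers does not favor any student, because each of the numbers 1 to 500 is equally likely to appear on every press.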
In our next example, we will determine the percentage size of a sample from the population and sample size.
Example 3: Determining the Percentage Size of a Sample Using the Population Size and Sample Size
A garden consists of 200 trees. We want to take a sample of 20 trees. Express the sample size chosen using percentage.
We first recall that the sample size is the number of members in the sample. In this case, the sample size is 20 since we are taking a sample of 20 trees. This is out of the 200 possible total trees, so we can write this as a percent by dividing by the population size and multiplying by 100: $\frac{20}{200} \times 100\% = 10\%$.
Hence, the sample size is 10% of the population.
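As a quick check of this kind of calculation, here is a small sketch; the function name `sample_percentage` is a hypothetical helper.

```python
def sample_percentage(sample_size, population_size):
    """Express the sample size as a percentage of the population."""
    return 100 * sample_size / population_size

print(sample_percentage(20, 200))  # 10.0
```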
In our next example, we will determine which of four given sampling methods is not a simple random sample.
Example 4: Determining Which of a List of Sample Methods Is Not a Simple Random Sample
A teacher wants to know the favorite subject of the students in his school. Since the school has 300 students, he decides to only ask a sample of the population. For which of the following sampling methods does each student not have an equal probability of being chosen for the sample?
- A sample of 30 students who are chosen by selecting 30 national IDs randomly
- A sample of 30 students selected by writing names of students in small papers, folding the papers, placing the papers in a bowl, and drawing 30 pieces of paper
- A sample of 30 students selected by putting students’ names in a list, giving each name a random number from 1 to 300, and choosing the names that have numbers divisible by 10
- A sample of 100 children who have blue eyes
We start by recalling that a simple random sample is a sample of the population such that every member of the population has an equal chance of being in the sample.
This means we need to determine for which of the four given options every student has the same probability of being chosen for the sample and for which options they do not.
Let’s start with option A. We see that students are chosen randomly by their national ID. We might be worried that the national IDs of students might not be random themselves, and this is likely correct, since they can be allocated sequentially, meaning twins are likely to have related IDs. However, we choose 30 students by choosing random IDs from this list, meaning every ID has an equal chance of being chosen. So, this is a simple random sample.
We have a similar story in option B. Although there may be some relation in the order that the students are written on the papers, the papers are then shuffled, meaning that they are drawn at random. It is worth noting that we need every student to be included in the bowl, so this is not an easy method to generate a sample, since there are 300 total students. However, it is a simple random sample since all students have the same chance of being selected.
Once again, we have the same story in option C. Every student is randomly given a number from 1 to 300. Since this process is random, we note that every student has an equal chance of being allocated a number divisible by 10. Hence, this is a simple random sample.
Finally, in option D we note that we choose 100 children who have blue eyes. This is not a simple random sample since any student with a different eye color has no chance of being selected for the sample.
Hence, only option D, a sample of 100 children who have blue eyes, is not a simple random sample.
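The method in option C can be sketched as follows. The sketch assumes that each student receives a different number from 1 to 300, which is one reading of the option; the student names and the function name `multiples_of_ten_sample` are placeholders.

```python
import random

def multiples_of_ten_sample(names, seed=None):
    """Give each name a distinct random number and keep multiples of 10."""
    rng = random.Random(seed)
    # Assumption: numbers 1..len(names) are handed out once each, at random.
    numbers = list(range(1, len(names) + 1))
    rng.shuffle(numbers)
    return [name for name, number in zip(names, numbers)
            if number % 10 == 0]

# Placeholder names for the 300 students.
names = [f"student_{i}" for i in range(1, 301)]
sample = multiples_of_ten_sample(names, seed=2)
print(len(sample))  # 30
```

There are exactly 30 multiples of 10 between 1 and 300, so under this assumption the method always yields a sample of 30, and the random allocation gives every student the same chance of holding one of those numbers.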
In our final example, we will determine which of a list of five given sampling methods is a simple random sample.
Example 5: Determining Which of a List of Sample Methods Is a Simple Random Sample
An actor in a theater wants to choose random people to go up on stage and participate in the play with him. Which choice is considered a random sampling method?
- He chooses those who are taller than 190 cm.
- He chooses women only.
- He chooses those who have seats with numbers that were picked from a bowl full of seat numbers.
- He chooses a third of the sample to be women and two-thirds to be men.
- He chooses those who wear glasses.
A simple random sample takes a smaller group from the overall population such that every member of the population has an equal chance of being chosen for the smaller group, which is called a sample.
Therefore, we can determine which of the given options gives a simple random sample by determining which options
- construct a sample (the sample must have a smaller size than the total population and cannot be empty),
- offer each member an equal chance of being chosen in the sample group.
We can check each option separately. Let’s start with option A. Choosing everyone who is taller than 190 cm does not give everyone an equal chance of being chosen, since anyone shorter than this has a probability of 0 of being in the sample group.
It is worth noting that there are special cases in which the probability of being chosen would be equal for everyone; namely, if everyone in the group is taller than 190 cm or everyone is shorter than 190 cm. However, in either case, the resulting selection is not a proper sample, since either the entire population is being selected or no one is being selected, and neither of these is a sample.
Thus, this is not a simple random sample.
In option B, we have a similar story. By only choosing women for the sample, anyone who is not a woman has a probability of 0 of being chosen for the sample group; however, the women will have a nonzero chance of being chosen.
Once again, we can consider the possibility that the group consists only of women or contains no women. In both cases, we will still not have a simple random sample, since the chosen members would either be the entire population or no one.
Thus, this is not a simple random sample.
In option C, the chosen members are those who have seats with numbers that were picked from a bowl full of seat numbers. If all possible seat numbers are included in the bowl once and the numbers are chosen at random, then we can conclude that every member has an equal chance of being chosen, so this is a simple random sample.
We could stop here; however, for due diligence, let’s check the remaining two options.
In option D, a third of the sample is chosen to be women and two-thirds to be men. In general, this is not a simple random sample since we are forcing the proportions of the sample group. This will affect the possibilities of members being chosen for the sample group. For example, if the group is 4 men and 2 women and the sample has 3 members, then the sample must contain exactly 1 woman, so it is not possible to choose both women for the sample.
Thus, this is not a simple random sample.
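To make the example with 4 men and 2 women concrete, the sketch below lists every possible sample. It assumes a sample of 3 people, a size chosen here for illustration so that one-third women means exactly 1 woman; the labels `M1` to `M4` and `W1`, `W2` are placeholders.

```python
from itertools import combinations

# Placeholder labels: four men and two women.
group = ["M1", "M2", "M3", "M4", "W1", "W2"]

# Every possible sample of 3 people from the group of 6.
all_samples = list(combinations(group, 3))

# Samples allowed by the quota: exactly 1 woman and 2 men.
quota_samples = [s for s in all_samples
                 if sum(member.startswith("W") for member in s) == 1]

# Allowed samples that contain both women.
both_women = [s for s in quota_samples if "W1" in s and "W2" in s]

print(len(all_samples))    # 20
print(len(quota_samples))  # 12
print(len(both_women))     # 0
```

Of the 20 possible groups of 3, the quota allows only 12, and none of them contains both women, so some groups that a simple random sample could produce are impossible under this method.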
In option E, only those with glasses are chosen. This means that anyone without glasses has no chance of being chosen; however, those with glasses will have a nonzero chance of being chosen. We could also consider the possibilities of everyone in the group wearing glasses or no one wearing glasses. However, in both cases we do not get a simple random sample since this would either choose the whole group or no one.
Thus, this is not a simple random sample.
Hence, only option C, choosing those who have seats with numbers that were picked from a bowl full of seat numbers, is a simple random sample.
Let’s finish by recapping some of the important points from this explainer.
- The entire set of people, elements, or objects we are analyzing is called the population.
- A smaller subset of the population is called a sample of the population, and we call the size of this set the sample size.
- A sample where every member has an equal probability of being chosen is called a simple random sample.
- The larger the sample size, the more accurate the results can be, at the expense of more difficult data collection.