In this explainer, we will learn how to decide whether to take a sample or a census when collecting data.
In statistics, we are often interested in surveying a population; this involves measuring one or more attributes of its members to gather data that we can analyze and draw conclusions from. The population could be a group of people, but in general, it is the set of all objects that are of interest in the survey. For example, a store might want to check the ripeness of its fruit; in this case, the population would be the set of all items of fruit in the store.
If the store owner wanted to determine the percentage of ripe fruit in their stock, then they could check the ripeness of every single item of fruit. This would give an accurate measure of the percentage of ripe fruit in their stock; however, checking every item would take a long time if the stock is large.
Instead, the store owner could check only a fraction of the fruit in their stock; this would be quicker, but the result would only be an estimate of the true percentage and so would not be as accurate. When measuring the entire population, we say that we are taking a census. If we instead only measure a fraction of the population, then we say that we are taking a sample of the population.
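The difference between the two approaches can be sketched in a short simulation. This is only an illustration: the stock size of 2000 items, the 70% ripeness rate, and the sample size of 50 are invented values, and `proportion_ripe` is a hypothetical helper name.

```python
import random

# Hypothetical stock: True means the item of fruit is ripe.
# The stock size (2000) and ripeness rate (70%) are invented for illustration.
rng = random.Random(0)
stock = [rng.random() < 0.7 for _ in range(2000)]

def proportion_ripe(items):
    """Return the fraction of the given items that are ripe."""
    return sum(items) / len(items)

# Census: check every item in the population.
census_result = proportion_ripe(stock)

# Sample: check only 50 items chosen at random, without repeats.
sample = rng.sample(stock, 50)
sample_result = proportion_ripe(sample)

print(f"Census: {census_result:.1%} ripe ({len(stock)} items checked)")
print(f"Sample: {sample_result:.1%} ripe ({len(sample)} items checked)")
```

The census checks 40 times as many items as the sample, while the sample result is only an estimate that will usually differ a little from the census result.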
Definition: Types of Data Sets (Census and Sample)
The population of a study refers to the entire set of members we want to determine information about.
If a study measures every member of the population, then we call the data set a census, but if only a fraction of the population is measured, then we call it a sample.
In our first example, we will identify the population in a given study.
Example 1: Identifying the Population
To determine the popularity of different meals at a school, the school staff send questionnaires to 100 of the school’s 500 students. What is the population of this study?
Answer
The population of a study refers to the group we want to determine information about. In this case, the study is being used to determine the popularity of different meals amongst the students of the school.
Hence, the population of this study is the students of the school.
In our next example, we will identify the data set of a given study.
Example 2: Identifying the Data Set of a Study
A government wants to determine the average age of the country’s population, so they question every citizen. What type of data set is used in this study?
Answer
The type of data set used in a study depends entirely on the proportion of the population measured. If a study measures every member of the population, then we call the data set a census; if only a fraction of the population is measured, then we call it a sample.
In this study, we are told that every citizen is questioned, so every member of the population is measured. Hence, the data set is a census.
There are two more useful definitions to consider when surveying a population.
First, each member of the population is referred to as a sampling unit. For example, if we are surveying students in a school, then each student is a sampling unit. Similarly, if we are finding the average population of all the countries on Earth, then each country is a sampling unit.
Second, we often label or number the sampling units to make them easier to discuss and identify. For example, when surveying students in a school, we might use their names or a student number. The list of all sampling units is known as the sampling frame.
This gives us the following definitions.
Definition: Sampling Units and Sampling Frame
Each individual element of the population is called a sampling unit.
We can label or number the sampling units to form a sampling frame.
It is important to note that the sampling frame and the population are closely related but not the same. For example, consider surveying all students in a school. The population is the general concept of all of the students in the school, whereas the sampling frame is a specific list that identifies every student in the school.
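One way to picture a sampling frame is as a list of labels, one per sampling unit. The sketch below assumes a school of 500 students and a made-up student-number format (`S001` to `S500`); it then draws 100 sampling units from the frame, as in the questionnaire study in Example 1.

```python
import random

# Hypothetical sampling frame: one student number for each of 500 students.
# The "S001" label format is an assumption made for this illustration.
sampling_frame = [f"S{n:03d}" for n in range(1, 501)]

rng = random.Random(1)

# A census would use every sampling unit listed in the frame.
census_units = list(sampling_frame)

# A sample uses only some of them; here 100 are drawn at random without repeats.
sample_units = rng.sample(sampling_frame, 100)

print(len(census_units), len(sample_units))
print(sample_units[:5])
```

Having a concrete list like this is what makes it possible to select sampling units at random in the first place.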
In our next example, we will identify the sampling units in a given study.
Example 3: Identifying the Sampling Units of a Study
A local motoring company wants to determine the average number of cars per household in a town. They survey every household in the town. What is the sampling unit in this survey?
Answer
In the given survey, the motoring company wants to know how many cars are owned by each household. This means that a single measurement (the number of cars) is taken for each household in the town. The sampling unit is the individual element of the population that is measured, which in this case is a household.
Hence, we can say that each household in the town is the sampling unit.
When we need to collect data about a population, we first need to decide whether to survey the whole population by taking a census or only part of it by taking a sample. In order to do this, we can start by considering the advantages and disadvantages of each.
When we take a census, we measure every single member of the population. This means we have the maximum amount of data possible, so our study will be as accurate as possible. Since every member is measured, we can also split the data into smaller demographics and still have data for every member of each demographic. This makes any subsequent analysis of these groups easier and more accurate.
By contrast, if we were to take a sample and then split the data into smaller demographics, we might not have enough data to obtain reliable information about each demographic.
However, when we take a census, we need to spend more time, and usually more money, surveying every member of the population. For a particularly large population, we will also have a lot more data to process, which takes more time and resources. We can also note that we cannot take a census if the process of measuring destroys the item (e.g., testing the flammability of a batch of matches).
If we instead survey a sample of the population, then our results will be less accurate, since there is natural variation between members of the population and a small sample may not capture this variation well. It is worth noting, however, that the data for a sample is quicker and cheaper to collect. There will also be less data to process.
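The point about smaller demographics can be illustrated with a short sketch. The population of 10 000 people, the age groups, and their sizes are all invented for this example; the "80+" group is deliberately small.

```python
import random
from collections import Counter

rng = random.Random(7)

# Hypothetical population of 10 000 people, each labeled by age group.
# The group sizes are invented; "80+" is deliberately a small demographic.
population = (["under 30"] * 4000 + ["30-59"] * 4500
              + ["60-79"] * 1300 + ["80+"] * 200)

# Census: every member of every group is counted.
census_counts = Counter(population)

# Sample: only 100 members are drawn at random, without repeats.
sample_counts = Counter(rng.sample(population, 100))

for group in census_counts:
    print(f"{group:>8}: census {census_counts[group]:>5}, "
          f"sample {sample_counts[group]:>3}")
```

Since the "80+" group makes up only 2% of this population, a sample of 100 will typically contain just a handful of its members, which is too few to draw reliable conclusions about that group.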
In our next example, we will identify the reason a sample would be preferred over a census in a given situation.
Example 4: Advantages of Samples
A company that manufactures bicycle helmets wants to determine the average force that will break their helmets. Which of the following is the reason a sample would be preferred over a census in this case?
- Each test will break a helmet.
- A sample will give an accurate result.
- Every helmet will break with the same force, so a census is not needed.
- A census will take too much time.
Answer
We start by recalling that a study is called a census if it measures every single member of the population, whereas a sample will only measure a fraction of the population.
This means there are a number of advantages and disadvantages to each type of data set, since a sample requires fewer members of the population to be surveyed.
In this case, the company is testing the force required to break their helmets; this means that every test will break a helmet. If we were to use a census, we would break the entire stock of helmets, so instead, a sample would be preferred to preserve the stock of helmets.
Hence, the answer is the first option; we should take a sample since each test will break a helmet.
In our next example, we will determine whether taking a census or a sample is more appropriate for a given situation.
Example 5: Choosing between Census and Sample
Which of the following data sets would be suitable to evaluate the service of one of the hotels in the UK?
- Sample
- Census
Answer
We first recall that a census measures every single member of the population, whereas a sample will only measure a fraction of the population.
We want to evaluate the service of one of the hotels in the UK by questioning people who have stayed at the hotel. This means that the population will be anyone who has stayed at the hotel. It is not feasible to ask every single person who has ever stayed at the hotel for a number of reasons:
- The number of people who have stayed at the hotel is likely to be too large to feasibly question in its entirety.
- There are likely people from many different countries; this will make questioning everyone take longer and become more costly.
- The service may have changed drastically over time, so it is less useful to measure information from people who stayed at the hotel a long time ago.
It would be much easier to sample only people currently staying at the hotel. This way, the number of people sampled would be significantly reduced, everyone in the sample would be staying in the same building, and there would be no worries about previous experiences.
Hence, if we want to evaluate the service of a hotel in the UK, it is best to take a sample rather than a census.
In our next example, we will determine whether taking a census or sample is more appropriate for finding an average score on a test.
Example 6: Choosing between Census and Sample
Which of the following data sets would be suitable to determine the average score on a test in a class of 25 students?
- Sample
- Census
Answer
We first recall that a census measures every single member of the population, whereas a sample will only measure a fraction of the population.
In order to determine the average score on a test, we need to add all of the scores on the test and divide this number by the total number of students who took the test. This means we need to know every student’s score on the test to calculate the average. Therefore, we will need to measure every member of the population.
If we had instead taken a sample, then we could use the data to approximate the average score. This is useful when the population is large since it can reduce the time taken for the survey and we would not need every student to respond. The downside would be that our estimation of the average score would be less accurate.
Since we want an accurate result and usually the test results are easily accessible, a census will be the most suitable data set to determine the average score on a test.
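The comparison can be sketched with some made-up data. The 25 scores below are invented for illustration, as is the choice of a sample of 5 students.

```python
import random
import statistics

# Hypothetical test scores for a class of 25 students (invented data).
scores = [62, 75, 88, 54, 91, 70, 67, 83, 79, 58,
          95, 73, 66, 81, 77, 69, 85, 60, 72, 90,
          64, 78, 86, 71, 74]

# Census: use every score, which gives the exact class average.
census_mean = statistics.mean(scores)

# Sample: use only 5 scores, which gives an estimate of the class average.
rng = random.Random(3)
sample_scores = rng.sample(scores, 5)
sample_mean = statistics.mean(sample_scores)

print(f"Census mean: {census_mean}")
print(f"Sample mean: {sample_mean} (from {sample_scores})")
```

With only 25 scores, computing the exact mean takes almost no extra effort, which is why there is little reason to accept the estimate from a sample here.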
In the previous example, we concluded that a census was the most suitable data set to determine the average score on a test, since the population was small and the data was easy to access.
In our final example, we will consider how to improve the accuracy of a study that uses a sample.
Example 7: Validity of Conclusion from Samples
To determine the popularity of possible new flavors of ice cream, a company surveys 10 of its 100 000 customers. Which of the following will improve the accuracy of this study?
- Choosing healthier sounding flavors
- Asking only the 10 most loyal customers
- Asking only the 10 most recent customers
- Increasing the size of the sample
Answer
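A sample of 10 customers is only 0.01% of the company’s 100 000 customers, so the results could easily be unrepresentative. In general, the more members of the population we measure, the more accurate our conclusions are likely to be, so surveying more customers will improve the accuracy of the study. This is the same as increasing the size of the sample. The other options do not help: changing the flavors changes what is being measured, and asking only the most loyal or most recent customers keeps the sample at 10 and makes it less representative of all customers.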
Hence, we can improve the accuracy by increasing the size of the sample.
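The effect of sample size can be explored with a small simulation. This is a sketch under stated assumptions: the population of 100 000 customers matches the example, but the 40% share of customers who like the new flavor, the sample sizes, and the helper name `estimate_spread` are all invented for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100 000 customers, True if they like a new flavor.
# The 40% figure is invented for illustration only.
customers = [rng.random() < 0.4 for _ in range(100_000)]

def estimate_spread(sample_size, repeats=200):
    """Standard deviation of the sample estimate over repeated random samples."""
    estimates = []
    for _ in range(repeats):
        sample = rng.sample(customers, sample_size)
        estimates.append(sum(sample) / sample_size)
    return statistics.pstdev(estimates)

# Compare how much the estimate varies for different sample sizes.
spreads = {n: estimate_spread(n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: spread of estimates about {spread:.3f}")
```

The spread of the estimates shrinks as the sample size grows, which means that any single larger sample is likely to be closer to the true value for the whole population.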
Let’s finish by recapping some of the important points from this explainer.
Key Points
- The population of a study refers to the entire group we want to determine information about.
- If a study measures every member of the population, then we call the data set a census; if only a fraction of the population is measured, then we call it a sample.
- Each individual element of the population is called a sampling unit. We can label or number the sampling units to form a sampling frame.
- Taking a sample is often quicker and costs less than taking a census. There is less data to process and fewer members need to be measured.
- Taking a sample generally gives a less accurate measure of the data than taking a census.
- Taking a census leads to more accurate measures of the data; we can also split the data into smaller demographics and still have data for every member of each demographic.
- Taking a census often takes longer and costs more money than taking a sample. There is also more data to process and more members need to be measured to find the full data set. We cannot take a census if the process of measuring destroys the item (e.g., testing the force needed to break a helmet).
- We can improve the accuracy of a survey by increasing the sample size.