In this explainer, we will learn how to determine when to take a sample and when to use the whole population.

We begin by recalling that primary data is new information collected directly by the researcher. We know that there are several different sources of primary data, including questionnaires, interviews, and censuses. One of the benefits of primary data is that we can control who we collect our data from. For example, consider a scenario where we are collecting data to study per capita income trends in a city using mailed questionnaires. We can mail the questionnaires to every person in the city, or we can first randomly select a few people and mail our questionnaires to them.

Let us first define some key terms.

### Definition: Population, Individuals, and Samples

A **population** is the complete group of people or objects that are the
target of a statistical study.

An **individual** is a single person or object in the population.

A **sample** is a group of individuals smaller than the population.

Methods of collecting primary data fall broadly into two categories: the method of mass population and the method of samples. The method of mass population means that we collect data from the entire population, while the method of samples means that we collect data from randomly selected samples. In our previous example of collecting income trends data, using the method of mass population means that we mail our questionnaires to every person in the city, while using the method of samples means that we mail our questionnaires only to a few randomly selected people. The method of mass population generally provides more accurate data than the method of samples, but it is also much more expensive. Thus, we need to weigh these benefits and drawbacks when we choose whom to collect data from.
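The two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 resident ID numbers and the sample size of 200 are hypothetical values, not figures from the income study.

```python
import random

# Hypothetical population: ID numbers for 10,000 city residents (illustrative only).
population = list(range(1, 10_001))

# Method of mass population: every individual receives a questionnaire.
census_recipients = population

# Method of samples: mail questionnaires to 200 randomly chosen residents.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample_recipients = rng.sample(population, k=200)

print(len(census_recipients))  # 10000
print(len(sample_recipients))  # 200
```

Here `rng.sample` selects without replacement, so no resident is chosen twice, and every resident has the same chance of being selected.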

Let us begin with an example where it is feasible to collect data directly from the population.

### Example 1: Distinguishing between Population and Sample

Which of the following data sets would be suitable to determine the average score on a test in a class with 25 students?

- Samples
- Mass population

### Answer

Recall that a population is the complete group of people or objects that are the target of a statistical study, and a sample is a group of individuals smaller than the population. Clearly, collecting data from the population gives more accurate data compared to samples. If it is feasible and not too costly to collect data from the population, we should collect data from the population rather than a sample.

In this example, the population is the group of 25 students, since the target of our statistical study is to obtain the class average on a test. It is not difficult or costly to collect the scores from the 25 students, so it would be best to collect data from all of them.

Option B, mass population, is a suitable data set to determine the average score on a test.

In the previous example, we saw a situation where it was not too difficult to collect data from the population. In these circumstances, the preferred method is usually to collect data from the whole population. But this method becomes quite expensive and time-consuming if the population is large, for example, the population of a country. Even in these cases, a national government has enough resources to periodically conduct a national survey of the whole population, called a census. This is done because accurate data is very important for policy-making decisions. For private researchers and organizations, such a task is too costly and impractical. In such cases, we generally resort to the method of samples.

Let us consider an example where the method of mass population would be too costly and time-consuming.

### Example 2: Distinguishing between Population and Sample

Which of the following data sets would be suitable to check the education level in the poor villages in Africa?

- Samples
- Mass population

### Answer

Recall that a population is the complete group of people or objects that are the target of a statistical study, and a sample is a group of individuals smaller than the population. Collecting data from the population is better for accuracy, but it is more expensive compared to collecting data from a sample.

In this example, the population is the group of people in the poor villages in Africa. Collecting data from each individual in the population would be a daunting task, since the size of this population is very large. In this case, it would be preferable to collect data from samples instead.

Hence, option A, samples, is a suitable data set to check the education level in the poor villages in Africa.

When we collect data from the entire population, we can obtain precise information about the variable being studied. We can then compute numerical summaries, such as the mean, from the data for the purpose of our study. Such a numerical summary is the true value of the variable for the population. If we collect data from a sample, then a numerical summary of the sample data set is not the true value or characteristic for the entire population. Depending on how the sample was selected, it could be a good approximation of the true value for the population. Let us define a few important terms here.

### Definition: Population Characteristic and Sample Statistic

A **population characteristic** is a numerical summary of the entire
population.

A **sample statistic** is a numerical summary of a sample.
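To make the distinction concrete, the sketch below computes both kinds of summary from the same data. The data set is invented for illustration: it assumes a hypothetical population of 500 families, each with between 0 and 5 children, and a sample of 50 families.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: number of children in each of 500 families.
children_per_family = [rng.randint(0, 5) for _ in range(500)]

# Population characteristic: a summary computed from every family.
population_mean = mean(children_per_family)

# Sample statistic: a summary computed from 50 randomly selected families.
sample = rng.sample(children_per_family, k=50)
sample_mean = mean(sample)

print(f"population characteristic: {population_mean:.2f}")
print(f"sample statistic: {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, because the sample statistic depends on which families happen to be selected.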

Let us consider an example where we will determine whether a given value is a population characteristic or a sample statistic.

### Example 3: Collecting Data from a Sample versus a Whole Population

Sarah knows all the families living in her area quite well. She says she has found out that the average number of children per family is 2.3. Is this figure a sample statistic or a population characteristic?

### Answer

Recall that a population characteristic is a numerical summary of the entire population, and a sample statistic is a numerical summary of a sample.

In this example, the population is the group of families living in Sarah’s area. Since Sarah knows all the families in her area quite well, she likely knows the number of children in each family. Hence, we can assume that the average number of children was computed using data from every family in her area. In other words, the value 2.3 is a numerical summary of the entire population.

Hence, the figure 2.3 is a population characteristic.

Let us consider another example where we distinguish between a population characteristic and a sample statistic.

### Example 4: Collecting Data from a Sample versus a Whole Population

A study claims that a given percentage of people aged 16 to 24 in a certain country own a smartphone. Is this a sample statistic or a population characteristic?

### Answer

Recall that a population characteristic is a numerical summary of the entire population, and a sample statistic is a numerical summary of a sample.

In this example, the population is the group of people aged 16 to 24 in a certain country. We need to determine whether the figure came from a sample or from the entire population. If it came from the entire population, the researchers would have had to survey every individual aged 16 to 24 in the country, which would be a massive undertaking. This is an unlikely scenario. Moreover, if this were the case, the figure would be the true value for the entire population. But the given statement begins with “a study claims,” which suggests an estimate rather than an established fact. Based on both the large size of the population and the choice of language used (claims), we can assume that this value was obtained using a sample.

Hence, the value is a sample statistic.

In the previous examples, we determined whether a given number was a population characteristic or a sample statistic. We observed that when the population of a statistical study is large, the figure reported is in most cases a sample statistic. A notable exception to this rule is the government census, since a national government has enough resources to conduct a comprehensive survey of the whole population.

A sample statistic on its own does not hold any merit, since it is merely a description of a small group of individuals. But if the sample is randomly selected, then a numerical summary of the sample may be a good estimate of the corresponding characteristic of the entire population.
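The role of random selection can be illustrated with a small comparison. The sketch below assumes a hypothetical population of 1,000 values listed from smallest to largest, and compares a sample taken from the top of the list with a randomly selected sample of the same size.

```python
import random
from statistics import mean

# Hypothetical population: 1,000 values, listed from smallest to largest.
population = list(range(1000))
true_mean = mean(population)  # population characteristic: 499.5

# Non-random sample: the first 100 individuals on the list.
convenience_sample = population[:100]

# Random sample: 100 individuals, each equally likely to be chosen.
rng = random.Random(2)  # fixed seed so the sketch is reproducible
random_sample = rng.sample(population, k=100)

convenience_error = abs(mean(convenience_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)

print(f"error of non-random sample: {convenience_error}")
print(f"error of random sample: {random_error}")
```

The non-random sample contains only the smallest values, so its mean misses the population mean by 450. The random sample draws from the whole range of values, so its mean lands much closer to the true value.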

### Definition: Statistical Inference

An **inference** in statistics is the method or the process of estimating a
population characteristic by using sample statistics.

While an estimate sounds straightforward, the statistical conclusion must include information about a margin of error and level of confidence for each estimate. Let us return to our example of collecting data on a per capita income trend in a city. Say that we have collected income figures from 100 randomly selected individuals in the city, and the mean income from this data set is $50 000. In this case, an inference would be the method or process of concluding that, based on our random sample, the mean income of all individuals in the city should be approximately $50 000 with a given margin of error and level of confidence.
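As a rough sketch of what such a conclusion might look like, the code below attaches a margin of error to a sample mean. Everything here is an assumption made for illustration: the income figures are simulated rather than real, and the margin of error uses a normal approximation with the multiplier 1.96, one common choice for a confidence level of about 95%.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical incomes (in dollars) for 100 randomly selected individuals.
incomes = [rng.gauss(50_000, 12_000) for _ in range(100)]

n = len(incomes)
sample_mean = mean(incomes)  # sample statistic
standard_error = stdev(incomes) / math.sqrt(n)

# Assumption: normal approximation, where 1.96 corresponds to about 95% confidence.
margin_of_error = 1.96 * standard_error

print(f"estimate: {sample_mean:.0f} +/- {margin_of_error:.0f}")
```

The printed interval is the kind of statement an inference produces: an estimate of the population mean together with an indication of how far off it might be.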

Inference requires knowledge of various statistical methodologies and theories, which are not covered in this explainer. The aim of this explainer is for us to understand that a sample statistic can be used to estimate a population characteristic by a process referred to as inference.

Let us finish by considering an example that discusses a statistical inference.

### Example 5: Identifying Representative Samples That Support Generalization

Which of these describes an inference in statistics?

- Working out the percentage of a population that exhibits a certain characteristic
- Applying conclusions drawn from a sample of a whole population
- Computing a statistic from the sample
- Generating a random sample from a given population

### Answer

Recall that an inference in statistics is the method or the process of estimating a population characteristic by using sample statistics. We also remember that a population characteristic is a numerical summary of the entire population, and a sample statistic is a numerical summary of a sample. In other words, an inference is the method of applying conclusions drawn from a sample of a whole population to estimate a population characteristic. Hence, option B describes an inference in statistics.

Let us examine the remaining options. Options C and D describe steps involved in obtaining a sample statistic. These steps take place before the inference; hence, we can rule out these options. Option A describes statistical work that deals directly with the population. This also does not fit the definition of an inference, which must involve a sample.

Option B describes an inference in statistics.

Let us finish by recapping a few important concepts from this explainer.

### Key Points

- A population is the complete group of people or objects that are the target of a statistical study, and a sample is a group of individuals smaller than the population. The method of mass population means that we collect data from the entire population, while the method of samples means that we collect data from randomly selected samples.
- While the method of mass population gives the most accurate information, it is often too costly and time-consuming when the population is large. In such cases, the method of samples is preferred to save time and money.
- A population characteristic is a numerical summary of the entire population, and a sample statistic is a numerical summary of a sample.
- An inference in statistics is the method or the process of estimating a population characteristic by using sample statistics.
- In most statistical studies where the size of the population is large, researchers collect data from a random sample, from which they obtain various sample statistics. Using sample statistics, the researchers apply a statistical inference to estimate a population characteristic.