In this explainer, we will learn how to recognize the differences between, and the advantages of using, primary and secondary data sources.
Data collection is the first stage of all statistical work, and we must take great care when collecting data since the quality of our statistical analysis relies on this stage. Often, we can find different sources for data collection, and it is important to be aware of the benefits and drawbacks of each source. For example, let’s say that we are collecting data to understand a trend in per capita income in a city. We could survey different individuals using questionnaires or interviews, we could request access to government documents containing this data, or we could refer to published articles in economic newspapers and research journals.
As we can see, there are many different sources of data. Broadly, we can categorize these into two types: primary data and secondary data.
Definition: Primary and Secondary Data
Primary data is new information that is collected and organized directly by the researcher.
Secondary data is public or existing information that is collected and organized by others.
From our example, questionnaires and interviews would give us primary data since new information would be collected and organized by us. On the other hand, government documents, economic newspapers, and journals would give us secondary data, since these sources present existing information collected and organized by others.
In our first example, we will determine whether a given source provides primary or secondary data.
Example 1: Determining Whether Data Is Primary or Secondary
A teacher asked their students to collect data on the effect of action video games on children from different websites on the Internet. What is the type of data collected?
Answer
Recall that primary data is new information that is collected and organized directly by the researcher, while secondary data is public or existing information that is collected and organized by others. In this example, the students are collecting data on the effect of action video games on children from different websites. Websites on the Internet present data that is collected and organized by others.
Hence, secondary data is collected by the students in this example.
Let us consider another example dealing with primary and secondary sources of data.
Example 2: Understanding the Difference between Primary and Secondary Data
Which of the following statements is not true?
- A. Primary data is data collected by the researcher themselves.
- B. Web pages are a source of secondary data.
- C. Secondary data is data collected by the researcher themselves.
- D. Questionnaires are a source of primary data.
- E. Focus groups are a source of primary data.
Answer
Recall that primary data is new information that is collected and organized directly by the researcher, while secondary data is public or existing information that is collected and organized by others. We can see that option A is correct by definition. Let us consider the remaining options.
Option B: Web pages are a source of secondary data.
Web pages on the internet provide data that is collected and organized by others. Hence, web pages are a source of secondary data. This is a true statement.
Option C: Secondary data is data collected by the researcher themselves.
Secondary data is collected and organized by others rather than directly by the researcher. Hence, the statement in option C is false.
Options D and E: Questionnaires/focus groups are a source of primary data.
Questionnaires and focus groups are often used by researchers to collect data. Since the use of questionnaires or focus groups leads to new information collected and organized by the researcher, they are sources of primary data. These are both true statements.
Hence, the only statement that is not true is option C.
In previous examples, we determined whether data is primary or secondary by considering its source. So far, we have seen interviews, questionnaires, and focus groups as sources of primary data and government documents, newspapers, research journals, and websites as sources of secondary data. Let us provide more examples of primary and secondary sources of data.
Primary sources of data are tools used to collect new data from a sample or, in the case of a census, from an entire population. For example, the following are frequently used:
- Interviews
- Questionnaires
- Focus groups
- Census
- Observations
- Original documents (logs, birth certificates, individual tax forms, diaries, etc.)
Secondary sources of data are collected and organized by others. Often, secondary data is presented with analysis and interpretation of the data. For example, the following are sources of secondary data:
- Newspapers
- Journals
- Websites
- Television
- Textbooks
- Dictionaries
- Reports
- Government organizations
- Teaching or research organizations
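The two lists above can be treated as a simple lookup. The following is a minimal Python sketch that assumes the lists are exhaustive for illustration only; the function name `classify_source` and the "unknown" fallback are hypothetical and are not part of the explainer.

```python
# Source lists copied from the explainer; real studies may use other sources.
PRIMARY_SOURCES = {
    "interviews", "questionnaires", "focus groups",
    "census", "observations", "original documents",
}
SECONDARY_SOURCES = {
    "newspapers", "journals", "websites", "television", "textbooks",
    "dictionaries", "reports", "government organizations",
    "teaching or research organizations",
}


def classify_source(source: str) -> str:
    """Return 'primary', 'secondary', or 'unknown' for a named data source."""
    key = source.strip().lower()
    if key in PRIMARY_SOURCES:
        return "primary"
    if key in SECONDARY_SOURCES:
        return "secondary"
    # Assumption: anything not in the two lists is left unclassified.
    return "unknown"


for name in ["Questionnaires", "Websites", "Podcasts"]:
    print(name, "->", classify_source(name))
```

A lookup like this only records the usual role of each source. The definition still decides the category: what matters is whether the researcher collected the data themselves or is reusing data collected by others.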
Let us consider an example where we distinguish primary and secondary sources of data.
Example 3: Recognizing a Source of Secondary Data
Which of the following is not a source of secondary data?
- A. Research journals and newspapers
- B. Internet
- C. Teaching and research organizations
- D. Questionnaires
- E. Government organizations
Answer
Recall that primary data is new information that is collected and organized directly by the researcher, while secondary data is public or existing information that is collected and organized by others. Research journals and newspapers, the Internet, teaching and research organizations, and government organizations all present data that their own researchers have collected and organized. Data that has already been collected and organized by others is secondary data. Hence, options A, B, C, and E are sources of secondary data.
On the other hand, questionnaires are often used by researchers to collect data directly. Since the use of questionnaires leads to new information collected and organized by the researcher, they are a source of primary data.
Of the given options, option D is not a source of secondary data.
Let us consider another example where we distinguish primary and secondary sources of data.
Example 4: Recognizing a Source of Primary Data
Which of the following is not a source of primary data?
- A. Focus groups
- B. Personal investigation
- C. Telephone calls
- D. Questionnaires
- E. Research journals and newspapers
Answer
Recall that primary data is new information that is collected and organized directly by the researcher, while secondary data is public or existing information that is collected and organized by others. Focus groups, personal investigation, telephone calls, and questionnaires are tools that researchers can use to collect data directly. The use of any of these tools can lead to new information collected and organized by the researcher, so they are a source of primary data.
On the other hand, research journals and newspapers contain data that has been collected and organized by others. Hence, these are sources of secondary data, not primary data.
Of the given options, option E is not a source of primary data.
In the previous examples, we distinguished between primary and secondary data. Let us now consider some of the benefits of these categories of data. To this end, we return to the example introduced at the beginning of this explainer: collecting data on the trend in per capita income in a city. Remember that we could collect data from any of the following: questionnaires, interviews, government documents, newspapers, and research journals.
Let us first consider the benefits of each source of data. Surveying individuals by questionnaires or interviews could lead to new and original data that is otherwise unavailable, and we can customize what type of data we collect using this method. Government documents provide a vast quantity of data that would be unattainable from individual interviews, and economic newspapers and research journals generally provide well-organized data that is easily accessible.
What are some of the drawbacks of these sources? Surveying individuals by questionnaires or interviews can be quite expensive. Costs can include the interviewers' time, printing and mailing questionnaires, and incentives for the respondents. Government documents, economic newspapers, and research journals are often organized in a specific way that may be undesirable for our study. Furthermore, since we cannot control when or from whom data is collected, the data from these sources may not be relevant to our study.
Even within the same category of primary or secondary data, each source has different benefits and drawbacks. When deciding how to collect data for a study, it is important to weigh the options and understand both the benefits and the drawbacks of each. In this explainer, however, we focus on the benefits and drawbacks of the two categories as a whole. What are the main benefits and drawbacks of primary data?
The apparent benefit of primary data is that it gives authentic, reliable, and up-to-date information. If there were no drawbacks, it would always be preferable to use primary data instead of secondary data. In the next example, we consider the drawbacks of primary data.
Example 5: Disadvantages of Primary Data
Which of the following is a disadvantage of primary data?
- A. It is less accurate.
- B. It is more accurate.
- C. It requires more time and money.
- D. It saves time and money.
Answer
Recall that primary data is new information that is collected and organized directly by the researcher. Primary data is preferred to secondary data because it gives authentic, reliable, and up-to-date information. Researchers can obtain primary data by conducting interviews, questionnaires, focus groups, or observations. An advantage of primary data is that it is more accurate.
However, we can see that all of these methods are very time consuming, especially if we need to collect data from a large sample. Also, if we are collecting data using questionnaires by mail, printing and mailing costs can add up to be significant.
Hence, the main disadvantage of primary data is that it requires more time and money. This is option C.
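To see how these costs grow with the sample size, the following Python sketch adds up the cost items mentioned in this explainer (printing, mailing, incentives, and interviewer time). The function name `survey_cost` and all unit costs are hypothetical values chosen only for illustration; they are not real figures.

```python
def survey_cost(respondents, printing_per_form, mailing_per_form,
                incentive_per_response, interviewer_hours, hourly_rate):
    """Estimate the total cost of collecting primary data with a mailed survey."""
    # Costs that are paid once for every respondent.
    per_respondent = printing_per_form + mailing_per_form + incentive_per_response
    # Staff time is charged by the hour.
    staff_cost = interviewer_hours * hourly_rate
    return respondents * per_respondent + staff_cost


# Hypothetical unit costs: 0.5 to print, 1.5 to mail, 5.0 incentive, 20.0 per hour.
small = survey_cost(100, 0.5, 1.5, 5.0, interviewer_hours=10, hourly_rate=20.0)
large = survey_cost(1000, 0.5, 1.5, 5.0, interviewer_hours=100, hourly_rate=20.0)
print(small)  # 900.0
print(large)  # 9000.0
```

Under these assumed values, a sample ten times larger costs ten times as much, which is why primary data becomes expensive for large samples.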
We have discussed the benefits and drawbacks of primary data. Let us now consider the benefits and drawbacks of secondary data. Secondary data is already collected and organized by others, so the apparent benefit is that it is often cheap and easily obtained. In the next example, we consider the drawbacks of secondary data.
Example 6: Disadvantages of Secondary Data
Which of the following is a disadvantage of secondary data?
- A. It is less accurate.
- B. It is more accurate.
- C. It requires more time and money.
- D. It saves time and money.
Answer
Recall that secondary data is public or existing information that is collected and organized by others. Sources of secondary data include newspapers, research journals, and government organizations. Compared to primary data, which involves collecting data directly from polls and interviews, secondary data does not require as much time and money. Hence, it is an advantage of secondary data that it saves time and money.
One of the main disadvantages of secondary data is that we have no control over from whom the data is collected or over what data is collected. If the data comes from a sample whose characteristics differ from those of our population of study, then it may be irrelevant to our study. Likewise, if the data is outdated, it may also be irrelevant.
Another disadvantage of secondary data is that it may be unreliable. Since we have no control over how secondary data is collected and organized, we cannot rule out the possibility that the presented data is not an accurate representation of the population. Also, even if the researchers have provided additional details to confirm its accuracy, the usefulness of their data for our purpose can be limited.
Hence, the main disadvantage of secondary data is that it is less accurate. This is option A.
Let us finish by recapping a few important concepts from this explainer.
Key Points
- Primary data is new information that is collected and organized directly by the researcher, while secondary data is public or existing information that is collected and organized by others.
- Examples of sources for primary and secondary data are summarized in the table below.
| Primary Data | Secondary Data |
| --- | --- |
| Interviews | Research journals |
| Questionnaires | Websites |
| Focus groups | Newspapers |
| Census | Textbooks |
| Observations | Reports |
| Original documents | Various organizations |

- Benefits and drawbacks of primary and secondary data are summarized in the table below.

| | Primary Data | Secondary Data |
| --- | --- | --- |
| Benefits | Authentic, reliable, and up-to-date | Cheap and easily obtained |
| Drawbacks | Costly and time consuming | May be irrelevant and unreliable |