Lesson Video: Data Collection | Nagwa

Lesson Video: Data Collection Mathematics • Third Year of Preparatory School

In this video, we will learn how to recognize the differences between, and the advantages of using, primary and secondary data sources.

12:41

Video Transcript

In this video on data collection, we’ll understand what primary and secondary data sources are and the differences between them. We’ll also learn what the advantages are of using both of these sources.

Data collection is the beginning of all statistical analysis. And we must be careful when we are collecting data since the quality of our statistical analysis relies on the quality of the data. There are so many different ways in which we can collect data, whether that’s from interviews, questionnaires, websites, or newspapers. And of course, there are benefits and drawbacks to various sources of data. We can largely categorize these different data sources into two types: primary and secondary data.

Primary data is new information which is collected and organized directly by the researcher. Of course, if it’s a large-scale data collection, then this could also be information collected by a team working for the researcher. Some of the most common primary data sources are interviews, questionnaires, focus groups, and observations, all of which involve surveying, or asking questions of, a smaller group of people. A census, by contrast, uses information collected from an entire group of people. In some countries, a census is carried out every 5 or 10 years. Original documents, which can also serve as primary data sources, include things like birth certificates, driving licenses, or even marriage certificates.

Then we have secondary data, which is public or existing information which is collected and organized by others. Common types of secondary data sources include research journals, websites, newspapers, textbooks, or reports. Later, in this video, we’ll look at the advantages and disadvantages of primary and secondary data. But in the first example, we’ll begin by identifying if a given source is primary or secondary data.

A teacher asked their students to collect data on the effect of action video games on children from different websites on the Internet. What is the type of data collected?

Let’s remember that there are two ways of categorizing data types, that is, primary and secondary data. Primary data is new information which is collected and organized by a researcher. Secondary data is public or existing information which is collected and organized by others. Because we’re told that the students are collecting information from websites, that’s a good indicator that this will be a secondary data source. The students here are trying to understand the effect of action video games on children, so for this to be primary data, they would need to ask fellow students or other children the questions themselves. However, because they’re using information which was collected and organized by other groups, this is secondary data.

Let’s look at another example.

Which of the following statements is not true?

And we’re given five statements which we shall look at in turn. Statement (A) says, “Primary data is data collected by the researcher themselves.” Well, this is essentially the definition of primary data. It’s new data that’s collected by the researcher themselves or by somebody collecting it on their behalf. Therefore, statement (A) is a true statement. The statement in option (B) is “Web pages are a source of secondary data.” We can remember that secondary data is something which is collected and organized by others, and so websites or web pages can be a good source of secondary data. This is a true statement.

Statement (C) says, “Secondary data is data collected by the researcher themselves.” As we saw in statement (A), it’s primary data when the data is collected by the researcher themselves. Therefore, this statement is false. Let’s have a look at the remaining two statements. Statement (D) says, “Questionnaires are a source of primary data,” and statement (E) says, “Focus groups are a source of primary data.” Both questionnaires and focus groups are data collection methods which give new information which can be collected and organized by the researcher themselves. Since both of these are primary data sources, then both statement (D) and (E) are true. Therefore, the statement that is not true is statement (C): secondary data is data collected by the researcher themselves.

We’ll now look at two more examples where we identify primary and secondary data sources.

Which of the following is not a source of secondary data? Option (A) research journals and newspapers, option (B) Internet, option (C) teaching and research organizations, option (D) questionnaires, or option (E) government organizations.

Let’s begin by recalling that secondary data is public or existing information that’s collected and organized by others. The other type of data is primary data, which is new information collected and organized by the researcher. One way of thinking through this problem is by considering different scenarios of data collection. Here we have a researcher who’s collecting information from a group of people that they want to study. This could be through interviews, questionnaires, or even focus groups. At this point, what this researcher has is primary data.

When they analyze their data, they may publish the results in some sort of report or journal, or include them in a newspaper or on a website. Another researcher, who may well be researching something else and who draws on these reports, would be looking at secondary data. So, when we consider the sources in options (A), (B), (C), and (E), these are all sources of secondary data. In other words, we’re getting our information from existing reports, papers, and similar publications, and not from the people who were originally asked.

If we look at option (D), however, questionnaires are used by researchers to collect the data directly. Since the use of questionnaires leads to new information collected and organized by the researcher, they are a source of primary data. Therefore, the sources in (A), (B), (C), and (E) are all sources of secondary data. So the one which is not a source of secondary data is (D), questionnaires. This is a source of primary data.

Which of the following is not a source of primary data? Option (A) focus groups, option (B) personal investigation, option (C) telephone calls, option (D) questionnaires, or option (E) research journals and newspapers.

We can recall that we can categorize data into two different types. Primary data is new information which is collected and organized directly by the researcher. Secondary data is public or existing information which is collected and organized by others. Focus groups, personal investigation, telephone calls, and questionnaires are tools that researchers can use to collect data directly. Using any of these tools will lead to new information collected and organized by the researcher themselves. So they are all sources of primary data.

However, if we look at option (E), research journals and newspapers contain data which has been collected and organized by others. This means that these will both be sources of secondary data. Therefore, the option that is not a source of primary data is (E), research journals and newspapers.
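The reasoning used in the last few examples can be sketched as a small lookup in Python. This is only an illustration: the names `PRIMARY_SOURCES`, `SECONDARY_SOURCES`, `classify_source`, and `odd_one_out` are hypothetical, and the lists contain only source types mentioned in this lesson. In practice, what makes data primary or secondary is who collected it, not just the name of the source.

```python
# Hypothetical lookup tables built only from source types named in this lesson.
PRIMARY_SOURCES = {
    "interviews", "questionnaires", "focus groups",
    "observations", "personal investigation", "telephone calls",
}
SECONDARY_SOURCES = {
    "research journals", "newspapers", "websites",
    "textbooks", "reports", "government documents",
}


def classify_source(source: str) -> str:
    """Return 'primary', 'secondary', or 'unknown' for a named source type."""
    name = source.strip().lower()
    if name in PRIMARY_SOURCES:
        return "primary"
    if name in SECONDARY_SOURCES:
        return "secondary"
    return "unknown"


def odd_one_out(options: list[str], kind: str) -> list[str]:
    """Return the options that are NOT sources of the given kind of data."""
    return [option for option in options if classify_source(option) != kind]


# Simplified version of the last question: which is not a primary source?
options = [
    "focus groups", "personal investigation", "telephone calls",
    "questionnaires", "research journals",
]
print(odd_one_out(options, "primary"))
```

A fixed lookup like this only works for the textbook cases above. A questionnaire run by someone else and then published would count as secondary data for us, so a fuller model would record who collected the data.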

We’ll now consider some of the advantages and disadvantages of primary and secondary data sources. Let’s take the following example scenario. Let’s say that we’re collecting data on the per capita income trend in a city. Some of the different ways in which we could collect this data would be focus groups, interviews, questionnaires, government documents, newspapers, or research journals. Let’s consider some of the advantages and disadvantages of secondary data first. One of the good things about some secondary data sources, for example, government documents, is that there is a lot of data available, and often it is data that we could not collect ourselves.

For example, somebody stopped by a researcher in the street might not be happy to give information about their earnings, but they would give that information in a census. Often the data in secondary sources is also well organized. So what are some disadvantages of secondary data? Firstly, it might not be organized in the way that we need. Also, we can’t control from whom the data is collected. For example, in our scenario of collecting data on per capita income, a secondary source could include teenagers with part-time jobs, and we may not want that for our study.

Of course, when we consider primary data, some of its advantages and disadvantages will simply be the opposite of those that we have for secondary data. The major benefit of primary data is that it gives us exactly the data that is required. If there were absolutely no disadvantages to primary data, it would almost always be the best data source. So why might we not choose primary data sources? Well, the two major disadvantages of primary data are the cost and the fact that it’s time-consuming to collect. We need to pay interviewers. And even for something like questionnaires, we would need to pay for printing and mailing them.

When we’re deciding how to collect data for a study, we’ll need to weigh up the options of both types of data and understand the benefits and drawbacks of each. It may even be the case that we use a range of different data collection options, including some from primary data and some from secondary data sources. We’ll now look at one final question on the disadvantages of secondary data.

List two disadvantages of secondary data.

We can begin by recalling that secondary data is public or existing information which is collected and organized by others. Some example types of secondary data sources are newspapers, websites, journals, and government documents. One disadvantage of secondary data is that we cannot control who the data is collected from. For example, the data might be collected from a sample with different characteristics from the population that we are studying. The secondary data might, for instance, include age groups that we are not interested in. We also commonly find that secondary data is outdated. The information in the secondary data source may have been collected several years previously.

We could summarize both of these disadvantages by saying that the data may be irrelevant to our study. We can also list the disadvantage that we can’t control how secondary data is collected and organized. The data may not be an accurate representation of the population we’re interested in. In general, then, the two main disadvantages of secondary data are that it can be irrelevant and unreliable, and so we can give the answer as irrelevant and unreliable.

We can now summarize the key points of this video. We saw that primary data is new information which is collected and organized directly by the researcher. Secondary data is public or existing information which is collected and organized by others. We also saw some key examples of primary and secondary data types. And finally, we considered a range of benefits and drawbacks to both primary and secondary data. These are summarized in the given table.
