Video Transcript
In this video on data collection, we'll learn what primary and secondary data sources are and how they differ.
We'll also look at the advantages and disadvantages of using each type of source.
Data collection is the starting point of all statistical analysis, and we must be careful when collecting data,
since the quality of our statistical analysis depends on the quality of the data. There are many different ways
to collect data, for example, through interviews, questionnaires, websites, or newspapers, and each source of data
has its own benefits and drawbacks. We can broadly categorize these data sources into two types: primary and secondary data.
Primary data is new information that is collected and organized directly by the researcher. For a large-scale
data collection, this could also be information collected by a team working for the researcher. Some of the most
common primary data sources are interviews, questionnaires, focus groups, and observations, all of which involve
surveying, questioning, or observing a relatively small group of people. A census, by contrast, collects information
from an entire population; in some countries, a census is carried out every five or ten years. Original documents,
another primary source, can include things like birth certificates, driving licenses, or even marriage certificates.
Then we have secondary data, which is public or existing information that is collected and organized by others.
Common secondary data sources include research journals, websites, newspapers, textbooks, and reports. Later in
this video, we'll look at the advantages and disadvantages of primary and secondary data. But in the first example,
we'll begin by identifying whether a given source provides primary or secondary data.
A teacher asked their students to collect data from different websites on the Internet about the effect of
action video games on children. What type of data was collected?
Let's recall that data can be categorized into two types: primary and secondary data. Primary data is new
information that is collected and organized by the researcher. Secondary data is public or existing information
that is collected and organized by others. We're told that the students are collecting information from websites,
which is a good indicator that this is a secondary data source. For the students to collect primary data on the
effect of action video games on children, they would need to ask children, such as their fellow students, questions
themselves. However, because they're using information that has been collected and organized by others, this is
secondary data.
Let’s look at another example.
Which of the following statements
is not true?
We're given five statements, which we'll look at in turn. Statement (A) says, "Primary data is data collected by
the researcher themselves." This is essentially the definition of primary data: new data that is collected by the
researcher themselves or by somebody collecting it on their behalf. Therefore, statement (A) is true.
Statement (B) says, "Web pages are a source of secondary data." Recall that secondary data is collected and
organized by others, so websites and web pages are a source of secondary data. Statement (B) is also true.
Statement (C) says, "Secondary data is data collected by the researcher themselves." As we saw in statement (A),
data collected by the researcher themselves is primary data, so this statement is false. Let's look at the
remaining two statements. Statement (D) says, "Questionnaires are a source of primary data," and statement (E) says,
"Focus groups are a source of primary data." Questionnaires and focus groups are both data collection methods that
produce new information collected and organized by the researcher themselves. Since both are primary data sources,
statements (D) and (E) are both true. Therefore, the statement that is not true is statement (C): secondary data is
data collected by the researcher themselves.
We’ll now look at two more examples
where we identify primary and secondary data sources.
Which of the following is not a
source of secondary data? Option (A) research journals and
newspapers, option (B) Internet, option (C) teaching and research organizations,
option (D) questionnaires, or option (E) government organizations.
Let's begin by recalling that secondary data is public or existing information that is collected and organized by
others. The other type of data is primary data, which is new information collected and organized by the researcher.
One way to think through this problem is to consider a typical data collection scenario. Suppose a researcher is
collecting information from a group of people that they want to study, perhaps through interviews, questionnaires,
or focus groups. At this point, what this researcher has is primary data.
When they analyze their data, they may publish the results in a report or a research journal, in a newspaper, or on
a website. Another researcher, who may well be studying something else, would be looking at secondary data when
reading these reports. So the sources in options (A), (B), (C), and (E) all provide secondary data. In other words,
the information comes from reports, papers, and similar publications, not from the people who were originally asked.
Option (D), however, is different: questionnaires are used by researchers to collect data directly. Since using a
questionnaire produces new information collected and organized by the researcher, questionnaires are a source of
primary data. Therefore, the only option that is not a source of secondary data is (D), questionnaires, which are
a source of primary data.
Which of the following is not a
source of primary data? Option (A) focus groups, option (B)
personal investigation, option (C) telephone calls, option (D) questionnaires, or
option (E) research journals and newspapers.
Recall that data can be categorized into two types. Primary data is new information that is collected and organized
directly by the researcher. Secondary data is public or existing information that is collected and organized by
others. Focus groups, personal investigation, telephone calls, and questionnaires are all tools that researchers can
use to collect data directly. Using any of these tools produces new information collected and organized by the
researcher themselves, so they are all sources of primary data.
However, option (E), research journals and newspapers, contains data that has been collected and organized by
others, so these are sources of secondary data. The answer is therefore (E): research journals and newspapers are
not a source of primary data.
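Every classification example above comes down to one question: was the data collected and organized by the
researcher (or a team working for them), or by others? As a minimal sketch in Python, that rule can be written as a
small function plus a lookup table. The names `DataType`, `classify`, and `SOURCES` are hypothetical, chosen only for
illustration, and the table covers only sources mentioned in this video.

```python
from enum import Enum


class DataType(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def classify(collected_by_researcher: bool) -> DataType:
    """Primary data is new information collected and organized by the
    researcher (or a team working for them); data collected and organized
    by others is secondary."""
    return DataType.PRIMARY if collected_by_researcher else DataType.SECONDARY


# Hypothetical lookup: True means the researcher gathers the information
# directly when using this source, False means someone else already did.
SOURCES = {
    "interviews": True,
    "questionnaires": True,
    "focus groups": True,
    "observations": True,
    "telephone calls": True,
    "personal investigation": True,
    "research journals": False,
    "newspapers": False,
    "websites": False,
    "textbooks": False,
    "reports": False,
    "teaching and research organizations": False,
    "government organizations": False,
}

if __name__ == "__main__":
    # Print the data type each listed source provides.
    for source, direct in SOURCES.items():
        print(f"{source}: {classify(direct).value}")
```

For instance, `classify(SOURCES["questionnaires"])` gives `DataType.PRIMARY`, while `classify(SOURCES["websites"])`
gives `DataType.SECONDARY`, matching the answers to the examples above.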
We'll now consider some of the advantages and disadvantages of primary and secondary data sources, using the
following example scenario. Suppose that we're collecting data on the trend in per capita income in a city. We could
collect this data from focus groups, interviews, questionnaires, government documents, newspapers, or research
journals. Let's consider the advantages and disadvantages of secondary data first. One advantage of some secondary
data sources, such as government documents, is that a large amount of data is available, and it often includes data
that we could not collect ourselves.
For example, a person stopped in the street by a researcher might be reluctant to reveal their earnings, but they
would provide that information in a census. The data in secondary sources is also often well organized. So what are
some disadvantages of secondary data? Firstly, it might not be organized in the way that we need. Also, we can't
control whom the data is collected from. For example, in our per capita income scenario, the secondary data could
include teenagers with part-time jobs, whom we may not want to include in our study.
Some of the advantages and disadvantages of primary data are simply the opposite of those of secondary data. The
major benefit of primary data is that it gives us exactly the data we need. If primary data had no disadvantages at
all, it would almost always be the best data source. So why might we not choose primary data sources? The two major
disadvantages of primary data are that it is costly and time-consuming to collect. We need to pay interviewers, and
even for questionnaires, we would need to pay for printing and mailing them.
When deciding how to collect data for a study, we need to weigh up both types of data and understand the benefits
and drawbacks of each. We may even use a range of data collection methods, combining primary and secondary data
sources. We'll now look at one final question on the disadvantages of secondary data.
List two disadvantages of secondary
data.
We can begin by recalling that secondary data is public or existing information that is collected and organized by
others. Examples of secondary data sources include newspapers, websites, journals, and government documents. One
disadvantage of secondary data is that we cannot control whom the data is collected from. For example, the data
might come from a sample with different characteristics from the population that we are studying; it could, for
instance, include age groups that we are not interested in. Secondary data is also often outdated: the information
in a secondary data source may have been collected several years earlier.
We could summarize both of these disadvantages by saying that the data may be irrelevant to our study. Another
disadvantage is that we can't control how secondary data is collected and organized, so it may not accurately
represent the population we're interested in; in other words, it may be unreliable. In general, the two main
disadvantages of secondary data are that it can be irrelevant and unreliable, so we can give our answer as
irrelevant and unreliable.
We can now summarize the key points of this video. We saw that primary data is new information that is collected
and organized directly by the researcher, whereas secondary data is public or existing information that is collected
and organized by others. We also saw some key examples of primary and secondary data sources. Finally, we considered
a range of benefits and drawbacks of both primary and secondary data. These are summarized in the given table.