In this explainer, we will learn how to distinguish between categorical and quantitative variables.
In statistics, we are often interested in analyzing properties of a data set to determine relationships between variables or to make predictions. Here, a data set refers to a collection of information, and a variable refers to a characteristic or feature of an individual or object. The observed values of the variables are called data.
For instance, a list of people’s heights would be a data set, where the height of a person is a variable and the recorded values are data.
Definition: Variables and Data
- A variable refers to a characteristic or feature of an individual or object.
- We can measure these characteristics of an individual or object to form data.
- A data set is a collection of data.
We can also consider the types of variables from data. In the above example, the height of a person is recorded as a number, so height is considered to be a quantitative, or numerical, variable.
We can also consider variables that are not quantitative. For instance, a person’s favorite food or the color of a car would be recorded in words rather than in numbers. We call these qualitative, or categorical, variables, since they describe different qualities or categories of individuals.
Definition: Qualitative and Quantitative Variables
A variable that can be recorded as a number is called a quantitative, or numerical, variable.
A variable that is not recorded as a number is called a qualitative, or categorical, variable.
We refer to data that is measuring a quantitative variable as quantitative data, and we refer to data that is measuring a qualitative variable as qualitative data.
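As a rough illustration of the definition above, the following Python sketch labels a list of recorded values as quantitative when every value is a number and as qualitative otherwise. The function name `variable_type` and the sample lists are hypothetical and chosen only for this illustration. The check looks only at how the values were recorded, which is exactly the criterion used in the definition.

```python
from numbers import Number


def variable_type(values):
    """Return 'quantitative' if every recorded value is a number, else 'qualitative'."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"


# Hypothetical sample data, invented for this sketch.
heights_cm = [172.5, 160.0, 181.2]
favorite_foods = ["pizza", "pasta", "pizza"]

print(variable_type(heights_cm))      # quantitative
print(variable_type(favorite_foods))  # qualitative
```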
In our first example, we will identify which of four given options is not an example of quantitative data.
Example 1: Identifying a Nonquantitative Variable
Which of the following is not an example of quantitative data?
- A. Temperature (in degrees)
- B. Salary
- C. Hobbies
- D. Time spent at work
Answer
We recall that quantitative variables are ones that are measured numerically. This means that we can determine which of the options is not an example of a quantitative variable by considering whether or not their data would be recorded as numbers or as words.
First, the temperature of an object (in degrees) will be a number, so it is a quantitative variable.
Second, the salary of a person is usually measured in pounds sterling per year and will also be numeric. This is also a quantitative variable.
Third, the hobbies a person might have would be given in words rather than numbers, since this information is descriptive. In particular, it is nonnumeric. Thus, this is not a quantitative variable. In fact, it is a qualitative variable.
Finally, the time a person spends at work would usually be recorded in hours, so this is a quantitative variable.
Hence, of the given options, we can say that only answer C, the hobbies of a person, is not an example of quantitative data.
We can split quantitative variables into two types by considering the type of data values we can expect from the variables. For example, if we measure the number of cars passing by a house in an hour, then we know that we will always be measuring a whole number of cars. Such quantitative variables are called discrete since there are certain numbers in the range that we cannot have (e.g., there cannot be 1.5 cars passing in an hour).
On the other hand, consider the speed in kilometres per hour of cars passing by a house; then, we can note that this can take any nonnegative value within a reasonable range. Such quantitative variables are called continuous because possible values of the variable form a continuous interval.
When recording data for a continuous variable, the values are rounded to a certain number of decimal places or sometimes to the nearest integer. However, the variable does not become discrete just because it has been rounded to the nearest integer.
To distinguish between discrete and continuous variables, we need to consider whether or not it is possible for the variable to take any values in between two different numbers.
Definition: Discrete and Continuous Variables
There are two types of quantitative variables:
- Discrete variables can only take certain values in a given range.
- Continuous variables are measured and can take any value within a given range.
In our next example, we will identify which of four given options is not an example of a continuous variable.
Example 2: Identifying Continuous and Discrete Variables
Which of the following is not an example of a continuous variable?
- A. The top speed of a car in kilometres per hour
- B. The cost of a person’s cart at checkout in pounds sterling
- C. The force required to move a box in newtons
- D. The energy in a breakfast measured in kilocalories
Answer
We recall that continuous variables are quantitative variables that can take any value within an interval. In particular, the values of a continuous variable are not restricted to whole numbers. We can determine which of the options is not an example of a continuous variable by considering whether or not it is possible for the variable to take any value within an interval.
First, the top speed of a car in kilometres per hour can take any nonnegative numerical value within a reasonable range of values. Thus, it is an example of a continuous variable.
Second, the cost of a person’s cart at checkout in pounds sterling will include pence, which means that its values can contain up to two decimal places. However, this does not mean that the cost can take any numerical value within an interval. For example, it is not possible for the cart to cost £1.001 since no item in the store would have a price containing a thousandth of a pound, which is one-tenth of a penny. Because this variable cannot take every value within an interval, the cost of a person’s cart is a discrete variable.
For due diligence, we will check the other options.
Third, the force required to move a box in newtons can take any positive value, so this is an example of a continuous variable.
Finally, the energy in a breakfast measured in kilocalories can once again take any positive value, so this is an example of a continuous variable.
Hence, option B, the cost of a person’s cart at checkout in pounds sterling, is not an example of a continuous variable.
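The reasoning about the cart cost can be sketched in Python: a cost is a possible value only if it is a whole number of pence. The helper name `is_whole_pence` is hypothetical, and the amounts are passed as strings so that `Decimal` represents them exactly. This is a minimal sketch of the argument above, not a general test for discreteness.

```python
from decimal import Decimal


def is_whole_pence(amount_in_pounds: str) -> bool:
    """Return True if the amount (in pounds) is a whole number of pence."""
    pence = Decimal(amount_in_pounds) * 100
    # A valid price has no fractional part once expressed in pence.
    return pence == pence.to_integral_value()


print(is_whole_pence("1.00"))   # True
print(is_whole_pence("1.01"))   # True
print(is_whole_pence("1.001"))  # False: this would need a tenth of a penny
```

Between £1.00 and £1.01 there is no other possible cost, which is what makes the variable discrete even though its values have decimal places.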
Typically, a frequency table or a grouped frequency table is used when summarizing data from a data set. For instance, we can summarize a data set containing 20 people’s favorite foods by recording the frequency with which each value of a variable appears in the data set.
Food | Frequency |
---|---|
Pizza | 10 |
Hamburger | 6 |
Pasta | 4 |
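A frequency table like this one can be produced by counting how often each value appears. The sketch below uses `collections.Counter` from the Python standard library. The list `responses` is rebuilt from the counts in the table, so the order of the individual answers is an assumption made for illustration.

```python
from collections import Counter

# Raw answers rebuilt from the table's counts; the order is illustrative only.
responses = ["Pizza"] * 10 + ["Hamburger"] * 6 + ["Pasta"] * 4

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(responses)

print("Food | Frequency |")
for food, count in frequency.most_common():
    print(f"{food} | {count} |")
```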
As we can see, a frequency table can be used to summarize a large data set. It can be used for quantitative data as well. However, for a quantitative variable, it is often impractical to list every possible value of the variable in a frequency table. This is particularly the case when the variable is continuous, such as the speed of a car, since the number of possible values is infinite.
In such cases, we can group a range of values into intervals and record the frequency with which data falls within certain intervals (called classes).
Speed (km/h) | Frequency |
---|---|
$27 \le \text{speed} < 29$ | 5 |
 | 7 |
 | 8 |
Such a table is referred to as a grouped frequency table. Let’s discuss the different components of a grouped frequency table using this table.
This particular grouped frequency table has three different classes. Within each class, we call the greater of the two bounds in the class the upper class boundary and the lower of the two bounds in the class the lower class boundary. It is important to keep in mind that these may not be the actual data values, but they are just the boundaries for the range of values for the class.
For example, the first class in this table covers speeds from 27 km/h up to, but not including, 29 km/h. The lower and upper class boundaries of this class are 27 km/h and 29 km/h respectively. We should note that the lower class boundaries are included in their classes, while the upper class boundaries are not included. Hence, a data value of 27 km/h would be counted toward the first class, but a data value of 29 km/h would be counted toward the second class.
The class width is the difference between the upper and lower class boundaries, and the class midpoint is the average of the class boundaries.
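The rule that a lower class boundary belongs to its class while the upper one does not can be sketched as follows. The function name `grouped_frequencies`, the class edges beyond 27 and 29, and the sample speeds are all hypothetical and invented for this illustration. They are not the data behind the table above.

```python
from bisect import bisect_right


def grouped_frequencies(values, edges):
    """Count values in half-open classes [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Ignore anything outside the overall range covered by the classes.
        if edges[0] <= v < edges[-1]:
            # bisect_right places a value equal to an edge in the class
            # that starts at that edge, so lower boundaries are included.
            counts[bisect_right(edges, v) - 1] += 1
    return counts


edges = [27, 29, 31, 33]              # only 27 and 29 come from the text
speeds = [27, 28.4, 29, 30.9, 32.5]   # made-up readings in km/h

print(grouped_frequencies(speeds, edges))  # [2, 2, 1]
```

Here the reading of 27 km/h is counted in the first class and the reading of 29 km/h in the second, matching the convention described above.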
Definition: Class Boundaries, Widths, and Midpoints
In a grouped frequency table, we group a range of values into intervals and record the frequency with which data falls within certain intervals (called classes).
Within each class, we call the greater of the two bounds in the class the upper class boundary and the lower of the two bounds in the class the lower class boundary.
The class width is the difference between the upper and lower class boundaries, and the class midpoint is the average of the class boundaries.
In our next example, we will determine the boundaries, width, and midpoint of a class in a given grouped frequency table.
Example 3: Identifying Class Parameters of a Grouped Frequency Table
A class grows 20 plants for a month and then measures the length of each plant in centimetres. The results are given in the table.
Length (cm) | Frequency |
---|---|
 | 2 |
 | 3 |
$12 \le \text{length} < 12.5$ | 6 |
 | 7 |
 | 2 |
- What are the boundaries of the third class?
- What is the width of the third class?
- What is the midpoint of the third class?
Answer
Part 1
We first recall that the class boundaries are the largest and smallest values for the class. The third class contains 6 members, all of which have lengths in the following interval:
$$12 \le \text{length} < 12.5.$$
We see that all of the lengths in this class must be less than 12.5 cm, so we call this the upper class boundary. Similarly, we see that all of the lengths in this class must be greater than or equal to 12 cm, so we call this the lower class boundary.
Hence, the class boundaries are 12 cm and 12.5 cm.
Part 2
We recall that the class width is the difference between the upper and lower class boundaries. The upper class boundary is 12.5 cm and the lower class boundary is 12 cm. So, the class width is
$$12.5 - 12 = 0.5\ \text{cm}.$$
Part 3
We recall that the class midpoint is the average of the class boundaries. Therefore, we can find the midpoint of the third class by adding the upper and lower class boundaries found in the first part together and dividing by 2. We have
$$\frac{12 + 12.5}{2} = 12.25\ \text{cm}.$$
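The two calculations in this example follow directly from the definitions, and can be checked with a short Python sketch. The function names `class_width` and `class_midpoint` are hypothetical. The boundaries 12 cm and 12.5 cm are the ones found in Part 1.

```python
def class_width(lower, upper):
    """Width of a class: upper boundary minus lower boundary."""
    return upper - lower


def class_midpoint(lower, upper):
    """Midpoint of a class: the average of its two boundaries."""
    return (lower + upper) / 2


# Boundaries of the third class, in centimetres.
lower, upper = 12, 12.5

print(class_width(lower, upper))     # 0.5
print(class_midpoint(lower, upper))  # 12.25
```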
In our next example, we will find the class widths and midpoints in a group in a given grouped frequency table.
Example 4: Identifying Class Parameters of a Grouped Frequency Table
The temperatures of a room at noon to the nearest degree Celsius for a month are given in the table.
Temperature (°C) | Frequency |
---|---|
14 | 3 |
15 | 4 |
16 | 5 |
17 | 7 |
18 | 7 |
19 | 3 |
20 | 2 |
- By rewriting each of the rounded temperatures as a range of possible temperatures, determine the class width of all of the classes.
- What is the midpoint of the final class?
Answer
Part 1
We first recall that the class width is the difference between the upper and lower class boundaries. It might be tempting to say that the class width is 0, since the table does not list differing upper and lower class boundaries. However, we are told that the temperatures are given to the nearest degree Celsius. This means that a temperature of $13.5^\circ\text{C}$ will round up to $14^\circ\text{C}$, so it would be included in the first row of the table. We can also note that any temperature from $13.5^\circ\text{C}$ up to, but not including, $14.5^\circ\text{C}$ will round to $14^\circ\text{C}$.
If we call the temperature variable $T$, then we can rewrite the table as follows.
Temperature, $T$ (°C) | Frequency |
---|---|
$13.5 \le T < 14.5$ | 3 |
$14.5 \le T < 15.5$ | 4 |
$15.5 \le T < 16.5$ | 5 |
$16.5 \le T < 17.5$ | 7 |
$17.5 \le T < 18.5$ | 7 |
$18.5 \le T < 19.5$ | 3 |
$19.5 \le T < 20.5$ | 2 |
This highlights the fact that the difference between the upper and lower class boundaries in every class is $1^\circ\text{C}$. For example, for the first class,
$$14.5 - 13.5 = 1.$$
Part 2
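We recall that the midpoint of a class is the average of the class boundaries. Therefore, we can find the midpoint of the final class by adding the upper and lower class boundaries together and dividing by 2. The final class contains the temperatures that round to $20^\circ\text{C}$, so its boundaries are $19.5^\circ\text{C}$ and $20.5^\circ\text{C}$. We have
$$\frac{19.5 + 20.5}{2} = 20.$$
Hence, the midpoint of the final class is $20^\circ\text{C}$.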
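The step of turning a rounded reading into a class can be sketched in Python. The function name `rounded_class` and its `step` parameter are hypothetical. The sketch assumes that values exactly halfway between two integers round up, as in the answer above.

```python
def rounded_class(reading, step=1.0):
    """Return (lower, upper) boundaries of the values that round to `reading`.

    The lower boundary is included in the class and the upper one is not.
    """
    half = step / 2
    return reading - half, reading + half


# The final row of the table records temperatures that round to 20 °C.
lower, upper = rounded_class(20)

print(lower, upper)         # 19.5 20.5
print(upper - lower)        # 1.0  (class width)
print((lower + upper) / 2)  # 20.0 (class midpoint)
```

The same call with 14 gives the boundaries 13.5 and 14.5, so every row in the table has the same class width.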
In our final example, we will determine the class boundaries, the class width of a group, and the midpoint of another group in a given grouped frequency table.
Example 5: Identifying Class Parameters of a Grouped Frequency Table
The inclined angle that causes a box to slip down a slope is measured and the data is given in the table.
Inclined Angle (°) | Frequency |
---|---|
 | 2 |
$40^\circ \le \text{angle} < 45^\circ$ | 3 |
 | 4 |
 | 4 |
 | 2 |
- What are the boundaries of the fourth class?
- What is the width of the fifth class?
- What is the midpoint of the second class?
Answer
Part 1
We recall that the class boundaries are the largest and smallest values for the class. The fourth class contains 4 members, all of which have inclined angles in the interval listed in the fourth row of the table.
All of the angles in this class must be less than the greater bound of that interval, so we call this the upper class boundary. Similarly, all of the angles in this class must be greater than or equal to the lesser bound, so we call this the lower class boundary.
Hence, the class boundaries are and .
Part 2
We recall that the class width is the difference between the upper and lower class boundaries. The boundaries of the fifth class are and . Therefore, the width of the fifth class is given by
Part 3
We recall that the midpoint of a class is the average of the class boundaries. Therefore, we can find the midpoint of the second class by adding the upper and lower class boundaries together and dividing by 2. The second class covers angles from $40^\circ$ up to, but not including, $45^\circ$, so its upper and lower class boundaries are $45^\circ$ and $40^\circ$ respectively.
We have
$$\frac{40 + 45}{2} = 42.5.$$
Hence, the midpoint of the second class is $42.5^\circ$.
Let’s finish by recapping some of the important points from this explainer.
Key Points
- Variables refer to the characteristics or features of an individual. Observed values of variables are called data.
- A qualitative variable is recorded in the form of words rather than numbers.
- A quantitative variable is a variable that is measured numerically. There are two types of quantitative variables:
- Discrete variables can only take certain values in a given range.
- Continuous variables can take any value in a given range.
- We can summarize data in both frequency tables and grouped frequency tables.
- In a grouped frequency table, we group a range of values into intervals and record the frequency with which data falls within certain intervals (called classes).
- Within each class, we call the greater of the two bounds in the class the upper class boundary and the lower of the two bounds in the class the lower class boundary.
- The class width is the difference between the upper and lower class boundaries, and the class midpoint is the average of the class boundaries.