In this explainer, we will learn how to calculate conditional probabilities using joint frequencies presented in two-way tables.
When collecting data on nonnumerical variables, we count how many times a particular characteristic occurs. We can then put our results in a table.
For example, a school may collect data on the number of pupils who travel to school by car, by foot, and by bicycle, as shown in the table below.
Mode of Transport | Car | Foot | Bicycle |
---|---|---|---|
Number of Pupils | 450 | 650 | 200 |
Our population here is the pupils at the school and the variable is the mode of transport, which is a categorical (i.e., nonnumerical) variable. In this data set, the variable has 3 categories: car, foot, and bicycle. We count the number of pupils who take each mode of transport.
We can delve deeper into the data by splitting it according to how many boys or girls took each mode of transport. So, the data varies not only across the mode of transport, but also with respect to whether the pupil is a boy or a girl, as shown below.
Mode of Transport | Car | Foot | Bicycle |
---|---|---|---|
Boys | 200 | 330 | 120 |
Girls | 250 | 320 | 80 |
Our data is now displayed in a two-way table (which is sometimes also called a contingency table). The two in the two-way table refers to the two variables (which, in our case, are modes of transport and being a boy or a girl). Looking at the table, we can see that, for example, there were 200 boys who traveled by car, but only 120 boys who traveled by bicycle.
We can use two-way tables to calculate the probability of an event occurring as well as the conditional probability of an event occurring given that another event has occurred. To illustrate how this works, we will first recall the formula for conditional probability.
Definition: Conditional Probability
The probability that an event $A$ occurs, given that event $B$ has already occurred, is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$
where $P(A \mid B)$ is the probability that $A$ occurred, given $B$ occurred, $P(A \cap B)$ is the probability that $A$ and $B$ occurred (happened) at the same time, and $P(B)$ is the probability that $B$ occurred.
Using the example above, we will discuss how to use a two-way table to find conditional probabilities.
If we want to find the probability that a pupil traveled by car given that they are a girl, then by replacing event $A$ with the event that they travel by car and event $B$ with the event that they are a girl, we get
$$P(\text{car} \mid \text{girl}) = \frac{P(\text{car} \cap \text{girl})}{P(\text{girl})}.$$
To calculate the probability of selecting a girl, we need to find the total number of girls and divide it by the total number of pupils. We can do this by summing the row for the girls and summing all the rows, or all the columns, to get the total number of pupils. Generally, it is helpful to calculate all the totals of the rows, columns, and overall total, as a first step when calculating probabilities from two-way tables.
Mode of Transport | Car | Foot | Bicycle | Total |
---|---|---|---|---|
Boys | 200 | 330 | 120 | 650 |
Girls | 250 | 320 | 80 | 650 |
Total | 450 | 650 | 200 | 1 300 |
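As a rough illustration, the totals in this table can be computed with a few lines of Python. This is only a sketch: the dictionary layout and the names `counts`, `row_totals`, `column_totals`, and `grand_total` are our own choices for illustration.

```python
# Joint frequencies from the two-way table.
# Outer keys are the rows (boys/girls); inner keys are the modes of transport.
# The data layout and variable names are illustrative assumptions.
counts = {
    "Boys": {"Car": 200, "Foot": 330, "Bicycle": 120},
    "Girls": {"Car": 250, "Foot": 320, "Bicycle": 80},
}

# Row totals: add up the cells across each row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: add up each mode of transport over all rows.
columns = ["Car", "Foot", "Bicycle"]
column_totals = {col: sum(counts[row][col] for row in counts) for col in columns}

# The overall total can be found from either the row totals or the column totals.
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Summing both the rows and the columns gives a quick consistency check, since the two sums must agree on the overall total.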
We now need the total number of girls and the total number of pupils, both of which appear in the Total column of the table above.
The total number of girls is 650, and the total number of pupils is 1 300. So, to calculate the probability that a pupil is a girl, meaning the probability of selecting a girl, we write the following:
$$P(\text{girl}) = \frac{650}{1\,300} = \frac{1}{2}.$$
Next, we find the probability that the pupil selected travels by car and is a girl. We can use the two-way table to find the number of pupils who travel by car and are girls, then divide it by the total number of pupils.
So, the number of girls who travel by car is 250, and the total number of pupils is 1 300, which means that the probability of selecting a girl who travels by car is
$$P(\text{car} \cap \text{girl}) = \frac{250}{1\,300} = \frac{5}{26}.$$
Therefore, by substituting $P(\text{car} \cap \text{girl})$ and $P(\text{girl})$ in the formula for conditional probability, we get
$$P(\text{car} \mid \text{girl}) = \frac{\frac{250}{1\,300}}{\frac{650}{1\,300}} = \frac{250}{650} = \frac{5}{13}.$$
Therefore, the probability of traveling by car given that the pupil is a girl is $\frac{5}{13}$.
Notice that, when calculating the probability of traveling by car given that the pupil is a girl, the total number of pupils, 1 300, cancels out. This means that we can simply use the table to find the probability of traveling by car, given that a pupil is a girl, by finding the number of girls who travel by car and dividing it by the total number of girls (since it is given that the pupil is a girl, we only consider the girls as the total). For the rest of the explainer, we will use this approach.
Using the table, the number of girls who travel by car is 250 and the total number of girls is 650, which means that the probability that a pupil travels by car given that they are a girl is
$$P(\text{car} \mid \text{girl}) = \frac{250}{650} = \frac{5}{13}.$$
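The agreement between the two approaches can be checked numerically. The following Python sketch uses exact fractions; the variable names are our own and are only meant as an illustration.

```python
from fractions import Fraction

# Counts read from the two-way table (variable names are illustrative).
girls_by_car = 250
total_girls = 650
total_pupils = 1300

# Approach 1: the formula P(car | girl) = P(car and girl) / P(girl).
p_car_and_girl = Fraction(girls_by_car, total_pupils)
p_girl = Fraction(total_girls, total_pupils)
via_formula = p_car_and_girl / p_girl

# Approach 2: restrict attention to the girls and divide the counts directly.
via_table = Fraction(girls_by_car, total_girls)

print(via_formula, via_table)
```

Both values reduce to the same fraction because the overall total appears in the numerator and the denominator of the formula and cancels.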
We will explore this approach further in the next example.
Example 1: Calculating a Conditional Probability from a Two-Way Frequency Table
The two-way table shows the ages and activity choices of a group of participants at a summer camp.
Age Group | Swimming | Climbing | Rappelling |
---|---|---|---|
14 and Under | 15 | 24 | 8 |
Over 14 | 18 | 32 | 24 |
A child is selected at random. Given that they chose rappelling, find the probability that the child is over 14.
Answer
To find the probability that a child is over 14 given that they chose rappelling, it is helpful to first calculate the totals of the rows and columns in the table.
Age Group | Swimming | Climbing | Rappelling | Total |
---|---|---|---|---|
14 and Under | 15 | 24 | 8 | 47 |
Over 14 | 18 | 32 | 24 | 74 |
Total | 33 | 56 | 32 | 121 |
Next, as we are calculating the probability that a child is over 14 given that they chose rappelling, we want to find the number of children over the age of 14 who chose rappelling and divide this by the total number of children who chose rappelling. Both values can be read from the Rappelling column of the table above.
So, the number of children over the age of 14 who chose rappelling is 24, and the total number of children who chose rappelling is 32. Dividing the first by the second gives the probability that a child is over 14 given that they chose rappelling:
$$P(\text{over 14} \mid \text{rappelling}) = \frac{24}{32} = \frac{3}{4}.$$
Therefore, the probability that a child selected at random is over 14 given that they chose rappelling is $\frac{3}{4}$, or 0.75.
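The same calculation can be sketched in Python. The helper `p_row_given_column` is a hypothetical function written for this illustration; it divides one cell by its column total, which is what conditioning on a column means.

```python
from fractions import Fraction

# Two-way table from the example: rows are age groups, columns are activities.
camp = {
    "14 and Under": {"Swimming": 15, "Climbing": 24, "Rappelling": 8},
    "Over 14": {"Swimming": 18, "Climbing": 32, "Rappelling": 24},
}

def p_row_given_column(table, row, column):
    """Return P(row | column): the joint count divided by the column total."""
    column_total = sum(cells[column] for cells in table.values())
    return Fraction(table[row][column], column_total)

answer = p_row_given_column(camp, "Over 14", "Rappelling")
print(answer, float(answer))
```

Note that only the Rappelling column is used; the swimming and climbing counts play no part once the condition is given.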
In the following example, we will consider how to find the probability of an event and the probability of a conditional event.
Example 2: Conditional Probability from a Two-Way Table
The table below contains data from a survey of core gamers who were asked whether their preferred gaming platform was the smartphone, the console, or the PC. The gamers are split by gender.
- Find the probability that a core gamer chosen at random prefers using a console. Give your answer to three decimal places.
- Given that a core gamer prefers to play using a console, find the probability that they are male. Give your answer to three decimal places.
Answer
Let us first work out the totals for the rows and columns of our table.
Part 1
To find the probability that a core gamer chosen at random prefers using a console, we find the number of gamers who prefer a console and divide by the total number of gamers.
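Let $C$ be the event that a gamer chosen at random prefers a console; then,
$$P(C) = \frac{\text{number of gamers who prefer a console}}{\text{total number of gamers}}.$$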
As a percentage, this is . Hence, approximately of gamers prefer to use a console.
Part 2
Given that a core gamer prefers to play using a console, we want to find the probability that they are male. Because we are now only interested in gamers who prefer a console, those who prefer to use smartphones or PCs do not figure in this calculation. So we only need to look at the "console" row in the table.
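Our conditional probability, $P(\text{male} \mid \text{console})$, the probability that a gamer chosen at random is male given that they prefer a console, is then
$$P(\text{male} \mid \text{console}) = \frac{\text{number of male gamers who prefer a console}}{\text{total number of gamers who prefer a console}}.$$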
As , we can say, given that a gamer chosen at random prefers a console, that there is approximately a chance that they are male.
In the next example, we will consider how to use a two-way table to find a conditional probability where the given condition covers more than one category.
Example 3: Calculating Conditional Probabilities Using a Two-Way Table
Two boxes contain a number of defective, partially defective (failing after a couple of hours of use), and acceptable light bulbs.
The numbers are given in the table.
Bulb Condition | Box 1 | Box 2 |
---|---|---|
Defective | 12 | 3 |
Partially Defective | 3 | 22 |
Acceptable | 25 | 40 |
A light bulb is chosen at random and put to use. If it does not fail immediately, what is the probability that it is chosen from box 2? Round your answer to three decimal places.
Answer
When working with two-way tables it is helpful to find the totals of the rows and columns first, as follows.
Bulb Condition | Box 1 | Box 2 | Total |
---|---|---|---|
Defective | 12 | 3 | 15 |
Partially Defective | 3 | 22 | 25 |
Acceptable | 25 | 40 | 65 |
Total | 40 | 65 | 105 |
We are asked to find the probability that a light bulb was chosen from box 2, given that it does not fail immediately. We need to be careful with the categories here: defective means that the bulb fails immediately, partially defective means that it fails after a couple of hours (so not immediately), and acceptable means that it does not fail. That means that a bulb does not fail immediately if it is acceptable or partially defective. So, we are trying to find the probability that a bulb is chosen from box 2 given that it is either acceptable or partially defective.
To find this probability, we first need to find how many bulbs in box 2 are either acceptable or partially defective. We can do this by using the table to find the number of bulbs that are from box 2 and acceptable and the number of bulbs that are from box 2 and partially defective, then adding these together.
Therefore, the number of bulbs that are from box 2 and are either acceptable or partially defective is $40 + 22 = 62$.
Next, we need to find the total number of bulbs that are either acceptable or partially defective. Again, we will use the table to find the total number of bulbs that are acceptable and the total number of bulbs that are partially defective, then add them together.
Therefore, the total number of bulbs that are either acceptable or partially defective is $65 + 25 = 90$.
Now, we calculate the probability that a bulb is chosen from box 2 given that it is either acceptable or partially defective by dividing the number of bulbs that are from box 2 and are either acceptable or partially defective by the total number of bulbs that are either acceptable or partially defective. This gives us
$$P(\text{box 2} \mid \text{acceptable or partially defective}) = \frac{62}{90} = \frac{31}{45} \approx 0.689,$$
rounded to three decimal places.
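When the condition covers several categories, the corresponding rows are simply added together before dividing. A minimal Python sketch of this idea follows; the names `bulbs` and `condition_rows` are assumptions made for the illustration.

```python
from fractions import Fraction

# Two-way table from the example: rows are bulb conditions, columns are boxes.
bulbs = {
    "Defective": {"Box 1": 12, "Box 2": 3},
    "Partially Defective": {"Box 1": 3, "Box 2": 22},
    "Acceptable": {"Box 1": 25, "Box 2": 40},
}

# "Does not fail immediately" covers these two rows.
condition_rows = ["Partially Defective", "Acceptable"]

# Numerator: bulbs from box 2 that satisfy the condition.
from_box_2 = sum(bulbs[row]["Box 2"] for row in condition_rows)

# Denominator: all bulbs, from either box, that satisfy the condition.
all_in_condition = sum(sum(bulbs[row].values()) for row in condition_rows)

p_box_2 = Fraction(from_box_2, all_in_condition)
print(from_box_2, all_in_condition, p_box_2, round(float(p_box_2), 3))
```

The defective row never enters the calculation, because the condition rules those bulbs out.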
In the following example, we will discuss a question where information not presented in a table is first put in a table and then a conditional probability is found.
Example 4: Two-Way Tables, Conditional Probability, and the Relationship between Categorical Variables
In a group of 96 people, 34 out of the 71 women have a smartphone, and 18 men do not have a smartphone. Determine the probability that a randomly picked smartphone owner in this group will be female.
Answer
We are given some information about 96 people that can be classified into two sets of categories: whether they are men or women and whether they own a smartphone or not. Using a two-way table, we can input the information we know and then work out any missing information.
As we are told that there are 96 people, this number goes in the overall total in the bottom right of the table. Since we know that 34 out of 71 women own a smartphone, we know that there are 71 women in total; this goes at the bottom of the column for women. We also know that 34 women are smartphone owners; this goes in the cell in the row of smartphone owners and the column of women. Finally, we know that 18 men do not have a smartphone, so this goes in the row of those who are not smartphone owners and the column of men.
Smartphone Ownership | Men | Women | Total |
---|---|---|---|
Smartphone Owners | | 34 | |
Not Smartphone Owners | 18 | | |
Total | | 71 | 96 |
We can deduce the other information by adding or subtracting different amounts to fill in blank cells. First, we can work out how many women do not own a smartphone by subtracting 34 from 71, which gives us 37.
Second, we can find the total number of those who do not own smartphones by adding the number of men, 18, and the number of women, 37, who do not own smartphones. This gives us 55.
Third, we can find the total number of men by subtracting the total number of women, 71, from the total number of people, 96, giving us 25.
Fourth, we can find the number of men who own a smartphone by subtracting the number of men who do not own a smartphone, 18, from the total number of men, 25, giving us 7.
Lastly, we can calculate the total number of smartphone owners by adding the number of men who own a smartphone, 7, and the number of women who own a smartphone, 34, which gives us 41.
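We can add the total number of smartphone owners, 41, and the total number of those who do not own a smartphone, 55, to check our answers. This gives 96, which is the overall total. The completed table is as follows.
Smartphone Ownership | Men | Women | Total |
---|---|---|---|
Smartphone Owners | 7 | 34 | 41 |
Not Smartphone Owners | 18 | 37 | 55 |
Total | 25 | 71 | 96 |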
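The chain of additions and subtractions used to fill in the table can be sketched in Python as follows. The variable names are our own; each line mirrors one of the steps described above.

```python
# Values given in the question (variable names are illustrative).
total_people = 96
total_women = 71
women_with_phone = 34
men_without_phone = 18

# Step 1: women without a smartphone.
women_without_phone = total_women - women_with_phone
# Step 2: everyone without a smartphone.
total_without_phone = men_without_phone + women_without_phone
# Step 3: total number of men.
total_men = total_people - total_women
# Step 4: men with a smartphone.
men_with_phone = total_men - men_without_phone
# Step 5: everyone with a smartphone.
total_with_phone = men_with_phone + women_with_phone

# Check: the two row totals must add up to the overall total.
assert total_with_phone + total_without_phone == total_people

print(women_without_phone, total_without_phone, total_men,
      men_with_phone, total_with_phone)
```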
Now that we have filled in all the information in the two-way table, we can calculate the probability that a randomly picked smartphone owner is a woman. In other words, we want to find the probability that a person selected is a woman given that they are a smartphone owner.
To calculate the probability that a person selected is a woman given that they are a smartphone owner, we want to divide the number of women who are smartphone owners by the total number of smartphone owners. We can find this information from the table.
So, the number of women who are smartphone owners is 34, and the total number of smartphone owners is 41. Calculating the probability that a woman is selected given that the person is a smartphone owner, we get
$$P(\text{woman} \mid \text{smartphone owner}) = \frac{34}{41}.$$
Therefore, the probability that a randomly selected smartphone owner is a woman is $\frac{34}{41}$.
In the last example, we will consider another case where the information is not presented in a table, so we must first enter it in a two-way table and work out the unknown values in order to calculate a conditional probability.
Example 5: Calculating Conditional Probabilities by Inputting into a Two-Way Table
A company manufactures a product in two different plants and supplies three customers equally, each with 80 units a month. One of the plants produces 10 units of this product per month, and the company distributes this amount among the first, second, and third customers in percentages of $20\%$, $30\%$, and $50\%$ respectively. If you select a unit at random from an outlet of the third customer, find the probability that it was produced by the other plant.
Answer
As we have two different variables in this question, the plant (with two categories) and the customer (with three categories), we can use a two-way table to present the information as follows.
Plant | First Customer | Second Customer | Third Customer | Total |
---|---|---|---|---|
10-Unit Plant | | | | |
Other Plant | | | | |
Total | | | | |
We are told that the company supplies each of the three customers with 80 units per month. This means that the total for each of the three customer columns is 80. It also means that the overall total is the sum of these, which is $80 + 80 + 80 = 240$.
Plant | First Customer | Second Customer | Third Customer | Total |
---|---|---|---|---|
10-Unit Plant | | | | |
Other Plant | | | | |
Total | 80 | 80 | 80 | 240 |
Next, we are told that one of the plants produces 10 units per month, which means that the total for its row is 10. Since we know that these 10 units are distributed to the first, second, and third customers in percentages of $20\%$, $30\%$, and $50\%$, we know that the first customer receives $20\%$ of 10, which is 2, the second customer receives $30\%$ of 10, which is 3, and the third customer receives $50\%$ of 10, which is 5. We can then input this information in the row of the 10-unit plant.
Plant | First Customer | Second Customer | Third Customer | Total |
---|---|---|---|---|
10-Unit Plant | 2 | 3 | 5 | 10 |
Other Plant | | | | |
Total | 80 | 80 | 80 | 240 |
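We can find the remaining information needed in the table by adding or subtracting rows and columns. We can find the number of units that each customer receives from the other plant by subtracting the units received from the 10-unit plant from the column total. This gives us $80 - 2 = 78$ for the first customer, $80 - 3 = 77$ for the second customer, and $80 - 5 = 75$ for the third customer. We can then input this information in the row of the other plant.
Plant | First Customer | Second Customer | Third Customer | Total |
---|---|---|---|---|
10-Unit Plant | 2 | 3 | 5 | 10 |
Other Plant | 78 | 77 | 75 | |
Total | 80 | 80 | 80 | 240 |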
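To find the total for the other plant, we can sum the amounts that the three customers receive from it, giving us $78 + 77 + 75 = 230$. We can check that this is correct by subtracting the total of the 10-unit plant, 10, from the overall total, 240, which also gives 230.
Plant | First Customer | Second Customer | Third Customer | Total |
---|---|---|---|---|
10-Unit Plant | 2 | 3 | 5 | 10 |
Other Plant | 78 | 77 | 75 | 230 |
Total | 80 | 80 | 80 | 240 |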
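Now that we have found all of the information in the two-way table, we can calculate the probability that, if a unit is selected from an outlet of the third customer, it was produced by the other plant, or, in other words, the probability of selecting a unit produced by the other plant given that it was supplied to the third customer. To calculate this, we need to find the number of units produced by the other plant for the third customer and divide it by the total number of units supplied to the third customer. Both values can be read from the Third Customer column of the table.
So, the number of units per month produced by the other plant for the third customer is 75, and the total number of units produced for the third customer is 80. Therefore, the probability of selecting a unit produced by the other plant given that it was supplied to the third customer is
$$P(\text{other plant} \mid \text{third customer}) = \frac{75}{80} = \frac{15}{16} = 0.9375.$$
Therefore, the probability that a unit selected from an outlet of the third customer was produced by the other plant is 0.9375.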
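The whole construction can be sketched in Python. The labels `"first"`, `"second"`, and `"third"` for the customers and the names `ten_unit_plant` and `other_plant` are hypothetical, chosen only for this illustration; the numbers are those given in the question.

```python
from fractions import Fraction

# Data from the question; customer labels and names are illustrative assumptions.
customers = ["first", "second", "third"]
units_per_customer = 80
ten_unit_plant_output = 10
shares_percent = {"first": 20, "second": 30, "third": 50}

# Row for the plant that produces 10 units: apply each percentage to 10.
# Integer division is exact here because each share of 10 is a whole number.
ten_unit_plant = {
    c: ten_unit_plant_output * shares_percent[c] // 100 for c in customers
}

# Row for the other plant: whatever is left of each customer's 80 units.
other_plant = {c: units_per_customer - ten_unit_plant[c] for c in customers}

# Conditional probability: units from the other plant among the third customer's units.
p_other_given_third = Fraction(other_plant["third"], units_per_customer)

print(ten_unit_plant)
print(other_plant)
print(p_other_given_third, float(p_other_given_third))
```

Only the third customer's column is needed for the final division, but building both rows makes it easy to check the row total of 230 against the overall total.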
In this explainer, we have learned how to use two-way tables to find conditional probabilities. Let's recap the key points.
Key Points
- The two in two-way table refers to the fact that there are two variables under consideration.
- In a two-way table, we organize the counts, or frequencies, for the categories of two categorical variables.
- Values (or categories) of the row variable label the rows running across the table, and values (or categories) of the column variable label the columns running down the table.
- We use two-way tables to examine the relationship between two categorical variables. In particular, we can look at conditional probabilities.
- Conditional probabilities can be read directly from two-way tables.
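- The probability of event $A$ given that event $B$ has occurred, $P(A \mid B)$, is a fraction where the denominator is the total for event $B$ and the numerator is the number of occurrences of $A$ and $B$ together: $P(A \mid B) = \frac{\text{number of occurrences of } A \text{ and } B}{\text{total for } B}$.
- We can also use the conditional probability formula, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, where $P(A \cap B)$ is the probability of both $A$ and $B$ occurring at the same time.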