### Video Transcript

When Trends Collide: The Uncertainty Principle of Revision Classes.

Some years ago, when I was teaching, we ran a series of revision classes for students preparing for their mathematics exams. It seemed obvious that this would be useful, as the more you practise, the better you get. Then we decided to do a little bit of analysis and discovered something that seemed really strange. At first, we assumed we’d calculated something incorrectly, but can you work out what was actually going on? There were 150 students taking the exam that year, and the revision classes were optional. So only those who felt they needed to attend did so. 75 students attended the classes. And of those, 60 passed the exam. That’s a success rate of 80%. 75 students didn’t attend the classes. And of those, 56 passed the exam. That’s a success rate of 74.7%.

We teachers all patted ourselves on the back and said what a great job we’d done, and we showed the figures to the next year’s students. We even added a bit of flourish to our explanation. If you think about it, the students who attended the revision classes were generally the ones who were less confident about their maths. You’d expect more of them to fail the exam, but in fact more of them passed. So the revision classes were obviously making a big difference. All students should attend the revision classes. We produced a nice comparative bar chart to illustrate this fact. And we even resisted the temptation to distort the scale to exaggerate the effect of the sessions.

Now it just so happened that 75 of the students identified themselves as girls and 75 identified themselves as boys. And 50 girls but only 25 boys attended the sessions. We knew that more girls passed their exam than boys. So we decided to break the numbers down by boys and girls, hoping to gain more insight into the difference the revision sessions were making. For the girls, 45 out of the 50 who attended the revision sessions passed their exam. That’s a 90-percent pass rate. But then we discovered that 23 of the 25 who didn’t attend passed their exam. And that’s a 92-percent pass rate. So a higher percentage of the girls who didn’t attend the revision sessions passed than those who did attend.

This was uncomfortable, but maybe it was due to the fact that the 25 girls who didn’t bother to attend the sessions had more self-awareness about their abilities and didn’t need to attend. Maybe the positive effect of the revision sessions was even more applicable to boys, which could be a powerful message, since only half as many boys as girls had attended the sessions. And a larger number had more to gain. So we looked at the numbers for boys. 15 out of the 25 boys who attended the revision sessions passed their exam, which is a 60-percent pass rate. But 33 out of the 50 boys who didn’t attend the revision sessions passed their exam, which is a 66-percent pass rate. This seems like a paradox. Boys who attended the revision sessions had a lower pass rate than those who didn’t, and girls who attended the revision sessions had a lower pass rate than those who didn’t. But overall, the revision class had a higher pass rate than the nonrevisers.
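The arithmetic behind this apparent paradox can be checked directly. What follows is a minimal Python sketch that uses only the counts quoted so far; the names `counts`, `pass_rate` and `combined` are illustrative choices and not part of the original analysis.

```python
# (passed, total) for each subgroup, as quoted in the transcript.
counts = {
    ("girls", "attended"): (45, 50),
    ("girls", "did not attend"): (23, 25),
    ("boys", "attended"): (15, 25),
    ("boys", "did not attend"): (33, 50),
}


def pass_rate(passed, total):
    """Return the pass rate as a percentage."""
    return 100 * passed / total


def combined(attendance):
    """Pool boys and girls for one attendance category."""
    passed = sum(p for (_, a), (p, _t) in counts.items() if a == attendance)
    total = sum(t for (_, a), (_p, t) in counts.items() if a == attendance)
    return passed, total


# Pass rate inside each gender, split by attendance.
for (gender, attendance), (passed, total) in counts.items():
    print(f"{gender:5} {attendance:14}: {pass_rate(passed, total):.1f}%")

# Pass rate for each attendance category with genders pooled.
for attendance in ("attended", "did not attend"):
    passed, total = combined(attendance)
    print(f"all   {attendance:14}: {pass_rate(passed, total):.1f}%")
```

Within each gender the attenders have the lower rate, yet once the genders are pooled the attenders come out ahead, which matches the figures quoted above.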

We checked the numbers, and they were right! Superficially, we had a situation a bit like Werner Heisenberg’s uncertainty principle from quantum mechanics, but applied to revision classes. If we don’t measure a student’s gender, then we know that the revision class will improve their chances of passing a maths exam. But as soon as we know an individual’s gender, then their chances of passing the exam decrease. If you want to get the maximum value from the revision class, then don’t let us know whether you’re a girl or a boy. How could this be? It’s clearly nonsense. It turns out that this statistically strange situation is relatively uncommon, but it does happen from time to time. And it’s known as Simpson’s paradox. Some people call it Simpson’s reversal, the Yule-Simpson effect, the amalgamation paradox, or the reversal paradox.

It’s when a consistent trend appears in some data when you look at individual groups separately, but a reversal of this trend appears when you combine all the data together. In fact, in situations like ours, when we have three variables each with two possible outcomes (boy, girl; pass, fail; attend revision, don’t attend revision), it turns out that if we just chose numbers at random, then we’d get Simpson’s paradox about a sixtieth of the time. So what’s going on? Well, in our case, we were concentrating very much on whether we thought the revision classes were improving outcomes for students. But if we just look at the data, there’s a huge difference in the pass rates for boys and girls, which overwhelms any effect of the revision classes.
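The "about a sixtieth" figure can be explored with a small simulation. The transcript does not say what choosing numbers "at random" means, so the sketch below makes an assumption: the eight cell proportions of the table are drawn uniformly over all possible splits. The function names `is_simpson` and `estimate` are hypothetical.

```python
import random


def is_simpson(cells):
    """cells[g][t] = (fail_weight, pass_weight) for subgroup g, treatment t."""
    def rate(fail, passed):
        return passed / (fail + passed)

    # Pass rate for each treatment within each subgroup.
    within = [[rate(*cells[g][t]) for t in (0, 1)] for g in (0, 1)]
    # Pass rate for each treatment after pooling the two subgroups.
    pooled = [
        rate(cells[0][t][0] + cells[1][t][0], cells[0][t][1] + cells[1][t][1])
        for t in (0, 1)
    ]
    up_within = all(within[g][1] > within[g][0] for g in (0, 1))
    down_within = all(within[g][1] < within[g][0] for g in (0, 1))
    # A reversal: both subgroups agree, and the pooled data disagrees.
    return (up_within and pooled[1] < pooled[0]) or (
        down_within and pooled[1] > pooled[0]
    )


def estimate(trials=100_000, seed=0):
    """Monte Carlo estimate of how often a random table shows a reversal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # ASSUMPTION: eight independent exponential weights, whose relative
        # sizes are uniform over all possible cell proportions.
        cells = [
            [(rng.expovariate(1.0), rng.expovariate(1.0)) for _t in (0, 1)]
            for _g in (0, 1)
        ]
        hits += is_simpson(cells)
    return hits / trials


print(f"Estimated share of tables showing a reversal: {estimate():.4f}")
```

If the uniform assumption matches what the speaker had in mind, the estimate should land in the same region as the figure quoted above. A different notion of "random" would give a different answer.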

Overall, 116 out of 150 students passed the exam, which is a pass rate of 77.3%. There are equal numbers of boys and girls, but only 64% of the boys passed the exam while 90.7% of the girls passed. If we split up the students into two equal-sized groups at random, it seems pretty clear that the group with more girls in it is probably going to have a higher pass rate, because many more girls pass their exams than boys. Here are the same numbers labelled group A and group B rather than did not attend revision and attended revision. And let’s say that the students were randomly allocated to the two groups and forget for the moment that there were any revision classes.

Thinking that the group allocations were made at random, we’re not distracted by preconceptions about revision affecting outcomes. It might at first glance seem a bit strange that girls and boys separately in group A had higher pass rates than the girls and boys separately in group B. But overall, group B had a higher pass rate. However, we can quite easily say that it’s because group B has more girls in it, and the girls did a lot better than boys in the exams overall. If we now introduce the idea that students were not in group A or B randomly, but because they had different revision regimes, then we’d already be primed to spot the huge difference in pass rates between boys and girls and that the total pass rate for each group is influenced by how many boys or girls are in each.
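The point about group composition can be made concrete: each group's overall pass rate is an average of the girls' and boys' rates, weighted by how many of each are in the group. The sketch below uses the counts and rates quoted earlier, with group A as the non-attenders and group B as the attenders; the names `groups` and `overall_rate` are illustrative.

```python
# (subgroup size, subgroup pass rate) as quoted in the transcript.
# Group A = did not attend revision, group B = attended.
groups = {
    "A": {"girls": (25, 0.92), "boys": (50, 0.66)},
    "B": {"girls": (50, 0.90), "boys": (25, 0.60)},
}


def overall_rate(group):
    """Average of subgroup pass rates, weighted by subgroup size."""
    total = sum(size for size, _ in groups[group].values())
    return sum(size / total * rate for size, rate in groups[group].values())


for name, members in groups.items():
    total = sum(size for size, _ in members.values())
    girls_share = members["girls"][0] / total
    print(f"Group {name}: {girls_share:.0%} girls, overall {overall_rate(name):.1%}")
```

Group B is two-thirds girls and group A is two-thirds boys, so group B's average is pulled towards the girls' much higher rate even though both of its subgroup rates are slightly lower.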

This would prompt us to consider boys and girls separately, and we’d be focusing on the observation that attending the revision sessions seems to be associated with lower pass rates for boys and for girls. And we might then start looking at whether these differences are statistically significant given our relatively small sample size. This seems a more sensible way of approaching the analysis. Here are two groups. Can we find any evidence that one group performs differently to the other given they’re a mix of boys and girls? At this point, I’d like to confess that I made up these numbers, and this is supposed to be an illustrative example which in no way represents any actual revision sessions I’ve ever been involved with.
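One way to follow up on the significance question is sketched below. The transcript does not say which test the speaker had in mind; Fisher's exact test is a common choice for small two-by-two tables and is used here purely as an assumption. The function name `fisher_exact_two_sided` is hypothetical, and the implementation relies only on the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Rows: attended, did not attend. Columns: passed, failed.
p_girls = fisher_exact_two_sided(45, 5, 23, 2)
p_boys = fisher_exact_two_sided(15, 10, 33, 17)
print(f"girls: p = {p_girls:.3f}")
print(f"boys:  p = {p_boys:.3f}")
```

On these counts both p-values come out well above the conventional 0.05 threshold, so neither within-gender difference would normally be called statistically significant at this sample size.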

Well, I didn’t make up the numbers. They’re all real numbers, just not from revision classes. And I carefully chose them to make a point about causal analysis. Whoever you are, revision will definitely increase your chances of passing a maths exam. Another example of this Simpson’s paradox effect can be seen if we plot a scatter graph of some bivariate continuous data. Here we can see a scatter graph of people’s age versus their score on a special aptitude test. Now if we draw a line of best fit, it looks like there is some positive correlation. Although there’s a lot of individual variation, generally older people get higher scores on this test. But if we now look at some subgroups, we see different patterns emerge.

These people said their favourite colour was green. And their data shows a negative correlation. Older people tend to get lower scores on the test. These people said their favourite colour was yellow. And again, their data shows negative correlation. Older people tend to get lower scores on the test. And these people said their favourite colour was fuchsia. And again, their data shows negative correlation. So overall, the data shows positive correlation. But when we know people’s favourite colour, we see negative correlation. The overall trend is different to the trend that each subgroup shows.
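The same reversal can be reproduced with a few lines of Python. The data behind the scatter graph is not given in the transcript, so the numbers below are invented purely for illustration: three colour groups, each with scores that fall as age rises, but with older groups starting from higher scores. The helper `pearson` and the `starts` table are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# HYPOTHETICAL data: (starting age, starting score) per favourite colour.
starts = {"green": (20, 40), "yellow": (35, 60), "fuchsia": (50, 80)}

data = {}
for colour, (age0, score0) in starts.items():
    ages = [age0 + i for i in range(10)]
    # Scores fall by one point per year, with a small alternating wobble.
    scores = [score0 - i + (1 if i % 2 else -1) for i in range(10)]
    data[colour] = (ages, scores)

# Correlation inside each colour group.
within = {colour: pearson(*pair) for colour, pair in data.items()}

# Correlation once all three groups are pooled.
all_ages = [a for ages, _ in data.values() for a in ages]
all_scores = [s for _, scores in data.values() for s in scores]
overall = pearson(all_ages, all_scores)

for colour, r in within.items():
    print(f"{colour:8}: r = {r:+.2f}")
print(f"overall : r = {overall:+.2f}")
```

Each colour group has a negative coefficient while the pooled coefficient is positive, because the differences between the groups dominate the trend inside any one of them.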

So, what have we learned in this video? Well basically we need to be very careful when trying to attribute a causal link between variables. If one thing looks like it causes another, then you need to be very careful about what you claim. Maybe it’s a coincidence. Maybe we’ve just got an idea in our head and we’re interpreting the data as confirming our idea. Maybe there’s another variable lurking in the background that’s causing the effect. And then again, maybe someone just made up the data. But whichever it is, good luck with your maths revision.