Lesson Video: Correlation
Mathematics • 8th Grade

In this video, we will learn how to deal with linear correlation and distinguish between different types of correlation.

11:32

Video Transcript

In this video, we’ll learn how to deal with linear correlation and distinguish between different types of correlation. Let’s think about what happens when we plot a scatter diagram. A scatter diagram can be used to represent bivariate data, where one set of data is paired with another set of data. For instance, we might plot the daily precipitation in New York City against fried-chicken sales in pounds. Looking at the scatter diagram, there appears to be a pattern, or trend: as the daily precipitation increases, so do the fried-chicken sales. We might then say that these two data sets have a correlation, meaning there appears to be some sort of relationship between them.

It’s worth noting though that whilst we might appear to find correlation, that doesn’t necessarily mean that causation exists. In other words, we cannot necessarily assume that daily precipitation actually causes fried-chicken sales to increase.

Now, with that in mind, let’s fully define the word correlation. We say that two data sets are correlated when there appears to be a relationship between them. We can use a scatter diagram to identify whether this correlation exists. Now, more specifically, if we plot these points on a scatter diagram and they mainly appear to lie along a straight line, then they’re said to be linearly correlated. Similarly, if they follow some nonlinear trend, such as a curve or a logarithmic trend, then they’re said to be nonlinearly correlated. And, of course, if no such trend exists, there’s said to be no correlation.

Consider the linear correlation we discussed. A scatter diagram showing two variables that are linearly correlated might look a little bit like this. Similarly, it could look a little something like this. The data points in either case appear to lie approximately along a straight line. If the variables are nonlinearly correlated instead, the points might look a little something like this. In this case, the line of best fit is a curve. Finally, if there is no correlation, our scatter diagram might look a little something like this. In each of these cases, we’ve considered whether we can actually draw a line of best fit through our points. The shape of the line of best fit then tells us about the type of correlation, if it exists.

So, with this in mind, let’s look at how to compare a line of best fit with data on a scatter diagram. And this will help us determine whether the data is linearly correlated.

Can we use the line of best fit to describe the trend in the data? Why?

And then we have a scatter diagram with a line of best fit drawn. Let’s imagine this supposed line of best fit wasn’t drawn on the diagram. How would we construct our own line of best fit? How would we find a line that more accurately describes the trend in the data given by the blue points? Well, it might look a little something like this. Yes, as the values of 𝑥 increase, the values of 𝑦 also increase. But we can see that this is not necessarily in a straight line. This means 𝑥 and 𝑦 do appear to be correlated. But we would say they are nonlinearly correlated. The line of best fit is not a straight line.

And so this would not be a sensible line of best fit to describe the trend in the data. We certainly wouldn’t want to use this line of best fit to make predictions or estimates based on the data we’re given, because this data is not linearly correlated. It doesn’t approximately follow a straight line.
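One rough way to see why a straight line does not suit curved data is to look at the gaps between the points and the line. The Python sketch below is only an illustration: the points are made up to follow 𝑦 = 𝑥² (they are not the data from the diagram), and fit_line is a hypothetical helper that finds the least-squares straight line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up curved data (y = x squared), not taken from the video.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)

# Gap between each point and the straight line (positive = point above the line).
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(residuals)
```

For this data, the points sit above the line at both ends and below it in the middle. A systematic bend like that, rather than a random scatter around the line, suggests the trend is a curve.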

Now, whilst this wouldn’t be a sensible line of best fit to describe the trend in the data, we did say that both the line of best fit and the apparent trend in the data show that as the values of 𝑥 increase, the values of 𝑦 also appear to increase. And there are some phrases we can use to describe this. We say that two data sets are positively correlated, or directly correlated, if one data set increases as the other increases. In the case of positive linear correlation, the data points might look a little something like this. If data sets are negatively correlated, or inversely correlated, then as one set increases, the other will decrease, and vice versa. In the case of two data sets that have negative linear correlation, the points appear to follow a line which slopes downwards, as we see.
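As a small sketch of this idea, the sign of the slope of a straight line of best fit can be used to label the direction of a linear correlation. The helper names (best_fit_slope, direction) and the data values below are assumptions made for illustration, and the approach only makes sense when the points roughly follow a straight line.

```python
def best_fit_slope(xs, ys):
    """Slope of the least-squares straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def direction(xs, ys):
    """Label the direction of the line of best fit."""
    slope = best_fit_slope(xs, ys)
    if slope > 0:
        return "positive"   # y tends to increase as x increases
    if slope < 0:
        return "negative"   # y tends to decrease as x increases
    return "flat"           # the line neither rises nor falls

# Made-up data sets for illustration.
xs = [1, 2, 3, 4, 5]
ys_up = [2.1, 3.9, 6.2, 7.8, 10.1]
ys_down = [9.8, 8.1, 6.0, 4.2, 1.9]

print(direction(xs, ys_up))
print(direction(xs, ys_down))
```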

So, with this in mind, let’s determine whether data is positively or negatively correlated or not at all correlated using a line of best fit.

What type of correlation exists between the two variables in the scatter plot shown?

When we think about correlation, we think about linear correlation, in other words, points that approximately follow a straight line. We also think about nonlinear correlation, where the points might follow a different type of trend, for example, a curve. And if variables are linearly correlated, we say that they can be positively linearly correlated or negatively linearly correlated, depending on the direction of the line of best fit. So, let’s consider the graph we’ve been given here and see if we can draw a line of best fit.

The line of best fit, of course, does not need to go through the origin, the point zero, zero, although here it does appear that it might. And that line of best fit should roughly follow the trend of our points. We might now notice that our line of best fit slopes upwards. In other words, it has a positive slope. So this tells us that as the values of 𝑥 increase, so do the values of 𝑦. In this case then, the variables 𝑥 and 𝑦 are positively correlated. Specifically, since these points also approximately follow a straight line, we can say that the correlation is linear. And so we fully answered the question. The type of correlation that exists is positive linear correlation.

Now, in this example, we were given a scatter diagram of a data set. This might not always be the case. We might instead be given a description of the type of variables. As we’ll now see, we’ll then need to use our understanding of how variables relate to one another as a way of determining whether they are positively or negatively correlated or not correlated at all.

Suppose variable 𝑥 is the number of hours you work and variable 𝑦 is your salary. You suspect that the more hours you work, the higher your salary is. Does this follow a positive correlation, a negative correlation, or no correlation?

We’re told that variable 𝑥 is the number of hours worked, whilst variable 𝑦 is the salary. And we’re looking to find a relationship, if it exists, between these two variables. Now, in fact, the suspicion is that the more hours you work, the higher your salary is. So, let’s attempt to plot this on a scatter graph. Variable 𝑥 is the number of hours worked, whilst 𝑦 is the salary, so we can label the axes as shown. Let’s make up some starting figures. Let’s imagine that if you work 15 hours a week, you earn an annual salary of 20,000 pounds. You might then assume that if you work 30 hours a week, you earn an annual salary of 40,000 pounds. Assuming that the more hours you work, the higher your salary is, we could add extra points on our scatter graph as shown.

We notice that the points plotted approximately follow a straight line and that this straight line has a positive slope. It slopes upward. Since this line slopes upwards, we can say that the two variables 𝑥 and 𝑦 must have positive correlation. Now, we also assumed that this was positive linear correlation, but that might not be the case. We only know that the higher the number of hours, the higher the salary, which means that this is an example of positive correlation.
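To illustrate that positive correlation does not have to be linear, here is a minimal Python sketch. It keeps the two made-up points from the example (15 hours for 20,000 pounds and 30 hours for 40,000 pounds) and adds further invented points that rise along a curve. The function always_increase_together is a hypothetical helper that checks whether a larger 𝑥-value always comes with a larger 𝑦-value.

```python
from itertools import combinations

def always_increase_together(xs, ys):
    """True if, for every pair of points, the one with the larger x
    also has the larger y. Pairs with equal x or equal y count as a failure."""
    points = list(zip(xs, ys))
    return all((x2 - x1) * (y2 - y1) > 0
               for (x1, y1), (x2, y2) in combinations(points, 2))

# Invented figures: the salary rises faster and faster, so the points
# do not lie on a straight line.
hours = [15, 20, 25, 30, 35]
salary = [20000, 24000, 30000, 40000, 55000]

print(always_increase_together(hours, salary))
```

Every extra hour goes with a higher salary here, so the variables are positively correlated even though the trend is not a straight line.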

Now, in this example, we modeled our data points as lying very close to some straight line. How close the data points lie to a line of best fit describes the strength of the correlation. For instance, suppose we’re interested in positive linear correlation. If all the points lie very close to the line of best fit, as in this example, we can say that’s an example of strong correlation. If, however, the points are quite far away from the line of best fit, as in this example, then we say that there is weak correlation. Of course, eventually this weak correlation turns into no correlation as the points get further and further away from the line. With this in mind, let’s determine the strength of correlation in our next example.

State which of the scatter diagrams shows bivariate data with a stronger correlation.

And then there are two diagrams to choose from. Remember, when we think about the strength of a correlation, we’re determining how close the points are to a line of best fit. The closer the points are, the stronger the correlation. So, it makes sense to begin by drawing the line of best fit on both of our diagrams. The line of best fit on diagram one might look a little something like this. The points approximately follow a straight line, so there is linear correlation here. Specifically, as the values of 𝑥 increase, so do the values of 𝑦. So, we can say that 𝑥 and 𝑦 are positively linearly correlated.

In diagram two, our line of best fit looks quite similar. But we notice that all of the points are a little bit further away from the line itself. This means in diagram two, the correlation is less strong. We might say it’s weak. And so the answer is diagram one. The scatter diagram one shows bivariate data with a stronger correlation.
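The comparison we just made by eye can be sketched in Python. The data below is invented to stand in for the two diagrams (the actual points are not given in the transcript), and mean_distance_from_line is a hypothetical helper that averages the vertical gaps between the points and their least-squares line. This is just one rough measure, and it assumes both data sets are plotted on the same axes and scale.

```python
def mean_distance_from_line(xs, ys):
    """Average vertical distance between the points and their
    least-squares straight line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / n

# Invented stand-ins for the two diagrams.
xs = [1, 2, 3, 4, 5, 6]
diagram_one = [2.1, 3.9, 6.1, 8.0, 9.9, 12.1]   # points hug the line
diagram_two = [3.5, 2.5, 7.5, 6.0, 11.5, 10.5]  # points are more scattered

d1 = mean_distance_from_line(xs, diagram_one)
d2 = mean_distance_from_line(xs, diagram_two)
print("stronger correlation:", "diagram one" if d1 < d2 else "diagram two")
```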

We’ve now looked at how two different variables can be related and what it means for them to have a linear or nonlinear relationship. We’ve considered how to describe the relationship between variables in terms of positive, negative, or no correlation. And we’ve looked at how strongly correlated variables are based on how close they are to a line of best fit. With all this in mind, let’s recap the key points from this lesson.

In this video, we learned that if two variables follow a trend of some description, they’re said to be correlated. If we plot these points on a scatter diagram and they appear to follow approximately a straight line, then linear correlation exists. If they follow some other trend, such as a curve, the correlation is nonlinear. Then, if the line of best fit constructed appears to slope upwards, in other words, its slope is positive, then the variables have positive correlation. And if that line of best fit slopes downwards, in other words, it has a negative slope, then those variables are said to be negatively correlated. Now, if no trend exists at all, in other words, if a line of best fit cannot be constructed, then we say that there is no correlation. Finally, we saw that we can determine the strength of the correlation by considering how close all of the points lie to the line of best fit.
