In this explainer, we will learn how to recognize linear correlation and distinguish between different types of correlation.

You may recall working with two paired data sets when plotting scatter diagrams.
Such data is known as **bivariate data**: each value in one data set is paired
with a value in the other, and the pairs can be plotted on a scatter diagram,
with the first data set on the $x$-axis and the second on the $y$-axis.

### Definition: Bivariate Data

Bivariate data consists of two data sets in which each value in one set is paired with a value in the other.

For example, we might measure people’s heights and head circumferences and plot these on a scatter diagram as seen below.

Looking at the scatter diagram, there is a pattern or trend.
In the case of heights and head circumferences, as height increases,
head circumference appears to increase (although not always). In this case,
we say that the two data sets have a **correlation**, meaning there is
a relationship between them.

### Definition: Correlation

Two data sets (bivariate data) have a correlation when they have a relationship with each other or follow a trend.

Depending on the relationship between the data sets, there may be different ways to describe how they are related.

Firstly, if two data sets follow a straight line when plotted on a scatter
diagram, then they have a **linear correlation**. If they follow a different
pattern that is not a straight line, then they have a **nonlinear correlation**.
However, if the data points do not seem to follow a trend at all, then they have
**no correlation**.

### Definition: Linear and Nonlinear Correlation

- Two data sets are linearly correlated if they follow a straight line.
- Two data sets are nonlinearly correlated if they follow a nonlinear trend such as an exponential or a logarithmic trend.
- Two data sets are not correlated if they do not appear to follow a trend.

The different types of linear and nonlinear correlation can more easily
be seen by considering the scatter diagrams and trying to
draw a **line of best fit**.

By comparing the line of best fit to the data points on the first scatter diagram, we see that the data points seem to follow a linear trend and so have a linear correlation.

By comparing the line of best fit to the data points on the second scatter diagram, we see clearly that they do not follow a linear trend but do appear to follow an exponential trend and so have a nonlinear correlation.

For the third scatter diagram, there is no clear trend (linear or nonlinear), which can be seen by trying to draw a line of best fit. Since the points are scattered and do not follow the line of best fit or any other trend, they have no correlation.
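A line of best fit is usually drawn by eye, but it can also be computed. The following is a minimal sketch that assumes the least-squares method, which is one common way to define the line and is not specified in this explainer. The function name `best_fit_line` and the height and head circumference values are hypothetical, chosen only for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of the slope).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of the x-values (denominator of the slope).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical bivariate data: heights and head circumferences in centimetres.
heights_cm = [150, 160, 170, 180, 190]
head_circumferences_cm = [53, 55, 56, 58, 60]

slope, intercept = best_fit_line(heights_cm, head_circumferences_cm)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

Plotting the computed line over the scatter diagram gives the same visual comparison described above: the closer the points sit to the line, the more reasonable it is to describe the trend as linear.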

We will use the approach of comparing a line of best fit with data on a scatter diagram as a way of determining whether data is linearly correlated in the following example.

### Example 1: Determining Whether Data Shows a Linear Trend

Can we use the line of best fit to describe the trend in the data? Why?

### Answer

As can be seen from the scatter diagram, the data does not follow a linear trend, as the points do not follow the line of best fit. Since the data points do appear to follow a trend (in this case, an exponential one), they have a nonlinear correlation.

Secondly, how one data set changes in relation to the other determines
whether it has a **positive** or **negative correlation**.
If one data set increases as the other increases, then the data is said to
have **positive** or **direct correlation**. However, if one data set
increases as the other decreases, then the data is said to have
**negative** or **inverse correlation**.

### Definition: Positive and Negative Correlation

- Two data sets are positively, or directly, correlated if one data set increases as the other increases.
- Two data sets are negatively, or inversely, correlated if one data set increases as the other decreases.

The difference between positive and negative correlation is easy to see from a line of best fit: for a positive correlation, the slope of the line of best fit is positive, and for a negative correlation, the slope is negative.
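The link between the sign of the slope and the direction of correlation can be sketched in code. This example assumes a least-squares line of best fit. The function names `slope_of_best_fit` and `correlation_direction` and all data values are hypothetical.

```python
def slope_of_best_fit(xs, ys):
    """Return the slope of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def correlation_direction(xs, ys):
    """Classify the direction of a linear trend from the sign of the slope."""
    slope = slope_of_best_fit(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no linear trend"


# Hypothetical data sets sharing the same x-values.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to increase as x increases
falling = [9, 7, 6, 4, 2]   # tends to decrease as x increases

print(correlation_direction(xs, rising))
print(correlation_direction(xs, falling))
```

Note that the sign of the slope describes only the direction of the trend. It says nothing about how closely the points follow the line, and real data rarely gives a slope of exactly zero, so a nearly flat line through scattered points is still best judged on the scatter diagram.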

We will consider how to determine if data is positively or negatively correlated (or not at all correlated) using a line of best fit on a scatter diagram in the following example.

### Example 2: Determining the Type of Correlation of Data from a Scatterplot Diagram

What type of correlation exists between the two variables in the scatterplot shown?

### Answer

In order to determine the type of correlation, we need to see how one data set changes in relation to another. To help us do this, we can draw a line of best fit on the graph.

As the values of the $x$-coordinates increase, so do the values of the $y$-coordinates, meaning there is a positive correlation. We can see this from the line of best fit, as it has a positive slope (as the line goes from left to right, it goes up).

Sometimes, we are not given the scatter diagram of a data set but a description of the types of variables. As seen in the next example, we need to use our understanding of how variables relate to one another as a way of determining whether they are positively or negatively correlated (or not at all correlated).

### Example 3: Determine the Type of Correlation of Two Variables from a Description

Suppose variable $x$ is the number of hours you work, and variable $y$ is your salary. You suspect that the more hours you work, the higher your salary is. Does this follow a positive correlation, a negative correlation, or no correlation?

### Answer

Here, it is important to consider how variables change. In this case, as the number of working hours increases, the salary increases. This means that they have a positive correlation. It might be helpful to sketch a graph to show what this would look like.

Since the line is going up as it goes from left to right, it has a positive slope and, therefore, shows a positive correlation.

Thirdly, we can determine how strongly linearly correlated two data sets
are depending on how closely they follow a straight line or a line of best fit.
If all the data points lie close to a line of best fit,
then they have a **strong correlation**. If many of the data
points are farther away from the line of best fit, then they have a
**weak correlation**. If they do not follow a line of best fit at all,
then they have **no correlation**.
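One way to put a number on "how closely the points follow the line" is to average the vertical distances between the points and the line of best fit. The sketch below does this, assuming a least-squares line. This measure, the function name `mean_distance_from_line`, and the data values are illustrative assumptions, not a method defined in this explainer.

```python
def mean_distance_from_line(xs, ys):
    """Average vertical distance between the points and their least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distance from each point to the line at the same x-value.
    distances = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(distances) / n


# Hypothetical data: both sets rise at a similar rate and use a similar scale.
xs = [1, 2, 3, 4, 5, 6]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # points hug the line
loose_ys = [4.0, 1.5, 8.0, 5.0, 12.5, 9.0]    # points are spread out

tight = mean_distance_from_line(xs, tight_ys)
loose = mean_distance_from_line(xs, loose_ys)
print(f"tight: {tight:.2f}, loose: {loose:.2f}")
```

A smaller average distance corresponds to a stronger linear correlation. The comparison is only meaningful when the two data sets are measured on similar scales, since the distances are in the units of the second variable.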

The strength of correlation can be seen more clearly when we compare the data points on a scatter diagram to a line of best fit. This can be seen in the scatter diagrams below.

We can use our scatter diagrams and the line of best fit to help determine the strength of correlation, as seen in the following example.

### Example 4: Determining the Strength of Correlation of Data from the Relationship with the Line of Best Fit

State which of the scatter diagrams shows bivariate data with a stronger correlation.

### Answer

In order to help determine which diagram shows data with a stronger correlation, we can draw a line of best fit.

In diagram 1, the data points are all close to the line of best fit, indicating that there is a strong correlation between the data sets. In diagram 2, some of the points are close to the line of best fit, but other points are farther away, indicating there is a weak correlation. Therefore, diagram 1 shows a stronger correlation.

In the next example, we use a description of how the data would look on a scatter diagram to determine the strength of correlation.

### Example 5: Determining the Strength of Correlation of Data from the Relationship with the Line of Best Fit

If the data points all line up on the line of best fit, what does this say about the data?

### Answer

The closer the points are to the line of best fit, the stronger the correlation between the variables. Therefore, if the data points all lie exactly on the line of best fit, the data has the strongest possible linear correlation, often called a perfect correlation.
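The limiting case in this example can be checked numerically. The sketch below assumes a least-squares line of best fit and tests whether every point lies on it, allowing a small tolerance for floating-point rounding. The function name `all_points_on_line`, the tolerance value, and the data are hypothetical.

```python
def all_points_on_line(xs, ys, tolerance=1e-9):
    """Return True if every point lies on the least-squares line (within tolerance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Every vertical distance to the line must be (almost) zero.
    return all(abs(y - (slope * x + intercept)) <= tolerance
               for x, y in zip(xs, ys))


# Hypothetical data: the first set lies exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
exact_ys = [1, 3, 5, 7, 9]
nearly_ys = [1, 3, 5, 7, 10]   # the last point is moved off the line

print(all_points_on_line(xs, exact_ys))
print(all_points_on_line(xs, nearly_ys))
```

Moving a single point off the line is enough for the check to fail, which reflects how rarely measured data lines up exactly.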

In this explainer, we have discussed different types of correlation, including linear and nonlinear correlation as well as positive and negative correlation, and the strength of correlation. We have used lines of best fit to help us determine the type of correlation as well as the strength of linear correlation between bivariate data.

### Key Points

- Correlation describes how two variables are related to one another.
- Linear correlation means that a set of bivariate data follows a linear trend (line of best fit).
- Positive correlation means that as one variable increases, so does the other variable.
- Negative correlation means that as one variable increases, the other decreases.
- Strong correlation means that the data points closely follow a line of best fit.
- Weak correlation means that the data points are more distant from the line of best fit.
- No correlation means that the data points do not follow a line of best fit or any other trend.