A data set can be summarized by the following: 𝑛 is equal to eight, the sum of the 𝑥-values in the data set equals 78, the sum of the 𝑦-values equals negative 73, the sum of the product of the 𝑥- and 𝑦-values equals negative 752, the sum of the 𝑥-values squared equals 792, and the sum of the 𝑦-values squared equals 735. Calculate the product-moment correlation coefficient for this data set, giving your answer correct to three decimal places.
Okay, so in this example, we have a set of data with eight data points. That’s what it means that 𝑛 equals eight. Each of these points has an 𝑥-value as well as a 𝑦-value. So we’re talking about a bivariate data set. If we add up all eight of the 𝑥-values (that’s what the Greek letter ∑ indicates, a sum), then we get a total of 78. Likewise, we can add up all of the 𝑦-values and get negative 73, and so forth for the rest of these sums. It turns out that all of this information will be useful to us in calculating the product-moment correlation coefficient.
Another name for this is the Pearson correlation coefficient. This coefficient is a number that indicates how strongly the one variable in the data set is linearly correlated with the other. The correlation coefficient in general can be as low as negative one and as high as positive one. These extreme values represent perfect inverse correlation and perfect direct correlation, respectively. And then a Pearson correlation coefficient of zero indicates that there is no linear correlation between the variables at all.
It’s possible to calculate this correlation coefficient, sometimes called 𝑟 or 𝑟 sub 𝑥𝑦, mathematically. In the numerator of the expression, we have the sum of the products of the 𝑥- and 𝑦-values of our data set. And from this, we subtract the number of points in the data set multiplied by the average 𝑥-value and the average 𝑦-value. Then in the denominator, we have the square root of the sum of the squared 𝑥-values minus the number of points in the data set multiplied by the square of the average 𝑥-value, all multiplied by the square root of the same quantity for the 𝑦-values.
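Written out in symbols, the expression just described looks like the following. The bar notation for the averages is an assumption on our part; the description above only refers to the average 𝑥-value and the average 𝑦-value.

```latex
r_{xy} = \frac{\sum xy - n\,\bar{x}\,\bar{y}}
              {\sqrt{\sum x^{2} - n\,\bar{x}^{2}}\;\sqrt{\sum y^{2} - n\,\bar{y}^{2}}}
```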
Considering this equation overall, notice that we could exchange 𝑥 with 𝑦 and 𝑦 with 𝑥 and 𝑟 sub 𝑥𝑦 would come out the same. That is, given a set of bivariate data, meaning data with two variables, it makes no difference to the Pearson correlation coefficient which variable we call 𝑥 and which we call 𝑦. Anyway, we’ve been given enough information to calculate 𝑟 sub 𝑥𝑦 for this data set.
Before we plug these values in though, let’s note that the average 𝑥-value of a data set equals the sum of all the 𝑥-values divided by 𝑛, the number of data points. And similarly, the average 𝑦-value of the data set equals the sum of all the 𝑦-values divided by 𝑛. We can then write our product-moment correlation coefficient, or Pearson correlation coefficient, entirely in terms of the sums we’ve been given. And then we substitute in all the given information.
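As a sketch of that substitution, here is how the averages and the pieces of the expression work out from the given sums. The bar notation is again our own shorthand for the averages, and the intermediate values are simply the arithmetic carried through.

```latex
\bar{x} = \frac{\sum x}{n} = \frac{78}{8} = 9.75,
\qquad
\bar{y} = \frac{\sum y}{n} = \frac{-73}{8} = -9.125

r_{xy} = \frac{-752 - 8\,(9.75)(-9.125)}
              {\sqrt{792 - 8\,(9.75)^{2}}\;\sqrt{735 - 8\,(-9.125)^{2}}}
       = \frac{-40.25}{\sqrt{31.5}\;\sqrt{68.875}}
```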
We have the sum of the products of 𝑥 and 𝑦, the number of data points in the set, the average value of 𝑥, the average value of 𝑦, as well as the sum of the squared 𝑥-values and the sum of the squared 𝑦-values. When we enter this whole expression in our calculator, we get a result which to three decimal places is negative 0.864. We see then that these data points are inversely correlated and that this correlation is not far from the extreme value of perfect inverse correlation. Negative 0.864 then is the product-moment or Pearson correlation coefficient of this data set.
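If we’d rather check this without a calculator, the same computation can be sketched in a few lines of Python. The function name pearson_from_sums and its parameter names are hypothetical, chosen here just for illustration; the formula itself is the one described above.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary statistics.

    Hypothetical helper for illustration: takes the number of points
    and the five sums given in the question.
    """
    mean_x = sum_x / n  # average x-value
    mean_y = sum_y / n  # average y-value
    numerator = sum_xy - n * mean_x * mean_y
    spread_x = sum_x2 - n * mean_x ** 2  # term under the first square root
    spread_y = sum_y2 - n * mean_y ** 2  # term under the second square root
    return numerator / (math.sqrt(spread_x) * math.sqrt(spread_y))


# Values taken directly from the question.
r = pearson_from_sums(n=8, sum_x=78, sum_y=-73,
                      sum_xy=-752, sum_x2=792, sum_y2=735)
print(round(r, 3))  # -0.864
```

Swapping the roles of 𝑥 and 𝑦 in the call (passing the 𝑦 sums where the 𝑥 sums go, and vice versa) gives the same value, in line with the symmetry noted earlier.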