CORREL Function

Calculates the correlation coefficient between the dataset of two variables.

Author: Josh Pupkin
Private Equity | Investment Banking

Josh has extensive experience in private equity, business development, and investment banking. Josh started his career working as an investment banking analyst for Barclays before transitioning to a private equity role at Neuberger Berman. Currently, Josh is an Associate in the Strategic Finance Group of Accordion Partners, a management consulting firm that advises on, executes, and implements value creation initiatives and 100-day plans for private equity-backed companies and their financial sponsors.

Josh graduated Magna Cum Laude from the University of Maryland, College Park with a Bachelor of Science in Finance and is currently an MBA candidate at Duke University Fuqua School of Business with a concentration in Corporate Strategy.

Reviewed By: Elliot Meade
Private Equity | Investment Banking

Elliot currently works as a Private Equity Associate at Greenridge Investment Partners, a middle market fund based in Austin, TX. He was previously an Analyst in Piper Jaffray's Leveraged Finance group, working across all industry verticals on LBOs, acquisition financings, refinancings, and recapitalizations. Prior to Piper Jaffray, he spent 2 years at Citi in the Leveraged Finance Credit Portfolio group focused on origination and ongoing credit monitoring of outstanding loans and was also a member of the Columbia recruiting committee for the Investment Banking Division for incoming summer and full-time analysts.

Elliot has a Bachelor of Arts in Business Management from Columbia University.

Last Updated: December 9, 2023

What is the CORREL Function?

The CORREL function in Excel calculates the correlation coefficient between two sets of values, one for each variable.

In simple terms, it measures the strength of the linear relationship between two variables. For example, the distance a car travels depends on how much fuel you buy at the gas station, and it varies from one model to another.

If you buy 3 gallons of gas, car A can travel around 60 miles. On the other hand, car B might only travel 50 miles on the same amount of fuel.

A correlation can be drawn between the amount of fuel and the distance the car travels, showing whether all the cars in the fleet have similar fuel consumption.

If the two variables have a linear relationship, some correlation exists between the datasets.

The strength of this linear relationship is known as the correlation between the variables and can be calculated using this function.

In this article, we will see the syntax for the function, how to use it, and a couple of examples.

Key Takeaways

  • The CORREL function calculates the correlation coefficient between two variables represented by continuous data from a random population.
  • The function accepts two array arguments in the form of numbers. If logical or text values are supplied in the array, the function will ignore them.
  • If the size of array1 and array2 is not equal, the function returns the #N/A Error.
  • An alternative to the CORREL function is the PEARSON function, which calculates the correlation between the two variables.
  • The PEARSON function has similar arguments and works on similar conditions to return the correlation coefficient.
  • You might get a #DIV/0! error if either reference array (array1 or array2) is empty or if the standard deviation of its values is zero.
  • The value of the correlation coefficient lies between -1 and 1. When the value is near -1, it indicates a strong negative correlation between the two variables: one increases while the other decreases.
  • When the value is near 1, the two variables move together in the same direction. Finally, when the value is near 0, there is little or no linear correlation between the two variables.

Understanding CORREL function

CORREL is categorized as a Statistical function that calculates the correlation coefficient for two sets of values from random populations.

The function is an upgrade to the PEARSON function, which also returns the correlation coefficient for the datasets of two variables. The only difference between the two is that PEARSON sometimes showed rounding errors in older versions of Excel.

The term ‘correlation coefficient’ is based on the Pearson correlation test, which determines the strength and direction of the linear relationship between the variables and returns a value in the range of -1 to 1.

A value of -1 indicates a perfect negative correlation, while a value of +1 indicates a perfect positive correlation. However, in practice, finding associations with perfect positive or negative correlations is quite difficult.

If the value equals zero, then the variables have no linear association.
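To make these boundary values concrete, here is a minimal Python sketch of the standard Pearson calculation. The helper name `pearson_r` and the three small datasets are hypothetical, chosen only to produce the extreme cases; this is an illustration, not Excel's internal implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the article's tables.
x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # y = 2x + 1: perfectly linear and increasing
falling = [9, 7, 5, 3, 1]    # y = 11 - 2x: perfectly linear and decreasing
u_shaped = [4, 1, 0, 1, 4]   # y = (x - 3)^2: related to x, but not linearly

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_u_shaped = pearson_r(x, u_shaped)
print(r_rising, r_falling, r_u_shaped)
```

The three results sit at +1, -1, and 0. The last case returns 0 even though y is fully determined by x, which is why a zero coefficient rules out only a linear association.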

The syntax for the function is

=CORREL(array1, array2)

where

  • array1 - (required) collection/set of independent values
  • array2 - (required) collection/set of dependent values

Note

The function ignores logical or text values supplied as a part of the array. If the lengths of array1 and array2 are different, the function returns the #N/A error.
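The rules in this note can be imitated in a short Python sketch. The function `correl` below is a hypothetical stand-in, not Excel's code: it raises `ValueError` where Excel would show #N/A and `ZeroDivisionError` where Excel would show #DIV/0!. Dropping the whole pair when either entry is text or a logical value is an assumption of this sketch about how "ignored" values are handled.

```python
import math

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def correl(array1, array2):
    """Hypothetical imitation of CORREL's documented edge cases."""
    if len(array1) != len(array2):
        raise ValueError("#N/A: array1 and array2 have different lengths")
    # Assumption: a pair is skipped when either entry is not a number.
    pairs = [(x, y) for x, y in zip(array1, array2)
             if _is_number(x) and _is_number(y)]
    if not pairs:
        raise ZeroDivisionError("#DIV/0!: no numeric pairs to correlate")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    denominator = math.sqrt(sxx * syy)
    if denominator == 0:
        raise ZeroDivisionError("#DIV/0!: one array has zero variation")
    return sxy / denominator

# Hypothetical input: the text entry and its partner (6) are skipped.
print(correl([1, 2, "n/a", 4], [2, 4, 6, 8]))
```

Calling `correl([1, 2, 3], [1, 2])` raises the length error, and `correl([5, 5, 5], [1, 2, 3])` raises the zero-variation error, mirroring the two error cases described in this article.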

Pearson Correlation Test

As stated earlier, a Pearson correlation test determines the strength and direction of the two variables and gives the result in the form of the Pearson correlation coefficient.

a. Positive Correlation

Let’s look at an example to understand the correlation test better. Suppose you have the fuel as an independent variable and mileage as a dependent variable, as illustrated below:

Excel Table

By representing the fuel on the X-axis and mileage on the Y-axis, our scatter plot looks as follows:

Excel Line Graph

As you can see, our scatter plot is almost linear, i.e., a straight line can be drawn that passes through or close to all the points on the graph. The two variables therefore have a linear relationship.

Excel Line Graph 2

Since our straight line is moving in an upward direction, we say that there is a positive correlation between both variables, i.e., the value of the correlation coefficient is inclined towards +1.

When we use the CORREL function for this dataset, we get the value of 0.990867, meaning that both variables are ‘almost’ perfectly correlated.

b. Negative Correlation:

If a straight line sloping upward means a positive correlation, you can guess that a line sloping downward means a negative one.

Suppose you have the data as illustrated below:

Excel Table 2

The scatter plot for the data would look as below:

Scatter Plot

We see a downward trend in our dataset when fuel is plotted on the X-axis and mileage on the Y-axis. When we draw a straight line through the points, we get:

Scatter Plot with Straight Line

A downward straight line indicates negative correlation, which, if calculated using the CORREL function, gives the value of -0.994376.

The value of the correlation coefficient always lies between -1 and 1. It is usually a decimal number and only rarely an integer such as -1 or +1.

c. Zero Correlation

If an upward and downward straight line indicates a positive and negative correlation, then what does zero correlation mean?

When a straight line cannot be drawn through the points on a scatter plot, it indicates zero correlation.

For example, suppose you have the data below:

Excel Table 3

The scatter plot for the given data would be as illustrated below:

Scatter Plot 2

All the points are randomly scattered on the plot, and no single straight line passes close to most of them. Thus, there is little to no linear correlation between the datasets, which is confirmed when the CORREL function returns a low value of 0.237298.

As stated earlier, getting coefficient values of exactly -1, +1, or 0 is quite difficult. As the values returned are decimals, we can interpret them by applying certain cutoff limits to describe the strength of the correlation.

For example, if the value falls between 0.00 and 0.10, it represents no correlation, while a value between 0.90 and 1.00 indicates a very strong positive correlation between the variables.
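The cutoff idea can be sketched as a small Python helper. Only the two bands named above (0.00 to 0.10 and 0.90 to 1.00) come from this article; the function name `describe_correlation`, the mirrored treatment of negative values, and the catch-all label for everything in between are assumptions of this sketch.

```python
def describe_correlation(r):
    """Label a correlation coefficient using the two cutoffs from the article."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    strength = abs(r)
    if strength <= 0.10:
        return "no correlation"
    if strength >= 0.90:
        # Assumption: the 0.90-1.00 band is applied symmetrically to negatives.
        direction = "positive" if r > 0 else "negative"
        return f"very strong {direction} correlation"
    # The article does not name the intermediate bands, so none are invented.
    return "between the two cutoffs"

# Coefficients quoted in this article's three scatter plot examples.
for value in (0.990867, -0.994376, 0.237298):
    print(value, "->", describe_correlation(value))
```

Under these two cutoffs, the value 0.237298 from the scattered example falls between the bands rather than inside the "no correlation" band, which is why it is better read as a weak correlation than as exactly none.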

Note

The closer the data points are toward the linear relationship, the stronger the correlation between the variables.

Correlation Coefficient formula

How would you calculate the correlation coefficient if neither the CORREL nor PEARSON functions existed in Excel?

In this case, we would have to use the formula below, where x̅ and y̅ are the means of the two arrays of values.

Equation
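For reference, the standard Pearson formula can be written out as follows. This is the textbook form rather than a copy of the original figure, and it is consistent with the expression =N10/(SQRT(L10*M10)) used in the manual calculation in this article, with the sums taken over all data points.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}}
```

The numerator is the sum of the products of the deviations, and the denominator is the square root of the product of the two sums of squared deviations.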

Let’s calculate the coefficient value manually to see what goes behind the scenes for the CORREL function. Let’s assume our data of four different cars with mileage per gallon of fuel as

Excel Table 4

First, we will calculate all the separate components from the formula in the table below:

The steps that we need to follow are:

  • Reference the values for fuel and mileage in the F and H columns, respectively.
  • Calculate the mean value for fuel and mileage using the AVERAGE function, which gives the average fuel as 5 gallons and the average mileage as 90 miles.
  • Next, we subtract the mean x̅ from each value of x and the mean y̅ from each value of y in the J and K columns.
  • Then, we calculate the squares of (x - x̅) and (y - y̅) by using the formulas =POWER(J3,2) and =POWER(K3,2) in columns L and M.
  • Finally, we multiply (x - x̅) by (y - y̅) in column N. The calculations so far look as illustrated below:

Now, all that’s left to do is calculate the sums of (x - x̅)², (y - y̅)², and (x - x̅) * (y - y̅) in cells L10, M10, and N10, respectively.

We will use the formula =SUM(L3:L6) in cell L10 and drag it to the right to fill cells M10 and N10, which gives the following result:

If we calculate the correlation coefficient using the CORREL function, we get the result as 0.990867389 in cell D9.

On the other hand, when we substitute all the values into =N10/(SQRT(L10*M10)), which is a simplified version of the correlation coefficient formula, we get:

In both instances, the result is equal to 0.990867389. Thus, you can use either of the two methods, although you will probably prefer the CORREL function.
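The manual steps can also be sketched in Python, with one list per spreadsheet column. The four data points are hypothetical: they were chosen only so that the means match the 5 gallons and 90 miles mentioned in the steps, so the resulting coefficient is not expected to equal the article's 0.990867389. Column N holds the products (x - x̅)(y - y̅), which is what the final expression =N10/(SQRT(L10*M10)) requires.

```python
import math

# Hypothetical data for four cars (not the article's table).
fuel = [2, 4, 6, 8]             # x, gallons
mileage = [40, 70, 110, 140]    # y, miles

mean_x = sum(fuel) / len(fuel)          # AVERAGE of fuel: 5.0
mean_y = sum(mileage) / len(mileage)    # AVERAGE of mileage: 90.0

dev_x = [x - mean_x for x in fuel]                # column J: x - mean
dev_y = [y - mean_y for y in mileage]             # column K: y - mean
sq_x = [d ** 2 for d in dev_x]                    # column L: (x - mean)^2
sq_y = [d ** 2 for d in dev_y]                    # column M: (y - mean)^2
cross = [dx * dy for dx, dy in zip(dev_x, dev_y)] # column N: product of deviations

sum_sq_x = sum(sq_x)     # cell L10: 20.0
sum_sq_y = sum(sq_y)     # cell M10: 5800.0
sum_cross = sum(cross)   # cell N10: 340.0

# Same structure as =N10/(SQRT(L10*M10)).
r = sum_cross / math.sqrt(sum_sq_x * sum_sq_y)
print(r)
```

Keeping each intermediate column as its own list makes it easy to compare the values against the spreadsheet cell by cell when checking a manual calculation.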

Example of CORREL function

Finally, we see how to use the function in the spreadsheet. It’s not that difficult, and you just need to reference the array of values for both variables.

Suppose we have the age and income data, as illustrated below:

Excel Table 9

To get the correlation coefficient for the data, we will use the formula =CORREL(C3:C6, D3:D6), which gives the result of 0.474124.

Wasn’t that simple? Now that you know how easy it is to use the function, would you go through all those manual calculations?

We believe not. But yes, now you know both methods and how to calculate the coefficient. If you do not have access to the spreadsheet tool and need to make such calculations, you can even use a piece of paper to find the correlation coefficient.

Let’s see another example of calculating the coefficient using the CORREL function. Suppose you prepare a three-statement model for Nike Inc for the fiscal year ending in 2019.

You project the Revenue and Taxes Paid as illustrated below:

Excel Table 9

By using the formula =CORREL(C3:C10, D3:D10) in cell G5, we get the result as -0.20155, implying a weak negative correlation between the two variables.

Excel Table 10

The taxes paid do not grow linearly with the Revenue. One possible reading is that Nike Inc has been utilizing its assets and liabilities with some efficiency, so that taxes do not rise in step with an increase in Revenue.

If you do not need all the digits after the decimal point, you can wrap the formula in the ROUND function, which limits the number to a specified number of decimal places, for example, =ROUND(CORREL(C3:C10, D3:D10), 2).

CORREL vs. PEARSON

Throughout the article, we have mentioned several times that the correlation coefficient is derived from Pearson’s test and that Excel has a function called PEARSON, which also calculates the correlation coefficient r.

PEARSON is categorized as a Statistical function that returns the correlation coefficient for two variables represented by continuous data from a random population.

There isn’t much difference between the functions. PEARSON has a similar syntax and returns the same result as CORREL. The only drawback with PEARSON was that it gave rounding errors in Excel versions before 2003.

However, the latest Excel versions have upgraded their calculation algorithms, so you can now use either function to calculate the correlation between two variables.
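The article does not say what caused the old rounding errors. As a general illustration only, the Python sketch below shows two algebraically equivalent ways to compute the coefficient: a two-pass version that subtracts the means first, and a single-pass version built from raw sums. Whether either matches what any Excel version does internally is an assumption; the function names and data are hypothetical.

```python
import math

def pearson_two_pass(xs, ys):
    """Subtract the means first, then sum products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_single_pass(xs, ys):
    """Use raw sums only; the same algebra, rearranged."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Hypothetical small dataset: both versions agree here.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_two_pass = pearson_two_pass(xs, ys)
r_single_pass = pearson_single_pass(xs, ys)
print(r_two_pass, r_single_pass)
```

On small, well-scaled data the two versions agree. The single-pass form subtracts large, nearly equal sums, so in floating-point arithmetic it can lose precision when the values are very large relative to their spread, which is one well-known way rounding errors enter correlation calculations.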

The syntax for the PEARSON function is

=PEARSON(array1, array2)

where

  • array1 - (required) collection/set of independent values
  • array2 - (required) collection/set of dependent values

Note

When logical or text values are supplied as a part of the array, they are ignored by the function. If the lengths of array1 and array2 are different, then the function returns the #N/A error.

Let’s see whether we get the same values for the correlation coefficient using both functions. Suppose you have the data for two variables, as illustrated below:

Excel Table 11

To calculate the correlation using the CORREL function, we will use the formula =CORREL(B3:B18, C3:C18) in cell F6, which gives the result 0.671095.

Similarly, we will use the formula =PEARSON(B3:B18, C3:C18) in cell F7, which also gives the result of 0.671095.

Excel Table 12

As you see, both functions yield the same result, giving us flexibility regarding which function to use in Excel.

Let’s see another example to understand the correlation between the prices of Tesla Inc and the S&P 500 over a few weeks in November 2022. The data looks as illustrated below:

Excel Table 13

We will use the formula =CORREL(C3:C16, D3:D16) in cell G6 and =PEARSON(C3:C16, D3:D16) in cell G7, both of which give the result -0.33484.

Excel Table 14

We see that the price of the S&P 500 and that of Tesla Inc have a weak negative correlation. Looking at the general trend from 4th November 2022 to 23rd November 2022, the price of the S&P 500 was increasing, while that of Tesla Inc was steadily falling.

Thus, the trend line on the scatter plot for the dataset slopes downward, as illustrated below:

S&P 500 vs Tesla Inc

Researched and authored by Akash Bagul

Reviewed and edited by Parul Gupta
