Tech-tonic Moves

Tech-tonic moves is a unique series where we offer tricks and tips from the technologically advanced and those who pride in their digital fluency and software skills. In this first piece, our inaugural guest shares his everyday passion for doing statistics on Excel and how you can too.

Using Microsoft Excel’s Data Analysis ToolPak to Perform Correlational Analysis

By Eric Cai

Microsoft Excel is a useful point-and-click software for data entry, data analysis, and data visualization.  It has a little-known extension called the Data Analysis ToolPak, which allows you to conduct many types of statistical analysis, such as regression, t-tests, ANOVA, and random-number generation.  

This article will: 

A. introduce the concept of correlation

B. show you how to use Excel’s Data Analysis ToolPak to perform correlational analysis.

Before proceeding in this tutorial, make sure to install the Data Analysis ToolPak in Microsoft Excel. Microsoft’s website provides step-by-step instructions for you to learn how to install it.

Example data based on groceries

Suppose that a grocery store sells 5 products: 

  1. Hot dogs
  2. Pasta
  3. Chicken 
  4. Chips
  5. Ketchup

The store records transactional data about the expenditures of each product in every visit by a customer. 

In the table below are the first 5 rows of the data; the entire dataset is available in the downloadable attachment below that:

VisitHot DogsPastaChickenChipsKetchup
1$          9.83 $   3.56 $       6.75 $   4.29 $        5.31 
2$       15.69 $   4.41 $     10.01 $ 10.09 $        7.09 
3$       23.50 $   6.36 $     13.59 $ 19.97 $      14.34 
4$          7.28 $   7.29 $     15.67 $   2.24 $        4.51 
5$       20.15 $   7.90 $     15.96 $ 13.86 $        9.20 
These are just 5 rows. Check the downloadable table below.

(Also note, the above data are simulated from a random-number generator.  The correlations are unusually high in the above data for illustrative purposes only; they do not represent actual business data from realistic retail purchases.)

The grocery store’s owner wants to know which products tend to sell well together.  What can you do to answer this question?  Correlational analysis can help in this situation.

What is a correlation?

The correlation coefficient is a measure of the strength of association between two variables.  It can range from -1 to +1, and the sign indicates whether the association is in the same direction or in the opposite direction.  

  • An example of a positive correlation is the association between height and weight.  Taller people tend to be heavier; thus, as height increases, weight tends to increase.
  • An example of a negative correlation is the association between altitude and temperature.  As you climb up a mountain, it tends to get colder; thus, as altitude increase, temperature tends to decrease.
  • If the correlation is zero, then there is no association between 2 variables.  An example of this is how much time you spend reading and the temperature on the planet Mars.  These 2 variables are completely separate and have no pattern of increasing or decreasing together.

The magnitude of the correlation coefficient denotes the strength of the relationship; in mathematical terms, you can get the magnitude by taking the absolute value of the correlation.  A bigger magnitude denotes a stronger relationship.  A correlation of -1 or +1 denotes a perfect correlation.

Using Excel’s Data Analysis ToolPak to Calculate Correlation Coefficients

Returning to our example of the 5 products in the grocery store, how can we use correlation to determine which products sell well together?  You can obtain the correlation coefficient for the sales between each pair of products.  A correlation matrix is a handy way of organizing pairwise correlation coefficients, and the Data Analysis ToolPak can provide this matrix for you.  Here are the steps to do so.

  1. Click on the “Data” tab in the top menu.
  1. On the far right, click on the “Data Analysis” icon; that is the Data Analysis ToolPak.  The entire menu should look like this.

3. In this menu of analyses, choose “Correlation”, and click on the “OK” button.

You should get this dialog box.  

4. For the “Input Range”, highlight the 5 columns of data.

5. Make sure that the data are grouped by columns, which should be the default option.

6. Click on “Labels in First Row”.

7. The “Output options” section allows you to choose where to put the results of the correlation matrix.  I suggest putting it two columns away from your data on the same sheet.

8. Once you finish entering all inputs, click on the “OK” button.

You should get this correlation matrix; the large correlation coefficients are highlighted in bold.

 Hot DogsPastaChickenChipsKetchup
Hot Dogs1
Pasta0.0142851
Chicken0.011690.9884341
Chips0.968361-0.00154-0.006191
Ketchup0.788359-0.04517-0.035930.7616651

This analysis shows that the following pairs of products sell well together:

  • Chicken and pasta
  • Chips and hot dogs
  • Ketchup and hot dogs
  • Ketchup and chips

Given these results, it makes sense to 

  • Place chicken and pasta near each other to boost the sales of both products
  • Place chips, ketchup, and hot dogs near each other to boost the sales of these 3 products 

Concluding thoughts

Correlation is a statistical concept for exploring associations between variables.  Microsoft Excel has a tool called the Data Analysis ToolPak, which enables you to perform many statistical analyses, including the calculation of a correlation matrix for pairwise correlations between many variables.  Try it for your own problems in business analytics!

Eric Cai works as a Senior Data Scientist at Acosta.  He earned a Master of Science degree in Statistics from the University of Toronto.

Leave a Reply