How to run a canonical correlation analysis in XLSTAT
An Excel sheet containing both the data and the results is available for download. The data used in this tutorial are measurements taken on middle-aged men in a health fitness club (Dr. A. C. Linnerud, NC State University).
There are two sets of data:
- The physiological data about the men (weight, waist and pulse)
- The exercises the men did (chins, sit-ups and jumps)
Go to the ADA menu and select the Canonical Correlation Analysis function.
In the General tab, specify the two datasets. Y1 corresponds to the physiological data stored in columns B to D, and Y2 to the exercise data stored in columns E to G.
The columns have labels, so leave the Column labels option checked. You can add observation labels by checking the Observation labels option and selecting column A.
In the Options tab, check that both datasets will be centered and reduced (that is, standardized so that each variable has mean 0 and standard deviation 1).
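"Centered and reduced" means that each variable is shifted to mean 0 and divided by its standard deviation, so that variables measured on very different scales become comparable. A minimal NumPy sketch, assuming the sample standard deviation (n - 1 denominator); XLSTAT's exact convention is not stated in this tutorial, and the function name `center_and_reduce` is hypothetical.

```python
import numpy as np

def center_and_reduce(X):
    """Center each column on its mean and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std

# Toy data (not the fitness-club measurements): 4 observations, 2 variables.
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 25.0], [4.0, 45.0]])
Z = center_and_reduce(toy)
print(Z.mean(axis=0))         # each column mean is (numerically) 0
print(Z.std(axis=0, ddof=1))  # each column standard deviation is 1
```

Standardizing does not change the canonical correlations themselves, since they are invariant to rescaling individual variables; it mainly makes the canonical coefficients of different variables comparable.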
In the Outputs tab, select all the outputs.
In the Charts tab, also select the only chart available.
Click OK once these selections are made.
When prompted, choose to display the plot of Factor 1 versus Factor 2.
Notice that the explained variance for these two factors is 99.22%.
Results of the canonical correlation analysis
The first result after the descriptive statistics is the correlation matrix.
Note the strong correlation between weight and waist (0.870) in the first table, and between sit-ups and jumps (0.669) and between sit-ups and chins (0.696) in the second table.
The correlations between the two tables are rather small, except between waist and sit-ups (-0.646) and between waist and chins (-0.552).
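The correlation matrix of all six variables splits into three blocks: the correlations within Y1, the correlations within Y2, and the between-table block discussed above. A short NumPy sketch on random toy data (not the fitness-club measurements); the helper name `correlation_blocks` is hypothetical.

```python
import numpy as np

def correlation_blocks(Y1, Y2):
    """Return the within-table blocks R11, R22 and the between-table block R12."""
    Y1 = np.asarray(Y1, dtype=float)
    Y2 = np.asarray(Y2, dtype=float)
    p = Y1.shape[1]
    # np.corrcoef treats rows as variables by default, hence rowvar=False.
    R = np.corrcoef(np.hstack([Y1, Y2]), rowvar=False)
    R11 = R[:p, :p]  # correlations within the first table
    R22 = R[p:, p:]  # correlations within the second table
    R12 = R[:p, p:]  # correlations between the two tables
    return R11, R22, R12

# Hypothetical toy data standing in for the two tables (20 rows, 3 + 3 columns).
rng = np.random.default_rng(0)
Y1 = rng.normal(size=(20, 3))
Y2 = Y1 @ rng.normal(size=(3, 3)) + rng.normal(size=(20, 3))
R11, R22, R12 = correlation_blocks(Y1, Y2)
print(np.round(R12, 3))
```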
The Eigenvalues show that the first factor alone explains 93% of the variability.
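In canonical correlation analysis the eigenvalues are usually the squared canonical correlations, and the percentage of variability attributed to a factor is its eigenvalue divided by the sum of all eigenvalues. The sketch below assumes this is how the percentages quoted in this tutorial are obtained; the canonical correlations passed in are made-up illustrative values, not the tutorial's results, and `variability_table` is a hypothetical name.

```python
def variability_table(canonical_correlations):
    """Eigenvalue, % of variability and cumulative % for each factor.

    Assumes eigenvalue_k = r_k ** 2 and % = eigenvalue_k / sum of eigenvalues.
    """
    eigenvalues = [r ** 2 for r in canonical_correlations]
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        pct = 100.0 * ev / total
        cumulative += pct
        rows.append((k, ev, pct, cumulative))
    return rows

# Illustrative canonical correlations only.
for k, ev, pct, cum in variability_table([0.8, 0.2, 0.05]):
    print(f"F{k}: eigenvalue={ev:.4f}  variability={pct:.2f}%  cumulative={cum:.2f}%")
```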
The Wilks’ lambda test, which tests the hypothesis that the canonical correlations are equal to 0, leaves no doubt for factors 2 and 3: their p-values are far above the 0.05 threshold, so the hypothesis cannot be rejected. For factor 1 the p-value is 0.06, just above the threshold, so the test does not allow a firm conclusion.
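A common way to compute these p-values is the sequential Wilks’ lambda test: for factor k, Lambda_k is the product of (1 - r_i^2) over factors i >= k, and Bartlett’s approximation compares -(n - 1 - (p + q + 1)/2) * ln(Lambda_k) with a chi-square distribution on (p - k + 1)(q - k + 1) degrees of freedom, where n is the number of observations and p, q the number of variables in each table. XLSTAT may use a different approximation (for example Rao’s F), so treat this as a sketch; the function names are hypothetical, and the chi-square tail uses the closed form valid for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * (1 - Phi(sqrt(x))) + 2 * phi(sqrt(x)) * sum of x^((2i-1)/2) / (1*3*...*(2i-1))
    s = math.sqrt(x)
    tail = math.erfc(s / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = s, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return tail + 2 * pdf * total

def sequential_wilks(canonical_correlations, n, p, q):
    """Wilks' lambda per factor with Bartlett's chi-square approximation.

    For factor k, the null hypothesis is that the canonical correlations of
    factor k and of all later factors are zero.
    """
    results = []
    for k in range(1, len(canonical_correlations) + 1):
        lam = 1.0
        for r in canonical_correlations[k - 1:]:
            lam *= 1.0 - r ** 2
        stat = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
        df = (p - k + 1) * (q - k + 1)
        results.append((k, lam, stat, df, chi2_sf(stat, df)))
    return results

# Illustrative inputs: 20 observations, 3 variables per table, made-up correlations.
for k, lam, stat, df, pval in sequential_wilks([0.8, 0.2, 0.05], n=20, p=3, q=3):
    print(f"F{k}: lambda={lam:.4f}  chi2={stat:.2f}  df={df}  p={pval:.4f}")
```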
The canonical correlation on factor 1 shows that the two tables Y1 and Y2 are correlated. Note that this value is greater than any of the individual correlations between the variables of the two tables.
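The first canonical correlation is the largest correlation achievable between a linear combination of the Y1 variables and a linear combination of the Y2 variables. Because a single variable is itself such a combination, it can never be smaller than the largest absolute between-table correlation. The NumPy sketch below checks this on random data, computing canonical correlations with one standard method (QR decomposition of each centered table, then the singular values of the product of the two orthonormal bases). This is not necessarily the algorithm XLSTAT uses, and the function name is hypothetical.

```python
import numpy as np

def canonical_correlations(Y1, Y2):
    """Canonical correlations of two tables (rows = observations, full column rank)."""
    X = np.asarray(Y1, dtype=float)
    Y = np.asarray(Y2, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases of the column spaces of the two centered tables.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # The singular values of Qx^T Qy are the canonical correlations (sorted, largest first).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Random toy data, not the fitness-club measurements.
rng = np.random.default_rng(1)
Y1 = rng.normal(size=(20, 3))
Y2 = Y1 @ rng.normal(size=(3, 3)) + 2.0 * rng.normal(size=(20, 3))
r = canonical_correlations(Y1, Y2)
cross = np.corrcoef(np.hstack([Y1, Y2]), rowvar=False)[:3, 3:]
print(np.round(r, 4))
print(r[0] >= np.abs(cross).max())  # guaranteed True by the argument above
```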
The redundancy coefficients show that a small proportion of the variability of the input variables is predicted by the canonical variables.
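One widely used definition of redundancy is the Stewart-Love index: for factor k, the mean squared correlation between a table’s variables and that table’s own k-th canonical variable (its structure correlations), multiplied by the squared k-th canonical correlation, gives the share of that table’s variance explained by the other table’s k-th canonical variable. Whether XLSTAT’s redundancy coefficients follow exactly this definition is an assumption; the function name and the input values below are hypothetical.

```python
import numpy as np

def redundancy(structure, canon_corr):
    """Stewart-Love redundancy index for each factor (assumed definition).

    structure:  (n_vars, n_factors) correlations between one table's variables
                and that table's own canonical variables.
    canon_corr: (n_factors,) canonical correlations.
    """
    structure = np.asarray(structure, dtype=float)
    canon_corr = np.asarray(canon_corr, dtype=float)
    # Share of the table's variance carried by each of its own canonical variables.
    variance_extracted = (structure ** 2).mean(axis=0)
    # Share of the table's variance explained by the other table's canonical variables.
    return variance_extracted * canon_corr ** 2

# Made-up inputs: 2 variables, 2 factors.
rd = redundancy([[0.8, 0.1], [0.6, 0.2]], [0.9, 0.3])
print(rd)        # roughly [0.405, 0.00225]
print(rd.sum())  # total redundancy over the factors kept
```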
The correlations between the input variables and the canonical variables (also called structure correlation coefficients, or canonical factor loadings) help to understand how the canonical variables relate to the input variables.
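Structure correlations are ordinary correlations between each original variable and each canonical variable (the canonical scores). A NumPy sketch on random data sharing one latent factor, obtaining the canonical variables through a QR decomposition of each centered table followed by an SVD (one standard method, not necessarily XLSTAT’s). The function name is hypothetical, and the sign of each factor is arbitrary, so another program may report the same loadings with flipped signs.

```python
import numpy as np

def structure_correlations(Y1, Y2):
    """Canonical correlations, canonical variables and structure correlations.

    Returns (s, U, V, load1, load2) with load1[j, k] = corr(Y1[:, j], U[:, k])
    and load2[j, k] = corr(Y2[:, j], V[:, k]).
    """
    X = np.asarray(Y1, dtype=float)
    Y = np.asarray(Y2, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    A, s, Bt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Canonical variables: centered, unit-norm columns with corr(U[:, k], V[:, k]) = s[k].
    U = Qx @ A
    V = Qy @ Bt.T
    # For centered x and centered unit-norm u, corr(x, u) = (x / ||x||) . u
    load1 = (X / np.linalg.norm(X, axis=0)).T @ U
    load2 = (Y / np.linalg.norm(Y, axis=0)).T @ V
    return s, U, V, load1, load2

# Random toy data sharing one latent factor (not the fitness-club data).
rng = np.random.default_rng(2)
latent = rng.normal(size=(20, 1))
Y1 = latent @ np.array([[1.0, 0.8, 0.0]]) + rng.normal(size=(20, 3))
Y2 = latent @ np.array([[-0.9, -0.7, 0.1]]) + rng.normal(size=(20, 3))
s, U, V, load1, load2 = structure_correlations(Y1, Y2)
print(np.round(load1[:, :2], 3))  # Y1 variables vs. factors 1 and 2
print(np.round(load2[:, :2], 3))  # Y2 variables vs. factors 1 and 2
```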
We can see that waist and weight are correlated with each other and are both negatively correlated with factors 1 and 2. They are also negatively correlated with the sit-ups and chins exercises. This means that men with a higher weight and a larger waist tend to do fewer sit-ups and chins than the others.