How do I run a Canonical Correspondence Analysis (CCA)?
Canonical Correspondence Analysis (CCA) has been developed to allow ecologists to relate the abundance of species to environmental variables (Ter Braak, 1986). However, this method can be used in other domains. Geomarketing and demographic analyses should be able to take advantage of it.
In order to use CCA, one needs :
- A contingency table X that contains the frequencies of a series of objects (in ecology species), on several sites where they are counted,
- A table of descriptive Y variables that are measured on the same sites
- Optionally a third table Z that contains descriptive information which effect we want to remove before trying to explain the variability within X using Y. In this case, the method is called partial CCA.
The goal is to produce a map where the objects, the sites, and the variables are represented.
An Excel sheet with both the data and the results can be downloaded by clicking here. The data correspond to the counts of 10 species of insects on 12 different sites in a tropical region. A second table (displayed in red color) includes 3 quantitative variables that describe the 12 sites (altitude, humidity, and distance to the lake).
Our goal is to determine if the three descriptive variables can help us to explain the frequencies of the species of insects.
To activate the CCA dialog box, start XLSTAT, then select Multiblock data analysis / Canonical Correspondence Analysis command in the XLSTAT menu.
Once you have clicked on the button, the dialog box appears. Select the data that correspond to the sites/species data (here objects are species), and then the sites/variables data (displayed in red color on the Excel sheet). We also select the sites labels, and make sure the "Column labels" option is checked so that XLSTAT knows that we included the headers in the selection.
We activate the Options tab, and check the "Permutation test" option in order to test if the effect of the three variables on the observed frequencies of the insects is significant or not. We decide to run 1000 random permutations.
On the two images below, you can see which options have been selected for "Outputs" and "Charts".
After you have clicked on the "OK" button, the computations start and the results are displayed on a new Excel sheet.
Interpreting the results of a Canonical Correspondence Analysis
The first set of results corresponds to the descriptive statistics of the various variables. The row and column profiles of the contingency table (the frequencies of insects are displayed. The "weighted averages" correspond to the means of the variables of the second table, weighted by the marginal sums of the rows of the first table.
Next we see the results of the permutation test.
The test concludes that the sites/species data are not linearly related to the sites/variables data with 5% significance level. Looking closer, we see that the p-value is just above the threshold we had chosen (0.05 against 0.062). So the conclusion might not be as obvious. Furthermore, we are interested in checking if this is true for all variables, or if some variables seem to explain the results better than other.
The next table displays how the inertia is spread between the constrained CCA (the analysis that uses the explanatory variables) and the unconstrained CCA (the unconstrained CCA is a Correspondence Analysis of the residuals of the constrained CCA).
The next table displays how the inertia is spread between the constrained CCA (the analysis that uses the explanatory variables) and the unconstrained CCA (the unconstrained CCA is a Correspondence Analysis of the residuals of the constrained CCA). We see here that to the constrained CCA corresponds only 40% of the inertia. So a look at the results of the unconstrained CCA would make sense, and the relation between the sites and the species should not be analyzed too much in depth here. However, to make the tutorial shorter, we will focus here only on the constrained CCA results (named simply CCA results on the report).
Within the CCA eigenanalysis, we see that most of the inertia is carried by the first axis. With the second axis we obtain 92.5% of the inertia. This means that the two-dimensional CCA map will be enough to analyze the relationships between the sites, the species and the variables.
The CCA map (see below) allows to simultaneously visualize the objects (here the insect species), the sites, and the variables.
We see on the map that for the species Insect4 and Insect5 the frequency is associated with a high humidity and a low Altitude. Insect7 seems to be more sensitive to the distance to the lake. Insect9 seems to prefer a higher altitude, or more likely a lower humidity.
Note: if you want to change the "Objects" to "Species" on the CCA map, all you need to do is click on one of the points of the corresponding, and then change Objects to species in the Excel formula bar.
Click here for other tutorials.