XLSTAT - Principal Component Analysis (PCA)
Principles of Principal Component Analysis
Principal Component Analysis (PCA) is one of the most frequently used multivariate data analysis methods.
Principal Component Analysis can be regarded as a projection method: it projects observations from a p-dimensional space (p variables) to a k-dimensional space (where k < p) so as to conserve as much of the information in the initial dimensions as possible. Information is measured here by the total variance of the scatter plot. If the information associated with the first 2 or 3 axes represents a sufficient percentage of the total variability of the scatter plot, the observations can be represented on a 2- or 3-dimensional chart, which makes interpretation much easier.
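The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration and not XLSTAT's implementation: the function name pca_project and the simulated data are assumptions made for the example, and the covariance matrix with the (n-1) denominator is only one of the possible choices.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n observations x p variables) onto the first k principal axes."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p covariance matrix (n-1 denominator)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]             # coordinates in the k-dimensional space
    explained = eigvals[:k].sum() / eigvals.sum()  # share of total variance retained
    return scores, explained

# Simulated data (hypothetical): 50 observations, 4 variables, two of them strongly correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

scores, explained = pca_project(X, k=2)
print(scores.shape, round(float(explained), 3))
```

The value of explained is the percentage of total variability mentioned above: the closer it is to 1, the more faithful the 2-dimensional chart.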
Use of Principal Component Analysis
There are several uses for Principal Component Analysis, including:
- Studying and visualizing the correlations between variables, with a view to limiting the number of variables to be measured afterwards;
- Obtaining non-correlated factors which are linear combinations of the initial variables, so as to use these factors in modeling methods such as linear regression, logistic regression or discriminant analysis;
- Visualizing observations in a 2- or 3-dimensional space in order to identify uniform or atypical groups of observations.
Principal Component Analysis input data
XLSTAT offers several possibilities for the matrix to be used in the Principal Component Analysis algorithm:
- Pearson (n) and (n-1)
- Covariance (n) and (n-1)
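As a hedged illustration of what the (n) and (n-1) variants mean, the sketch below builds the corresponding matrices with NumPy on simulated data. It assumes that (n) and (n-1) refer to the denominator used for variances and covariances; the variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                # hypothetical data: 20 observations, 3 variables
n = X.shape[0]

cov_n1 = np.cov(X, rowvar=False, ddof=1)    # covariance matrix, (n-1) denominator
cov_n = np.cov(X, rowvar=False, ddof=0)     # covariance matrix, (n) denominator
pearson = np.corrcoef(X, rowvar=False)      # Pearson correlation matrix

# The two covariance matrices differ only by the factor (n-1)/n
same_up_to_scale = np.allclose(cov_n, cov_n1 * (n - 1) / n)

# The correlation coefficients themselves do not depend on the denominator
d = np.sqrt(np.diag(cov_n))
same_correlations = np.allclose(cov_n / np.outer(d, d), pearson)

# The standardized data (and hence the scale of the scores) do depend on it
Z_n = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)
Z_n1 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(same_up_to_scale, same_correlations)
```

A PCA on the Pearson matrix gives every variable the same weight, whereas a PCA on the covariance matrix lets variables with larger variance dominate the first axes.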
Rotation for Principal Component Analysis
Rotations can be applied to the factors. Several methods are available, including Varimax, Quartimax, Equamax, Parsimax, Quartimin, Oblimin and Promax.
Results for Principal Component Analysis in XLSTAT
This table (the correlation or covariance matrix) shows the data to be used afterwards in the calculations. The type of correlation depends on the option chosen in the "General" tab in the dialog box. For correlations, significant correlations are displayed in bold.
Bartlett's sphericity test
The results of the Bartlett sphericity test are displayed. They are used to confirm or reject the hypothesis that the variables are not significantly correlated.
The eigenvalues and corresponding chart (scree plot) are displayed. The number of eigenvalues displayed is equal to the number of non-null eigenvalues.
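For reference, the usual textbook form of Bartlett's sphericity statistic can be sketched as follows. This is an illustration under the assumption that the standard formula is used, with R the correlation matrix of p variables measured on n observations; it is not taken from XLSTAT, and the function name is hypothetical.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return Bartlett's sphericity statistic and its degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                   # log of det(R), numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet    # large when variables are correlated
    dof = p * (p - 1) // 2
    return statistic, dof

# Simulated data (hypothetical) with one pair of correlated variables
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))
X[:, 1] += X[:, 0]

statistic, dof = bartlett_sphericity(X)
print(round(float(statistic), 2), dof)
```

Under the hypothesis of no correlation, the statistic approximately follows a chi-square distribution with p(p-1)/2 degrees of freedom, so a large value leads to rejecting that hypothesis.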
Factor loadings and correlations
XLSTAT displays the factor loadings in the new space, then the correlations between the initial variables and the components in the new space. The correlations are equal to the factor loadings in a normalized PCA (on the correlation matrix).
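The equality between loadings and correlations in a normalized PCA can be checked numerically. The sketch below assumes the common convention in which a loading is the eigenvector coefficient multiplied by the square root of the eigenvalue; the data are simulated and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                 # hypothetical data
X[:, 2] += X[:, 0]                            # introduce some correlation
p = X.shape[1]

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized variables
R = np.corrcoef(X, rowvar=False)              # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)         # rows: variables, columns: components
scores = Z @ eigvecs                          # observations in the new space

# Correlation between each initial variable (row) and each component (column)
corr = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(p)]
                 for j in range(p)])

print(np.allclose(corr, loadings))
```

With a PCA on the covariance matrix the two tables generally differ, because the variables no longer have unit variance.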
If supplementary variables have been selected, the corresponding coordinates and correlations are displayed at the end of the table.
Contributions are an interpretation aid. The variables which had the highest influence in building the axes are those whose contributions are highest.
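As an illustration, contributions of the variables can be computed from a table of loadings as each squared loading's share of its column total. The loadings below are made-up values, and the formula is the usual definition rather than something confirmed for XLSTAT.

```python
import numpy as np

# Hypothetical loadings for 3 variables (rows) on 2 components (columns)
loadings = np.array([[0.9, 0.1],
                     [0.8, -0.3],
                     [0.2, 0.95]])

squared = loadings ** 2
# Contribution (%) of each variable to each axis; every column sums to 100
contributions = 100 * squared / squared.sum(axis=0)

print(contributions.round(1))
```

In this example the first variable contributes most to the first axis and the third variable to the second axis.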
Squared cosines for the variables
As in other factor methods, squared cosine analysis is used to avoid interpretation errors due to projection effects. If the squared cosines associated with the axes used on a chart are low, the position of the observation or the variable in question should not be interpreted.
The factor scores in the new space are then displayed. If supplementary data have been selected, these are displayed at the end of the table.
The contribution table shows the contributions of the observations in building the principal components.
Squared cosines for the observations
The squared cosines table displays the squared cosines between the observation vectors and the factor axes.
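The contributions and squared cosines of the observations can both be derived from the factor scores. The sketch below uses the usual definitions with the (n) denominator and simulated data; it is meant as an illustration, not as a description of XLSTAT's internal computations.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))                   # hypothetical data
n = X.shape[0]

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False, ddof=0))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                          # factor scores, n x p

# Contribution (%) of observation i to axis k: score_ik^2 / (n * eigenvalue_k)
contrib = 100 * scores ** 2 / (n * eigvals)

# Squared cosine of observation i with axis k: score_ik^2 / squared distance to the centroid
cos2 = scores ** 2 / (scores ** 2).sum(axis=1, keepdims=True)

print(contrib.sum(axis=0).round(6), cos2.sum(axis=1)[:3].round(6))
```

Contributions sum to 100 over the observations for each axis, while squared cosines sum to 1 over all axes for each observation. An observation whose squared cosines on the plotted axes are low is poorly represented on that chart.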
Results with rotations
Where a rotation has been requested, the results of the rotation are displayed with the rotation matrix first applied to the factor loadings. This is followed by the modified variability percentages associated with each of the axes involved in the rotation. The coordinates, contributions and cosines of the variables and observations after rotation are displayed in the following tables.
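To make the idea of a rotation matrix applied to the factor loadings concrete, here is a minimal sketch of a Varimax rotation using a widely used SVD-based iteration. It is an assumption-laden illustration (function names, tolerance, iteration limit and the example loadings are all hypothetical) and is not claimed to be the algorithm XLSTAT uses.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate the loadings L (variables x factors); return rotated loadings and rotation matrix."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        grad = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag((LR ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                             # closest orthogonal matrix to the gradient
        d_new = s.sum()
        if d_new < d * (1 + tol):              # stop when the objective no longer improves
            break
        d = d_new
    return L @ R, R

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return (L ** 2).var(axis=0).sum()

# Hypothetical loadings: a simple structure blurred by a 30-degree rotation
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
t = np.deg2rad(30)
blurred = simple @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

rotated, R = varimax(blurred)
print(rotated.round(3))
```

Because the rotation matrix is orthogonal, the sum of squared loadings of each variable is unchanged; only the way the variability is shared among the rotated axes is modified.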
XLSTAT charts for Principal Component Analysis
These charts show the correlations between the components and initial variables.
You can also display the initial variables in the form of vectors.
The observations charts represent the observations in the new space.
The biplots represent the observations and variables simultaneously in the new space.
Here as well, the initial variables can be plotted in the form of vectors.
There are different types of biplots:
- Correlation biplot
- Distance biplot
- Symmetric biplot
- Coefficient: Choose the coefficient whose square root is to be multiplied by the coordinates of the variables. This coefficient lets you adjust the position of the variable points in the biplot in order to make it more readable. If it is set to a value other than 1, the length of the variable vectors can no longer be interpreted as standard deviation (correlation biplot) or contribution (distance biplot).
This analysis is available in the XLStat-Basic add-in for Microsoft Excel™.