XLSTAT - Principal Component Analysis (PCA)

Principles of Principal Component Analysis

Principal Component Analysis (PCA) is one of the most frequently used multivariate data analysis methods.

Principal Component Analysis can be considered a projection method: it projects observations from a p-dimensional space (p variables) to a k-dimensional space (where k < p) so as to conserve the maximum amount of information from the initial dimensions, where information is measured by the total variance of the scatter plot. If the first 2 or 3 axes account for a sufficient percentage of the total variability, the observations can be represented on a 2- or 3-dimensional chart, which makes interpretation much easier.
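
The projection described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not XLSTAT's implementation: the function name pca_project, the use of the covariance matrix with an n-1 denominator, and the synthetic data are all choices made for this example.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n observations x p variables) onto the first k principal axes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # p x p covariance matrix (n-1 denominator)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]            # n x k coordinates in the reduced space
    explained = eigvals[:k].sum() / eigvals.sum()  # share of total variance retained
    return scores, explained

# Hypothetical data: 100 observations of 5 variables, where the last 3
# variables are noisy linear combinations of the first 2.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(100, 3))])

scores, explained = pca_project(X, k=2)
print(scores.shape, round(explained, 3))
```

Because the synthetic data are essentially two-dimensional, two axes retain almost all of the variance, which is the situation in which a 2-dimensional chart is a faithful summary.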

Use of Principal Component Analysis

There are several uses for Principal Component Analysis, including:

  • Studying and visualizing the correlations between variables, with a view to limiting the number of variables to be measured afterwards;
  • Obtaining non-correlated factors which are linear combinations of the initial variables so as to use these factors in modeling methods such as linear regression, logistic regression or discriminant analysis;
  • Visualizing observations in a 2- or 3-dimensional space in order to identify uniform or atypical groups of observations.

Principal Component Analysis input data

XLSTAT offers several possibilities for the matrix to be used in the Principal Component Analysis algorithm:

  • Pearson (n) and (n-1)
  • Covariance (n) and (n-1)
  • Spearman
  • Kendall
  • Polychoric
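
To make the first three options concrete, the sketch below builds covariance matrices with n and n-1 denominators, a Pearson correlation matrix, and a Spearman matrix (Pearson correlation of the ranks) with NumPy. The helper rank_columns and the random data are assumptions of this example, and the ranking shortcut is only valid when a column has no ties. Kendall and polychoric correlations need dedicated estimators and are not shown. Note that a Pearson coefficient is the same whichever denominator is used, so the (n)/(n-1) choice presumably only affects how the variables are standardized; the source does not spell this out.

```python
import numpy as np

def rank_columns(X):
    """Rank each column (1 = smallest). Assumes no ties within a column."""
    return np.argsort(np.argsort(X, axis=0), axis=0) + 1.0

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))               # hypothetical data: 30 observations, 3 variables
n = X.shape[0]

cov_n = np.cov(X, rowvar=False, ddof=0)    # covariance with denominator n
cov_n1 = np.cov(X, rowvar=False, ddof=1)   # covariance with denominator n-1
pearson = np.corrcoef(X, rowvar=False)     # Pearson correlation (the denominator cancels out)
spearman = np.corrcoef(rank_columns(X), rowvar=False)  # Pearson correlation of the ranks

# The two covariance matrices differ only by the constant factor n / (n - 1)
print(np.allclose(cov_n * n / (n - 1), cov_n1))
```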

Rotation for Principal Component Analysis

Rotations can be applied to the factors. Several methods are available, including Varimax, Quartimax, Equamax, Parsimax, Quartimin, Oblimin and Promax.
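
As an illustration of what an orthogonal rotation does, here is a sketch of the widely used SVD-based iteration for the orthomax family. It is not XLSTAT's code; the function name, the gamma parameter (1 for Varimax, 0 for Quartimax), the stopping rule and the example loadings are assumptions of this example. Oblique methods such as Quartimin, Oblimin and Promax are not covered.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x k loading matrix; returns (rotated loadings, rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                           # start from the unrotated solution
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the orthomax criterion with respect to the rotation matrix
        G = L.T @ (LR**3 - (gamma / p) * LR @ np.diag(np.sum(LR**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                          # orthogonal polar factor of G
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break                           # criterion no longer improving
        crit_old = crit
    return L @ R, R

# Hypothetical loadings: a simple structure deliberately blurred by a 30-degree rotation
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, the sum of squared loadings of each variable (its communality) is unchanged; only the way that variance is distributed across the axes changes.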

Results for Principal Component Analysis in XLSTAT

Correlation/Covariance matrix

This table shows the data to be used afterwards in the calculations. The type of correlation depends on the option chosen in the "General" tab in the dialog box. For correlations, significant correlations are displayed in bold.

Bartlett's sphericity test

The results of Bartlett's sphericity test are displayed. The test is used to reject, or fail to reject, the null hypothesis that the variables are not significantly correlated (that is, that the correlation matrix is an identity matrix).
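
For reference, the usual textbook form of the test statistic is given below, for n observations, p variables and correlation matrix R. The source does not state which variant XLSTAT computes, so this should be read as the standard formulation rather than a description of the software.

```latex
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln \lvert R \rvert ,
\qquad
\text{df} = \frac{p(p-1)}{2}
```

Under the null hypothesis the determinant of R is close to 1, so the statistic is close to 0; strongly correlated variables drive the determinant toward 0 and the statistic upward.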

Eigenvalues

The eigenvalues and the corresponding chart (scree plot) are displayed. The number of eigenvalues displayed is equal to the number of non-null eigenvalues.
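
The table behind a scree plot can be sketched as follows. The function name eigenvalue_table, the relative tolerance used to decide that an eigenvalue is null, and the example data (whose fourth variable is an exact linear combination of two others) are assumptions of this illustration.

```python
import numpy as np

def eigenvalue_table(R, tol=1e-10):
    """Return the non-null eigenvalues (descending), % of variability and cumulative %."""
    eigvals = np.linalg.eigvalsh(R)[::-1]          # eigvalsh is ascending, so reverse
    eigvals = eigvals[eigvals > tol * eigvals.max()]  # drop numerically null eigenvalues
    pct = 100 * eigvals / eigvals.sum()
    return eigvals, pct, np.cumsum(pct)

rng = np.random.default_rng(4)
A = rng.normal(size=(40, 3))
X = np.column_stack([A, A[:, 0] + A[:, 1]])        # 4th variable is redundant
R = np.corrcoef(X, rowvar=False)

eigvals, pct, cum_pct = eigenvalue_table(R)
for i, (ev, pc, cp) in enumerate(zip(eigvals, pct, cum_pct), start=1):
    print(f"F{i}: eigenvalue={ev:.3f}  variability={pc:.1f}%  cumulative={cp:.1f}%")
```

With four variables but only three independent ones, the correlation matrix has rank 3, so only three eigenvalues are reported.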

Factor loadings and correlations

XLSTAT displays the factor loadings in the new space, then the correlations between the initial variables and the components in the new space. The correlations are equal to the factor loadings in a normalized PCA (on the correlation matrix).
If supplementary variables have been selected, the corresponding coordinates and correlations are displayed at the end of the table.
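
The equality between loadings and correlations in a normalized PCA can be checked numerically. The sketch below assumes the common definition of a factor loading as an eigenvector coordinate multiplied by the square root of the eigenvalue; the data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)         # standardize: normalized PCA

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)    # column k is scaled by sqrt(eigenvalue k)
scores = Z @ eigvecs                     # coordinates of the observations on the components

# Correlation of every initial variable with every component
p = X.shape[1]
corr = np.corrcoef(Z, scores, rowvar=False)[:p, p:]

print(np.allclose(loadings, corr))
```

On a covariance-matrix PCA the same check would fail unless each row of the loadings is divided by the standard deviation of the corresponding variable.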

Contributions of the variables

Contributions are an interpretation aid. The variables which had the highest influence in building the axes are those whose contributions are highest.

Squared cosines for the variables

As in other factor methods, squared cosine analysis is used to avoid interpretation errors due to projection effects. If the squared cosines associated with the axes used on a chart are low, the position of the observation or the variable in question should not be interpreted.
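
In formula form, and assuming the usual definitions (the source does not give them explicitly), let \ell_{jk} be the loading of variable j on axis k, v_{jk} the corresponding eigenvector coordinate and \lambda_k the eigenvalue. The contribution (in percent) and the squared cosine can then be written as:

```latex
\ell_{jk} = v_{jk}\sqrt{\lambda_k},
\qquad
\mathrm{ctr}_{jk} = 100 \cdot \frac{\ell_{jk}^{2}}{\sum_{j'} \ell_{j'k}^{2}} = 100 \cdot v_{jk}^{2},
\qquad
\cos^{2}_{jk} = \frac{\ell_{jk}^{2}}{\sum_{k'} \ell_{jk'}^{2}}
```

Contributions sum to 100 over the variables for a given axis, while squared cosines sum to 1 over all axes for a given variable. In a normalized PCA the denominator of the squared cosine equals 1, so the squared cosine is simply the squared loading.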

Factor scores

The factor scores in the new space are then displayed. If supplementary data have been selected, these are displayed at the end of the table.

Contributions of the observations

The contribution table shows the contributions of the observations in building the principal components.

Squared cosines for the observations

The squared cosines table displays the squared cosines between the observation vectors and the factor axes.
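
The two observation tables can be sketched from the factor scores as follows. The function name observation_diagnostics, the equal weighting of observations, the data, and the 0.5 cut-off used to flag poorly represented observations are assumptions of this example.

```python
import numpy as np

def observation_diagnostics(scores):
    """Contributions (%) and squared cosines of observations, given ALL component scores."""
    F = np.asarray(scores, dtype=float)
    sq = F**2
    contrib = 100 * sq / sq.sum(axis=0)     # per axis, the observations sum to 100 %
    # Squared distance to the origin; an observation exactly at the centroid would give 0 here
    dist2 = sq.sum(axis=1, keepdims=True)
    cos2 = sq / dist2                       # per observation, the axes sum to 1
    return contrib, cos2

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 4))                # hypothetical data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest eigenvalue first
scores = Z @ eigvecs

contrib, cos2 = observation_diagnostics(scores)
quality_2d = cos2[:, :2].sum(axis=1)        # quality of representation on axes 1 and 2
poorly_shown = np.where(quality_2d < 0.5)[0]  # 0.5 is an arbitrary illustrative cut-off
print(poorly_shown)
```

Observations listed in poorly_shown lie mostly outside the plane of the first two axes, so their position on a 2-dimensional chart should be read with caution.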

Results with rotations

Where a rotation has been requested, the results of the rotation are displayed with the rotation matrix first applied to the factor loadings. This is followed by the modified variability percentages associated with each of the axes involved in the rotation. The coordinates, contributions and cosines of the variables and observations after rotation are displayed in the following tables.

XLSTAT charts for Principal Component Analysis

Correlations charts

These charts show the correlations between the components and initial variables.
You can also display the initial variables in the form of vectors.

Observations charts

The observations charts represent the observations in the new space.

Biplots

The biplots represent the observations and variables simultaneously in the new space.
Here as well, the initial variables can be plotted in the form of vectors.
There are different types of biplots:

  • Correlation biplot
  • Distance biplot
  • Symmetric biplot
  • Coefficient: Choose the coefficient whose square root is to be multiplied by the coordinates of the variables. This coefficient lets you adjust the position of the variable points in the biplot in order to make it more readable. If it is set to a value other than 1, the length of the variable vectors can no longer be interpreted as standard deviation (correlation biplot) or contribution (distance biplot).
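
The difference between the biplot types comes down to how the singular values are shared between observation and variable coordinates. The sketch below uses the classical definitions from the biplot literature, which may differ in detail from XLSTAT's scaling; the function name, the kind and coef parameters and the data are assumptions of this example.

```python
import numpy as np

def biplot_coordinates(Z, kind="correlation", coef=1.0):
    """Observation and variable coordinates for a biplot of the centered matrix Z."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # Z = U diag(s) Vt
    V = Vt.T
    if kind == "distance":
        obs, var = U * s, V                            # variables: unit-length eigenvectors
    elif kind == "correlation":
        obs, var = U * np.sqrt(n - 1), V * s / np.sqrt(n - 1)  # variables: loadings
    elif kind == "symmetric":
        obs, var = U * np.sqrt(s), V * np.sqrt(s)      # singular values split evenly
    else:
        raise ValueError(f"unknown biplot kind: {kind!r}")
    # The square root of the coefficient rescales the variable coordinates only
    return obs, var * np.sqrt(coef)

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 3)) * np.array([1.0, 2.0, 0.5])   # hypothetical data
Z = X - X.mean(axis=0)                                     # centered, not standardized

obs_c, var_c = biplot_coordinates(Z, "correlation")
print(np.linalg.norm(var_c, axis=1))    # vector lengths
print(Z.std(axis=0, ddof=1))            # standard deviations of the variables
```

With coef equal to 1, the product of the observation and variable coordinates reproduces Z for all three types, and in the correlation biplot the length of each variable vector equals the standard deviation of that variable. Any other coefficient stretches or shrinks the vectors and breaks that reading, as noted above.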

XLSTAT

This analysis is available in the XLStat-Base add-in for Microsoft Excel.

About KCS

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.
