What is multicollinearity?
Variables are said to be multicollinear if there is a linear relationship between them. This extends the simple case of collinearity between two variables. For example, three variables X1, X2 and X3 are multicollinear if we can write:
X1 = aX2 + bX3
where a and b are real numbers. In practice the relationship is rarely exact: near (approximate) multicollinearity, where the equality holds only approximately, is the more common case and causes the same problems.
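As a quick illustration of the definition above, the following sketch (using numpy, with arbitrarily chosen coefficients a = 2 and b = -1) builds three exactly multicollinear variables; the data matrix then has rank 2 instead of 3:

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = rng.normal(size=100)
# x1 is an exact linear combination of x2 and x3 (a = 2, b = -1, chosen arbitrarily)
x1 = 2 * x2 - 1 * x3

# The three columns are multicollinear: the matrix has rank 2, not 3
X = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(X))  # 2
```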
How to detect multicollinearity
To detect multicollinearity and identify the variables involved, we regress each variable on all the others. We then calculate:
- The R² of each of the models
If the R² equals 1, there is an exact linear relationship between the dependent variable of the model (the Y) and the explanatory variables (the Xs); an R² close to 1 indicates near multicollinearity.
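These regressions can be sketched as follows, a minimal numpy implementation assuming a plain data matrix with one column per variable (the helper name r_squared_on_others is invented for this example):

```python
import numpy as np

def r_squared_on_others(X):
    """R² of each column of X regressed on all the other columns (intercept included)."""
    n, p = X.shape
    r2 = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # design matrix: intercept + every column except j
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2[j] = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return r2

rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x3 = rng.normal(size=200)
x1 = 2 * x2 - x3 + 0.01 * rng.normal(size=200)  # nearly collinear with x2 and x3
print(r_squared_on_others(np.column_stack([x1, x2, x3])))
```

All three R² values come out close to 1 here, because the linear relation involves all three variables.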
- The tolerance for each of the models
The tolerance is 1 − R². It is used in several methods (linear regression, logistic regression, discriminant factorial analysis) as a criterion for filtering variables: if a variable has a tolerance below a fixed threshold (the tolerance being computed taking into account the variables already in the model), it is not allowed to enter the model, because its contribution would be negligible and it could cause numerical problems.
- The VIF (Variance Inflation Factor)
The VIF is the inverse of the tolerance: VIF = 1 / (1 − R²).
Use of multicollinearity statistics
Detecting multicollinearity within a group of variables is useful especially in the following cases:
- To identify structures within the data and take operational decisions (for example, stop the measurement of a variable on a production line as it is strongly linked to others which are already being measured),
- To avoid numerical problems during certain calculations: some methods require matrix inversions, and inverting a near-singular matrix (which multicollinearity produces) is numerically unstable.
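The numerical-problems point can be illustrated with the condition number of the X'X matrix that ordinary least squares must invert; this is only a sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.normal(size=200)
x3 = rng.normal(size=200)
x1 = 2 * x2 - x3 + 1e-6 * rng.normal(size=200)  # almost exactly collinear

X = np.column_stack([x1, x2, x3])
# X'X is nearly singular, so its condition number explodes and
# inverting it (as least squares does) amplifies rounding errors
print(np.linalg.cond(X.T @ X))
```

A large condition number (here many orders of magnitude above 1) means small perturbations of the data can produce large changes in the estimated coefficients.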
This analysis is available in the XLStat-Base add-in for Microsoft Excel™.