Imputing missing data with the NIPALS method with XLSTAT
Goal of this tutorial
The goal of this tutorial is to impute missing value in a dataset using the NIPALS algorithm.
Principle of the NIPALS approach for completing missing data
The NIPALS method is a method presented by H. Wold (1973) to allow principal component analysis with missing values. The NIPALS algorithm is applied on the dataset and the obtained PCA model is used to predict the missing values.
Dataset for completing missing data
An Excel sheet with both the data and the results can be downloaded by clicking here. In this tutorial we show how to complete missing values. .
The dataset contains six variables and six observations with six missing values.
Imputing missing value
After opening XLSTAT, select the XLSTAT / Preparing data / Missing data command, or click on the corresponding button of the Preparing datamenu (see below).
Once you've clicked on the relevant button, the dialog box appears. Select the data you want to complete in the data field (in our case the table with missing values). The type of data is quantitative. Select the method, we use NIPALS.
Activate the option for observation labels and select the name of the cars.
Once you have clicked on the OK button, the results are displayed on a new sheet.
Results of the imputation process
Descriptive statistics tables are displayed (one table before and a second one after imputation).
Then the completed data is displayed.
We can see in bold the completed values. We can compare these values to the real value available in this dataset:
We can see that imputed missing values are very close to the real values. For example for the displacement of Honda Civic, the real value is 1396 and the imputed value is 1365.236. If we had used a mean imputation method, the imputed value would have been 1781.4 which is very far from the value obtained with NIPALS.
Click here for other tutorials.