How do I use differencing to obtain a stationary time series?

An Excel sheet with both the data and results can be downloaded by clicking here. The data are taken from [Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco], and correspond to monthly international airline passengers (in thousands) from January 1949 to December 1960. The series is widely used as an example of a nonstationary seasonal time series. Our goal is to show how helpful a descriptive analysis can be before any modeling is attempted.

We notice on the chart that there is a global upward trend, that a similar cycle starts every year, and that the variability within a year seems to increase over time. In order to confirm these observations, we are going to analyse the autocorrelation function of the series.
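The visual impression of a rising level and a growing within-year spread can also be checked numerically. The following is a minimal sketch, not part of XLSTAT: the helper name `yearly_profile` and the synthetic series are assumptions made for illustration, and the real passenger data would be substituted for `synthetic`.

```python
import numpy as np

def yearly_profile(series, period=12):
    """Return (mean, standard deviation) of each complete block of `period` values."""
    x = np.asarray(series, dtype=float)
    n_years = x.size // period
    by_year = x[: n_years * period].reshape(n_years, period)
    return by_year.mean(axis=1), by_year.std(axis=1, ddof=1)

# Synthetic stand-in for the passenger series (NOT the real data):
# a rising level multiplied by a repeating 12-month pattern.
months = np.arange(48)
pattern = 1.0 + 0.2 * np.sin(2.0 * np.pi * months / 12.0)
synthetic = (100.0 + 2.0 * months) * pattern

means, spreads = yearly_profile(synthetic)
for year, (m, s) in enumerate(zip(means, spreads), start=1):
    print(f"year {year}: mean={m:.1f}  std={s:.1f}")
```

Yearly means that keep rising point to a trend, and yearly standard deviations that rise with them point to a variability that grows with the level of the series.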

After opening XLSTAT, select the XLSTAT/XLSTAT-Time/Descriptive analysis command, or click on the corresponding button of the "XLSTAT-Time" toolbar (see below).

Once you have clicked on the button, the Descriptive analysis dialog appears. Select the data on the Excel sheet. The "Variable to analyze" corresponds to the series of interest, the Passengers. After you have selected the data, select the Holt-Winters method, and then the "seasonal multiplicative" sub-method. Then, so that the model parameters are optimized (ordinary least squares), check the "optimized" option. The period of the series is set to 12, because the cycles seem to repeat every year (12 months). Last, in the validation box, enter 12 so that the last 12 values are not used to fit the model, but only to validate it. The option "Column labels" is activated because the first row of the selected data contains the header of the variable.

The computations begin once you have clicked on "OK". The results will then be displayed. The first table displays the summary statistics. Then the "Normality test and white noise tests" table is displayed. The Jarque-Bera test is a normality test based on the skewness and kurtosis coefficients. The larger the value of the Chi-square statistic, the stronger the evidence against the null hypothesis that the data are normally distributed. Here the p-value, which is the probability of observing a statistic at least this large if the null hypothesis were true, is close to 0.01. At the alpha=0.05 significance level, one should reject the null hypothesis.
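To make the link between skewness, kurtosis and the Chi-square statistic concrete, here is a minimal sketch of the textbook Jarque-Bera statistic. It is an illustration under stated assumptions, not XLSTAT's implementation: the function name `jarque_bera` is ours, the sample is synthetic, and XLSTAT may apply small-sample corrections that this version ignores.

```python
import math
import numpy as np

def jarque_bera(sample):
    """Return (JB statistic, asymptotic p-value) for a 1-D sample."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically Chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return float(jb), p_value

# Synthetic, clearly skewed sample (NOT the passenger data).
rng = np.random.default_rng(0)
skewed = rng.exponential(size=144)
jb_stat, jb_p = jarque_bera(skewed)
print(f"JB = {jb_stat:.2f}, p-value = {jb_p:.4g}")
```

Because the statistic depends only on the moments of the sample, shuffling the observations leaves it unchanged.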

The other three tests (Box-Pierce, Ljung-Box, McLeod-Li) are computed at different time lags. They test whether the data can be assumed to be white noise. These tests are also based on the Chi-square distribution. They all agree that the data cannot be assumed to be generated by a white noise process. While the ordering of the data has no influence on the Jarque-Bera test, it does influence the three other tests, which are designed specifically for time series analysis.
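The three white noise statistics can be sketched from the sample autocorrelations as follows. This is a hedged illustration using the standard textbook formulas, with McLeod-Li taken as the Ljung-Box statistic applied to the squared series; the function names and the synthetic series are assumptions, and XLSTAT's exact conventions may differ. To stay dependency-free, the p-value helper only handles an even number of lags, for which the Chi-square tail has a closed form.

```python
import math
import numpy as np

def autocorrelations(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([np.sum(centered[k:] * centered[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def chi2_sf_even_df(q, df):
    """Upper-tail Chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports a positive, even df")
    half = float(q) / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

def white_noise_tests(x, max_lag):
    """Return {test name: (statistic, p-value)} using max_lag lags."""
    x = np.asarray(x, dtype=float)
    n = x.size
    lags = np.arange(1, max_lag + 1)
    r = autocorrelations(x, max_lag)
    r_squared_series = autocorrelations(x ** 2, max_lag)
    stats = {
        "Box-Pierce": n * np.sum(r ** 2),
        "Ljung-Box": n * (n + 2) * np.sum(r ** 2 / (n - lags)),
        "McLeod-Li": n * (n + 2) * np.sum(r_squared_series ** 2 / (n - lags)),
    }
    return {name: (float(s), chi2_sf_even_df(s, max_lag))
            for name, s in stats.items()}

# Synthetic trending, seasonal series (NOT the passenger data).
t = np.arange(144)
synthetic = 100.0 + t + 20.0 * np.sin(2.0 * np.pi * t / 12.0)
results = white_noise_tests(synthetic, max_lag=12)
for name, (stat, p) in results.items():
    print(f"{name}: Q = {stat:.1f}, p-value = {p:.3g}")
```

Unlike the Jarque-Bera statistic, these quantities are built from products of neighbouring observations, which is why the order of the data matters.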

Below the table that displays the descriptive functions of the time series, two bar charts display the autocorrelation function (ACF) and the partial autocorrelation function (PACF) at successive lags. The 95% confidence intervals are also displayed. From the autocorrelogram, we can identify a clear lag 1 autocorrelation, as well as a seasonality which seems to have a period of 12 months.
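For readers who want to reproduce the two bar charts outside XLSTAT, the sketch below computes the sample ACF, derives the PACF from it with the Durbin-Levinson recursion, and flags the lags that fall outside an approximate 95% band. The band used here is the simple white noise approximation of plus or minus 1.96 divided by the square root of n; this is an assumption, and XLSTAT may compute its intervals differently. The function names and the synthetic series are also illustrative.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([np.sum(centered[k:] * centered[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def pacf_from_acf(r):
    """Partial autocorrelations at lags 1..K via the Durbin-Levinson recursion.

    `r` holds the autocorrelations at lags 1..K.
    """
    r = np.asarray(r, dtype=float)
    n_lags = r.size
    pacf = np.zeros(n_lags)
    pacf[0] = r[0]
    phi_prev = np.array([r[0]])           # coefficients of the order k-1 fit
    for k in range(2, n_lags + 1):
        num = r[k - 1] - np.dot(phi_prev, r[k - 2::-1])
        den = 1.0 - np.dot(phi_prev, r[:k - 1])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k - 1] = phi_kk
    return pacf

# Synthetic trending, seasonal series (NOT the passenger data).
t = np.arange(144)
synthetic = 100.0 + t + 20.0 * np.sin(2.0 * np.pi * t / 12.0)

r = acf(synthetic, max_lag=24)
partial = pacf_from_acf(r)
bound = 1.96 / np.sqrt(synthetic.size)    # approximate 95% band
significant_lags = [k for k, value in enumerate(r, start=1) if abs(value) > bound]
print(f"band = +/-{bound:.3f}")
print("ACF lags outside the band:", significant_lags)
```

On a series with both a trend and a yearly cycle, the ACF decays slowly, which is the pattern that motivates differencing.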

In order to improve the normality of the data, we want to perform two transformations:
- First, we want to stabilize the increasing variability of the series,
- Second, we want to remove the autocorrelations by differencing the series.

This can be done using the Time series transformation tool. To activate the corresponding dialog box, select the XLSTAT/XLSTAT-Time/Transforming series command, or click on the corresponding button of the "XLSTAT-Time" toolbar (see below).

Once you have clicked on the button, the dialog appears. Select the data on the Excel sheet. The "Variable to analyze" corresponds to the series of interest, the Passengers. After you have selected the data, select the Box-Cox option. We could ask for an optimized transformation, in which the lambda parameter of the Box-Cox transformation is adjusted so that the likelihood of a regression model (transformed Y = simple linear function of time) is as high as possible. Here, however, we decide to fix the lambda value to 0, which corresponds to a log transformation of the series. The log transformation is often a good choice for removing increasing variability. Then, in order to remove the trend and the seasonal component, we decide to use the differencing method. We set the d value to 1 to remove the trend, and D and s to 1 and 12 to remove the 12-month seasonal component.
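The two transformations chosen above can be written in a few lines. The following is a minimal sketch under stated assumptions: the function names are ours, the Box-Cox formula is the standard one (log when lambda is 0), and the series is synthetic, built so that its logarithm is exactly a linear trend plus a repeating 12-month pattern.

```python
import numpy as np

def box_cox(x, lam):
    """Standard Box-Cox transform; lam = 0 gives the natural log."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def difference(x, lag=1):
    """Return x[t] - x[t - lag]; the result is `lag` values shorter."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

def transform(x, lam=0.0, d=1, D=1, s=12):
    """Box-Cox, then d ordinary differences, then D seasonal differences of period s."""
    y = box_cox(x, lam)
    for _ in range(d):
        y = difference(y, 1)
    for _ in range(D):
        y = difference(y, s)
    return y

# Synthetic multiplicative series (NOT the passenger data).
t = np.arange(144)
monthly_pattern = np.tile(np.linspace(-0.2, 0.2, 12), 12)
synthetic = np.exp(4.7 + 0.01 * t + monthly_pattern)

stationary = transform(synthetic, lam=0.0, d=1, D=1, s=12)
print(len(synthetic), "->", len(stationary), "observations")
print("largest absolute value after differencing:", np.abs(stationary).max())
```

With d=1, D=1 and s=12, the transformed series loses 1 + 12 = 13 observations. On this idealized series the result is zero up to rounding error, because the log turns the multiplicative structure into an additive one that the two differences cancel exactly; real data leave a residual series whose autocorrelations still need to be examined.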

The computations begin once you have clicked on "OK". We first see a table and a chart that correspond to the Box-Cox transformation. We can see the transformed series on the chart below. It looks like the log transformation has removed the increasing variability.

Next, a table and a chart display the differencing transformation. We see that the differencing has removed the trend, but it is not clear whether the result is white noise.

In order to verify whether the transformed series now looks like white noise and is normally distributed, we need to perform a descriptive analysis on the transformed series.

The Jarque-Bera test confirms that the series looks more like a normal sample (the p-value rose from 0.01 to 0.04). But the white noise tests suggest that the transformations have not been efficient enough. The autocorrelogram indicates that we removed too much of the lag 1 and lag 12 components, as they now have negative autocorrelation coefficients. Furthermore, the lag 3 and lag 9 coefficients also seem to be significant. Therefore, it seems further work is necessary to understand the underlying phenomenon.

Copyright © 2009 Kovach Computing Services, Anglesey, Wales. All Rights Reserved. Portions copyright Addinsoft, Provalis Research, and Data Description Inc.

Last modified 3 December, 2009