How can I use XLSTAT to fit a distribution to a sample of data?

An Excel sheet containing both the data and the results is available for download. The data correspond to the residuals obtained in the tutorial on ANCOVA.

Our goal is to test whether the assumption of normality of the residuals is valid. By construction, these residuals are standardized: they are centered (mean = 0) and scaled to unit variance (variance = 1). We will use the Distribution Fitting tool to check whether the residuals follow a N(0, 1) distribution.
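As a minimal sketch of what "centered and scaled" means, the Python function below subtracts the sample mean and divides by the sample standard deviation. This is an illustration under stated assumptions: the exact definition XLSTAT uses for standardized ANCOVA residuals may differ (for example, dividing by the root mean square error of the model), and the function name `standardize` and the residual values are hypothetical.

```python
import statistics

def standardize(values):
    """Center values (mean 0) and scale them to unit sample variance."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return [(v - mean) / sd for v in values]

# Hypothetical raw residuals, for illustration only
raw = [1.2, -0.4, 0.3, -1.5, 0.9, -0.5]
z = standardize(raw)
print(statistics.fmean(z), statistics.variance(z))  # approximately 0 and 1
```

Up to floating-point rounding, the standardized values have mean 0 and variance 1 whatever the raw values are, which is why N(0, 1) is the natural reference distribution here.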

After opening XLSTAT, select the XLSTAT/Modeling data/Distribution Fitting command, or click the corresponding button on the "Modeling Data" toolbar.

Once you have clicked the button, the dialog box appears. Select the data on the Excel sheet; the "Data" are in column B. We let XLSTAT "estimate" the parameters of the normal distribution, but we could also fix them to 0 and 1, since we already know these characteristics of the data. We activate the options for the Kolmogorov-Smirnov and Chi-square goodness-of-fit tests, which are needed to test our assumption. For the Chi-square test, we use 10 classes of equal width.
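The two options for the parameters (estimated from the data, or fixed at 0 and 1) can be sketched with Python's standard `statistics.NormalDist` class (Python 3.8+). This only illustrates the idea and is not XLSTAT's estimation procedure: `NormalDist.from_samples` uses the sample mean and the sample standard deviation (n - 1 denominator), while XLSTAT may use a different estimator. The function name `reference_distribution` and the sample values are hypothetical.

```python
from statistics import NormalDist

def reference_distribution(values, estimate=True):
    """Return the normal distribution the sample will be tested against."""
    if estimate:
        # Parameters estimated from the data (sample mean and sample standard deviation)
        return NormalDist.from_samples(values)
    # Parameters fixed a priori: standardized residuals should follow N(0, 1)
    return NormalDist(mu=0.0, sigma=1.0)

sample = [-1.1, 0.4, 0.2, -0.3, 1.3, -0.5]  # hypothetical residuals
print(reference_distribution(sample))
print(reference_distribution(sample, estimate=False))
```

For residuals that are already standardized, both choices give nearly the same reference distribution, which is why the tutorial notes that the parameters could equally be fixed.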

The computations begin once you click the "OK" button, and the results are then displayed. The first results table shows four descriptive statistics (mean, variance, skewness and kurtosis), first estimated from the sample and then computed from the fitted normal distribution. The mean and the variance are identical, which is expected for the normal distribution because its two parameters are estimated from the sample mean and variance. The skewness and the kurtosis, however, differ.
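A minimal sketch of how the four statistics can be computed from a sample and compared with their N(0, 1) values is shown below. It assumes simple moment estimators for skewness and kurtosis; XLSTAT may apply small-sample bias corrections and may report kurtosis (equal to 3 for a normal distribution) rather than excess kurtosis, so the numbers need not match the XLSTAT table exactly. The names `moment_summary` and `THEORETICAL` are hypothetical.

```python
import statistics

def moment_summary(values):
    """Mean, variance, skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with the 1/n denominator
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": statistics.variance(values),  # n - 1 denominator
        "skewness": m3 / m2 ** 1.5,               # 0 for any normal distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for any normal distribution
    }

# Theoretical values for a N(0, 1) distribution
THEORETICAL = {"mean": 0.0, "variance": 1.0, "skewness": 0.0, "excess_kurtosis": 0.0}

sample = [-0.8, 0.1, 0.4, -1.2, 2.9, -0.3, 0.6, -0.9]  # hypothetical residuals
for name, value in moment_summary(sample).items():
    print(f"{name}: sample = {value:.3f}, N(0, 1) = {THEORETICAL[name]:.3f}")
```

A large positive skewness or excess kurtosis in the sample column is the typical signature of a few large residuals, which is what the following tests investigate.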

The Kolmogorov-Smirnov test checks whether the largest absolute difference between the empirical and theoretical cumulative distribution functions exceeds a critical value. This test is generally considered better suited than the Chi-square test for continuous distributions, such as the normal distribution. The results lead us to conclude that, at a significance level of 0.05, we cannot reject the hypothesis that the residuals come from a N(0, 1) normal distribution.
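The Kolmogorov-Smirnov statistic D is the largest absolute gap between the empirical CDF and the theoretical CDF. The sketch below computes D against N(0, 1) and uses the standard large-sample 5% critical value, approximately 1.36 / sqrt(n). This is an illustration, not XLSTAT's implementation: XLSTAT may use exact critical values or p-values, and when the parameters are estimated from the same sample (as in this tutorial), the classical critical values are conservative (the Lilliefors correction addresses this). Function names and data are hypothetical.

```python
import math
from statistics import NormalDist

def ks_statistic(values, dist=NormalDist(0.0, 1.0)):
    """Largest absolute gap between the empirical CDF and dist.cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_critical_value(n, alpha=0.05):
    """Asymptotic critical value c(alpha) / sqrt(n), c(alpha) = sqrt(-ln(alpha/2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)

sample = [-0.8, 0.1, 0.4, -1.2, 2.9, -0.3, 0.6, -0.9]  # hypothetical residuals
d = ks_statistic(sample)
critical = ks_critical_value(len(sample))
print(f"D = {d:.3f}, critical value = {critical:.3f}, reject H0: {d > critical}")
```

The asymptotic formula is only accurate for reasonably large samples (roughly n above 35); for very small samples like the hypothetical one above, tabulated exact critical values are preferable.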

The Chi-square goodness-of-fit test checks whether the Chi-square distance between the empirical and theoretical distributions, computed from the observed and expected frequencies in each class, exceeds a critical value. In our case, the observed value is above the critical value, so this test leads to the opposite conclusion from the Kolmogorov-Smirnov test. However, if we remove the class (interval) that contributes the most to the Chi-square statistic, the observed and critical values are very close.
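The sketch below computes the Chi-square statistic on k equal-width classes, keeping each class's contribution (observed - expected)^2 / expected so that the class contributing the most can be identified. Several details are assumptions rather than XLSTAT's documented behavior: the classes span the sample minimum to maximum, the two outer classes are extended to -inf and +inf so that the expected probabilities sum to 1, and the degrees of freedom are the textbook k - 1 - (number of estimated parameters). The critical values are the standard table values for the upper 5% point. All names and data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

# Upper 5% points of the Chi-square distribution (standard table values, df 1 to 10)
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}

def chi_square_contributions(values, k=10, dist=NormalDist(0.0, 1.0)):
    """Return (observed - expected)^2 / expected for k equal-width classes."""
    n = len(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; classes cannot be built")
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    # Observed counts; the maximum value goes into the last class
    observed = [0] * k
    for v in values:
        observed[min(bisect_right(edges, v) - 1, k - 1)] += 1
    # Expected counts; the outer classes are extended to -inf and +inf (assumption)
    cdf = [0.0] + [dist.cdf(e) for e in edges[1:-1]] + [1.0]
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(k)]
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

def chi_square_summary(values, k=10, dist=NormalDist(0.0, 1.0), n_estimated=0):
    """Statistic, 5% critical value and index of the class contributing the most."""
    contributions = chi_square_contributions(values, k, dist)
    statistic = sum(contributions)
    # Degrees of freedom: classes - 1 - number of estimated parameters
    critical = CHI2_CRITICAL_05[k - 1 - n_estimated]
    worst = max(range(k), key=contributions.__getitem__)
    return statistic, critical, worst

sample = [-0.8, 0.1, 0.4, -1.2, 2.9, -0.3, 0.6, -0.9, 0.2, -0.1]  # hypothetical
stat, crit, worst = chi_square_summary(sample, k=5)
print(f"Chi-square = {stat:.2f}, critical = {crit}, largest contribution: class {worst}")
```

With the parameters fixed at 0 and 1, `n_estimated=0` applies; if both parameters are estimated from the sample, the estimated distribution would be passed as `dist` with `n_estimated=2`. With few observations per class, expected counts in the outer classes can be very small, which inflates their contributions.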

Looking at the table and the chart, we can see that the interval that contributes the most to the Chi-square statistic is the one corresponding to the extreme values identified during the analysis of the ANCOVA residuals. This suggests that simply removing the corresponding observations would improve the normality of the residuals.
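Before removing anything, the extreme observations can be listed with their positions so that they can be examined individually. The sketch below flags standardized residuals whose absolute value exceeds a threshold; the threshold of 2.0 is a hypothetical rule of thumb, not the criterion used in the ANCOVA tutorial, and the function name `split_extremes` is also hypothetical.

```python
def split_extremes(residuals, threshold=2.0):
    """Split standardized residuals into kept and flagged (|r| > threshold) lists.

    Each entry is an (index, residual) pair so that flagged observations can be
    traced back to the original data before deciding whether to remove them.
    """
    kept, flagged = [], []
    for i, r in enumerate(residuals):
        if abs(r) > threshold:
            flagged.append((i, r))
        else:
            kept.append((i, r))
    return kept, flagged

kept, flagged = split_extremes([0.5, -2.5, 1.9, 3.1])
print(flagged)  # [(1, -2.5), (3, 3.1)]
```

Since the residuals come from a fitted model, removing observations would in practice mean refitting the ANCOVA without them and testing the new residuals again, rather than simply dropping values from the residual list.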

In conclusion, it seems that we can retain the normality assumption for the residuals (Kolmogorov-Smirnov test), and that the residuals with high absolute values are responsible for the rejection by the Chi-square test. Removing some of the corresponding observations (ideally after an in-depth analysis of those observations) would yield a better-fitting model that better matches the initial assumption of normality.

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.
