Log-linear regression (Poisson regression)
Log-linear regression principles
Log-linear regression is a special case of the generalized linear model for Poisson-, Gamma- or exponential-distributed data. It is used to model the relationship between a scalar response variable and one or more explanatory variables. The logarithm of the expected value of the response variable is assumed to be an affine function of the explanatory variables.
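As a minimal illustration of this assumption, the Python sketch below computes the expected response as the exponential of an affine function of the explanatory variables. The function name and the coefficient values are hypothetical and chosen only for the example; they do not come from XLSTAT.

```python
import math

def expected_response(intercept, coefficients, x):
    """Mean of the response under a log link: exp(intercept + sum_j coef_j * x_j)."""
    linear_predictor = intercept + sum(b * v for b, v in zip(coefficients, x))
    return math.exp(linear_predictor)

# Hypothetical coefficients, chosen only for illustration.
b0, b = 0.5, [0.2, -0.1]
mu_a = expected_response(b0, b, [1.0, 3.0])
mu_b = expected_response(b0, b, [2.0, 3.0])  # first variable increased by one unit

# A one-unit increase in a variable multiplies the expected response
# by exp(coefficient of that variable).
ratio = mu_b / mu_a
```

Because the link is logarithmic, the coefficients act multiplicatively on the expected response rather than additively, which is why `ratio` depends only on the first coefficient.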
The log-linear regression in XLSTAT
The most common log-linear regression is the Poisson regression, which is typically used to model count data. XLSTAT also provides two other distributions: the Gamma and the exponential. Note that the exponential distribution is a Gamma distribution whose shape parameter is fixed to 1 (in generalized linear model terms, a dispersion, or scale, parameter of 1).
Unlike linear regression, log-linear regression has no exact analytical solution, so the parameters have to be estimated with an iterative algorithm. XLSTAT uses a Newton-Raphson algorithm. The user can change the maximum number of iterations and the convergence threshold if desired.
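The iterative estimation can be sketched as follows for a Poisson regression with a single explanatory variable. This is an illustrative pure-Python sketch, not XLSTAT's implementation: the function name, the default values of `max_iter` and `tol` (standing in for the maximum number of iterations and the convergence threshold) and the data are all assumptions.

```python
import math

def fit_poisson_newton(x, y, max_iter=100, tol=1e-8):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    # Starting the intercept at log(mean(y)) keeps the first steps stable.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for iteration in range(1, max_iter + 1):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient (score) of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (information matrix), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * step = g with Cramer's rule.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        # Stop when the parameters no longer move by more than the threshold.
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical count data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1, n_iter = fit_poisson_newton(x, y)
```

At convergence the gradient is numerically zero, which is the condition the maximum likelihood estimates must satisfy. With more explanatory variables the same update applies, with a general linear solver in place of the 2x2 formula.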
Results for log-linear regression in XLSTAT
- Summary of the variables selection: Where a selection method has been chosen, XLSTAT displays the selection summary. For a stepwise selection, the statistics corresponding to the different steps are displayed. Where the best model for a number of variables varying from p to q has been selected, the best model for each number of variables is displayed with the corresponding statistics, and the best model for the chosen criterion is displayed in bold.
- Goodness of fit coefficients: This table displays a series of statistics for the independent model (corresponding to the case where the linear combination of explanatory variables reduces to a constant) and for the adjusted model.
- Observations: The total number of observations taken into account (sum of the weights of the observations);
- Sum of weights: The total number of observations taken into account (sum of the weights of the observations multiplied by the weights in the regression);
- DF: Degrees of freedom;
- -2 Log(Like.): Minus two times the logarithm of the likelihood function associated with the model;
- R² (McFadden): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to 1 minus the ratio of the log-likelihood of the adjusted model to the log-likelihood of the independent model;
- R²(Cox and Snell): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to 1 minus the ratio of the likelihood of the independent model to the likelihood of the adjusted model, raised to the power 2/Sw, where Sw is the sum of weights;
- R²(Nagelkerke): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to the R² of Cox and Snell divided by 1 minus the likelihood of the independent model raised to the power 2/Sw;
- Deviance: Twice the difference between the log-likelihood of the saturated model and the log-likelihood of the adjusted model;
- Pearson Chi-square: Sum of the squared Pearson residuals;
- AIC: Akaike’s Information Criterion;
- SBC: Schwarz’s Bayesian Criterion.
- Test of the null hypothesis H0: Y=constant: The H0 hypothesis corresponds to the independent model which gives the same result whatever the values of the explanatory variables. We seek to check if the adjusted model is significantly more powerful than this model. Three tests are available: the likelihood ratio test (-2 Log(Like.)), the Score test and the Wald test. The three statistics follow a Chi² distribution whose degrees of freedom are shown.
- Type III analysis: This table is only useful if there is more than one explanatory variable. Here, the adjusted model is tested against a test model where the variable in the row of the table in question has been removed. If the probability Pr > LR is less than a significance threshold which has been set (typically 0.05), then the contribution of the variable to the adjustment of the model is significant. Otherwise, it can be removed from the model.
- Model parameters: The parameter estimate, the corresponding standard deviation, the Wald Chi², the corresponding p-value and the confidence interval are displayed for the constant and for each variable of the model.
- Model equation: The equation of the model is then displayed to make it easier to read, or to re-use the model.
- Predictions and residuals table: The predictions and residuals table shows, for each observation, its weight, the observed value of the dependent variable, the model's prediction, the same values divided by the weights, the standardized residuals and a confidence interval.
- Overdispersion test: For the Poisson regression, an overdispersion test is displayed.
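The goodness of fit coefficients listed above can be sketched in Python for a Poisson model with unit weights, so that the sum of weights Sw equals the number of observations. The sketch uses the standard definitions of these statistics; the function names, the data and the fitted means are hypothetical, and the exact conventions of the XLSTAT output may differ. The Pearson Chi-square divided by the degrees of freedom is included as a common informal indicator of overdispersion (values well above 1 suggest it), which is not necessarily the test XLSTAT reports.

```python
import math

def poisson_log_likelihood(y, mu):
    """Log-likelihood of counts y under Poisson means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def fit_statistics(y, mu, n_params):
    """Goodness of fit statistics for a Poisson model, assuming unit weights."""
    n = len(y)  # with unit weights, the sum of weights Sw equals n
    ll_model = poisson_log_likelihood(y, mu)
    # Independent model: the mean is the same constant for every observation.
    ll_null = poisson_log_likelihood(y, [sum(y) / n] * n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    deviance = 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = n - n_params
    return {
        "minus_2_log_lik": -2.0 * ll_model,
        "r2_mcfadden": 1.0 - ll_model / ll_null,
        "r2_cox_snell": cox_snell,
        "r2_nagelkerke": cox_snell / (1.0 - math.exp(2.0 * ll_null / n)),
        "deviance": deviance,
        "pearson_chi2": pearson,
        "aic": -2.0 * ll_model + 2.0 * n_params,
        "sbc": -2.0 * ll_model + n_params * math.log(n),
        "pearson_over_df": pearson / df,  # informal overdispersion indicator
    }

# Hypothetical counts and fitted means (constant plus one variable: 2 parameters).
y = [1, 2, 4, 7, 12]
mu = [1.1, 2.1, 3.9, 7.2, 11.9]
stats = fit_statistics(y, mu, n_params=2)
```

Working with log-likelihoods throughout, rather than raw likelihoods, avoids numerical underflow when the number of observations is large.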
This analysis is available in the XLStat-Base addin for Microsoft Excel™