Ordinal logit model
Ordinal Logit model principles
The ordinal logit model is a frequently used method because it allows ordinal variables to be modeled. It is often used in survey analysis (for example, whether a respondent is not satisfied, satisfied or very satisfied). It relies on the same principles as the binary and multinomial logit models.
The principle of the ordinal logit model is to link the cumulative probability of a level to explanatory variables.
Models for ordinal logit model
Logistic and linear regression belong to the same family of models called GLM (Generalized Linear Models): in both cases, an event is linked to a linear combination of explanatory variables.
The most common functions used to link probability p to the explanatory variables are the logit function (we refer to the ordinal Logit model) and the standard normal distribution function (the ordinal Probit model).
The analytical expression of the models is as follows:
- Logit: $p = \dfrac{\exp(\beta X)}{1 + \exp(\beta X)}$
- Probit: $p = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta X} \exp\left(-\dfrac{x^2}{2}\right) dx$
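The two link functions can be compared numerically. The following is a minimal Python sketch, not XLSTAT code: the function names are illustrative, and `eta` stands for the value of the linear combination βX.

```python
import math

def logit_probability(eta: float) -> float:
    """Logistic function exp(eta) / (1 + exp(eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def probit_probability(eta: float) -> float:
    """Standard normal distribution function at eta, via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Both functions map any real value of the linear predictor to (0, 1).
for eta in (-2.0, 0.0, 2.0):
    print(eta, logit_probability(eta), probit_probability(eta))
```

Both functions return 0.5 when βX = 0; the logit curve has heavier tails than the probit curve.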
The knowledge of the distribution of the event being studied gives the likelihood of the sample. To estimate the β parameters of the model (the coefficients of the linear function), we try to maximize the likelihood function.
Unlike linear regression, no exact analytical solution exists, so an iterative algorithm has to be used. XLSTAT uses a Newton-Raphson algorithm. The user can change the maximum number of iterations and the convergence threshold if desired.
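To illustrate the idea of an iterative maximum likelihood fit, here is a minimal Python sketch of Newton-Raphson for the simpler binary logit case with one explanatory variable. It is not XLSTAT's implementation: the function name, the data, and the parameters `max_iter` and `tol` (standing in for the maximum number of iterations and the convergence threshold) are assumptions made for the example.

```python
import math

def fit_logit_newton(x, y, max_iter=100, tol=1e-6):
    """Maximize the binary logit log-likelihood for p = logistic(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        # Accumulate the gradient (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * step = g by Cramer's rule.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        # Stop once the parameters no longer move by more than the threshold.
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    return b0, b1, max_iter

# Hypothetical data: the two outcomes overlap, so the estimates stay finite.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, n_iter = fit_logit_newton(x, y)
print(b0, b1, n_iter)
```

At convergence the gradient of the log-likelihood is close to zero, which is the condition the algorithm is solving for.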
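In the ordinal logit model, we model the cumulative probability of responding at a level less than or equal to j, that is, the probability P(y<=j) for j from 1 to the number of categories of Y. For the last category this probability is always equal to 1, so only the lower levels carry information.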
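The link between cumulative and per-category probabilities can be sketched in Python as follows. This is an illustration under assumptions: it uses the parameterization P(y<=j) = logistic(α_j + βX) with increasing intercepts α_j (sign conventions differ between software packages), and the intercept values and the value of βX are hypothetical.

```python
import math

def cumulative_probabilities(intercepts, eta):
    """P(y <= j) for j = 1..J, given increasing intercepts for j = 1..J-1."""
    cum = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    cum.append(1.0)  # P(y <= J) is 1 for the last category
    return cum

def category_probabilities(intercepts, eta):
    """P(y = j), obtained as P(y <= j) - P(y <= j - 1)."""
    cum = cumulative_probabilities(intercepts, eta)
    return [c - prev for c, prev in zip(cum, [0.0] + cum[:-1])]

intercepts = [-1.0, 1.0]  # hypothetical intercepts for a 3-level response
eta = 0.5                 # hypothetical value of the linear combination
cum = cumulative_probabilities(intercepts, eta)
probs = category_probabilities(intercepts, eta)
print(cum)
print(probs)
```

The per-category probabilities are the successive differences of the cumulative ones, so they are positive and sum to 1 as long as the intercepts are increasing.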
XLSTAT results for Ordinal Logit model
XLSTAT can display the classification table (also called the confusion matrix) used to calculate the percentage of well-classified observations.
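As an illustration of how such a table is built, here is a minimal Python sketch. The observed and predicted categories are hypothetical, and the sketch assumes each observation has already been assigned a predicted category (for example the most probable one); it does not reproduce XLSTAT's output format.

```python
from collections import Counter

def classification_table(observed, predicted, levels):
    """Rows are observed levels, columns are predicted levels."""
    counts = Counter(zip(observed, predicted))
    table = [[counts[(o, p)] for p in levels] for o in levels]
    # Well-classified observations lie on the diagonal of the table.
    correct = sum(table[i][i] for i in range(len(levels)))
    return table, 100.0 * correct / len(observed)

levels = ["not satisfied", "satisfied", "very satisfied"]
N, S, V = levels
observed = [N, N, S, S, S, V, V, V]    # hypothetical responses
predicted = [N, S, S, S, V, V, V, S]   # hypothetical model predictions
table, pct_correct = classification_table(observed, predicted, levels)
for level, row in zip(levels, table):
    print(level, row)
print(pct_correct)
```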
Results for logistic regression in XLSTAT
- Summary of the variables selection: Where a selection method has been chosen, XLSTAT displays the selection summary. For a stepwise selection, the statistics corresponding to the different steps are displayed. Where the best model for a number of variables varying from p to q has been selected, the best model for each number of variables is displayed with the corresponding statistics and the best model for the criterion chosen is displayed in bold.
- Goodness of fit coefficients: This table displays a series of statistics for the independent model (corresponding to the case where the linear combination of explanatory variables reduces to a constant) and for the adjusted model.
- Observations: The total number of observations taken into account (sum of the weights of the observations);
- Sum of weights: The total number of observations taken into account (sum of the weights of the observations multiplied by the weights in the regression);
- DF: Degrees of freedom;
- -2 Log(Like.): Minus two times the logarithm of the likelihood function associated with the model;
- R² (McFadden): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to 1 minus the ratio of the log-likelihood of the adjusted model to the log-likelihood of the independent model;
- R²(Cox and Snell): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to 1 minus the ratio of the likelihood of the independent model to the likelihood of the adjusted model, raised to the power 2/Sw, where Sw is the sum of weights;
- R²(Nagelkerke): Coefficient, like the R², between 0 and 1 which measures how well the model is adjusted. This coefficient is equal to the R² of Cox and Snell divided by 1 minus the likelihood of the independent model raised to the power 2/Sw;
- AIC: Akaike’s Information Criterion;
- SBC: Schwarz’s Bayesian Criterion.
- Test of the null hypothesis H0: Y=p0: The H0 hypothesis corresponds to the independent model which gives probability p0 whatever the values of the explanatory variables. We seek to check if the adjusted model is significantly more powerful than this model. Three tests are available: the likelihood ratio test (-2 Log(Like.)), the Score test and the Wald test. The three statistics follow a Chi² distribution whose degrees of freedom are shown.
- Type III analysis: This table is only useful if there is more than one explanatory variable. Here, the adjusted model is tested against a test model where the variable in the row of the table in question has been removed. If the probability Pr > Wald is less than a significance threshold which has been set (typically 0.05), then the contribution of the variable to the adjustment of the model is significant. Otherwise, it can be removed from the model.
- Model parameters: In the ordinal logit model, there is a single set of parameters for the explanatory variables, shared by all categories, and a separate intercept for each cumulative level of the dependent variable.
- Classification table: This table shows the percentage of well-classified observations for each category. If a validation sample has been extracted, this table is also displayed for the validation data.
- Comparison of the categories of the qualitative variables: If one or more explanatory qualitative variables have been selected, the results of the equality tests for the parameters taken in pairs from the different qualitative variable categories are displayed.
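The goodness of fit coefficients and the likelihood ratio test listed above can be computed from the log-likelihoods of the independent and adjusted models. The following Python sketch uses the standard textbook formulas. The function names, the log-likelihood values, the sum of weights and the parameter count are all hypothetical, and the way parameters are counted for AIC and SBC is an assumption that may differ from XLSTAT's.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a Chi² variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # df = 1
    else:
        q, a = math.exp(-half), 1.0              # df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def fit_statistics(ll_null, ll_model, sum_weights, n_params):
    """Pseudo-R² values and information criteria from two log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 / sum_weights * (ll_null - ll_model))
    return {
        "mcfadden": 1.0 - ll_model / ll_null,
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / (1.0 - math.exp(2.0 / sum_weights * ll_null)),
        "aic": -2.0 * ll_model + 2.0 * n_params,
        "sbc": -2.0 * ll_model + n_params * math.log(sum_weights),
    }

def likelihood_ratio_test(ll_null, ll_model, df):
    """Difference of the two -2 Log(Like.) values, compared to a Chi² law."""
    statistic = -2.0 * ll_null - (-2.0 * ll_model)
    return statistic, chi2_sf(statistic, df)

# Hypothetical values: 100 observations, 2 intercepts and 2 slopes.
stats = fit_statistics(ll_null=-100.0, ll_model=-85.0, sum_weights=100.0, n_params=4)
lr_stat, p_value = likelihood_ratio_test(ll_null=-100.0, ll_model=-85.0, df=2)
print(stats)
print(lr_stat, p_value)
```

Working on the log scale avoids the underflow that would occur if the likelihoods themselves were raised to the power 2/Sw.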
This analysis is available in the XLStat-Basic addin for Microsoft Excel™