XLSTAT-Latent Class

XLSTAT-Latent Class is a tool for fitting Latent Class models in Excel. It is based on two modules from Latent GOLD® 5.0: Latent Class Cluster models and Latent Class Regression models. Both model families offer features not found in traditional clustering or regression approaches. XLSTAT-Latent Class provides a wide range of easily accessible options that give the user full control over the Latent Class models.

Features

Note on XLSTAT-Latent Class: This module runs under all Windows versions of Excel, but not on the Mac.

Maximum size of datasets:

65500 rows and 250 columns. Variables can be in rows or in columns.

Available languages:

English, German, Spanish, Italian, Portuguese, Japanese, French.

Demo version

A trial version of XLSTAT-LG is included in the main XLSTAT download.

Prices and ordering

For prices, on-line ordering and other purchasing information please go to our ordering page.

 

 

 

DETAILED DESCRIPTIONS

Latent Class cluster models

What is Latent Class Analysis?

Latent class analysis involves the construction of latent classes, which are unobserved (latent) subgroups or segments of cases. The latent classes are constructed based on the observed (manifest) responses of the cases on a set of indicator variables. Cases within the same latent class are homogeneous with respect to their responses on these indicators, while cases in different latent classes differ in their response patterns. Formally, latent classes are represented by K distinct categories of a nominal latent variable X. Because the latent variable is categorical, Latent Class modeling differs from more traditional latent variable approaches such as factor analysis, structural equation models, and random-effects regression models, which are based on continuous latent variables.

XLSTAT-LG is based on the Latent GOLD® software developed by Statistical Innovations Inc.

What is a Latent Class cluster model?

Latent Class cluster model:

  • Includes a nominal latent variable X with K categories, each category representing a cluster.
  • Each cluster contains a homogeneous group of persons (cases) who share common interests, values, characteristics, and/or behavior (i.e., share common model parameters).
  • These interests, values, characteristics, and/or behaviors constitute the observed variables (indicators) Y from which the latent clusters are derived.

XLSTAT-LG can automatically launch computations for several models that differ in their number of classes. It is also possible to fine-tune the Bayes constants, the number of sets of random starting values, as well as the iteration parameters for both the Expectation-Maximization and Newton-Raphson algorithms, which are used for model estimation.
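The idea of fitting one model per number of classes, each from several random starting sets, can be sketched as follows. This is a minimal illustration for binary indicators that uses the Expectation-Maximization algorithm only; it is not XLSTAT's or Latent GOLD's implementation. The function names, the toy data, and the small constant eps (a crude stand-in for Bayes constants) are all assumptions made for the example.

```python
import math
import random

def e_step(data, sizes, probs):
    """Return posterior membership probabilities and the log-likelihood."""
    posteriors, log_lik = [], 0.0
    for row in data:
        joint = [
            size * math.prod(p[j] if y else 1.0 - p[j] for j, y in enumerate(row))
            for size, p in zip(sizes, probs)
        ]
        total = sum(joint)
        log_lik += math.log(total)
        posteriors.append([v / total for v in joint])
    return posteriors, log_lik

def fit_lc_cluster(data, n_classes, n_starts=5, n_iter=100, seed=1, eps=1e-6):
    """EM for binary indicators; keeps the best of several random starting sets."""
    rng = random.Random(seed)
    n_items = len(data[0])
    best = None
    for _start in range(n_starts):
        # Random starting values for class sizes and item probabilities.
        sizes = [1.0 / n_classes] * n_classes
        probs = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
                 for _ in range(n_classes)]
        for _iteration in range(n_iter):
            posteriors, _ = e_step(data, sizes, probs)
            # M-step: re-estimate class sizes and item probabilities.
            for c in range(n_classes):
                weight = sum(p[c] for p in posteriors)
                sizes[c] = weight / len(data)
                for j in range(n_items):
                    hits = sum(p[c] for p, row in zip(posteriors, data) if row[j])
                    # eps (assumed) keeps estimates away from exactly 0 or 1.
                    probs[c][j] = (hits + eps) / (weight + 2.0 * eps)
        _, log_lik = e_step(data, sizes, probs)
        if best is None or log_lik > best["log_lik"]:
            best = {"log_lik": log_lik, "sizes": sizes, "probs": probs}
    return best

# Toy data (hypothetical): two clearly separated response patterns plus some noise.
data = ([(1, 1, 1, 1)] * 20 + [(0, 0, 0, 0)] * 20
        + [(1, 1, 1, 0)] * 5 + [(0, 0, 0, 1)] * 5)

# One model per number of classes, as in an automatic 1-to-3 class run.
fits = {k: fit_lc_cluster(data, k) for k in (1, 2, 3)}
for k, fit in fits.items():
    print(k, "classes: log-likelihood", round(fit["log_lik"], 2))
```

Keeping the best log-likelihood across starting sets reduces the risk of reporting a local solution; a production estimator would also add convergence checks and a Newton-Raphson phase.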

Advantages of Latent Class cluster models over more traditional clustering methods

Advantages of Latent Class cluster models over more traditional ad-hoc types of cluster analysis methods include model selection criteria and probability-based classification. Posterior membership probabilities are estimated directly from the model parameters and used to assign cases to the modal class - the class for which the posterior probability is highest.

Furthermore, it is possible to include variables of different scales (continuous, ordinal or nominal) within the same model. These variables are called indicators.
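As an illustration of probability-based classification with indicators of different scales, the sketch below computes posterior membership probabilities for one case from a nominal and a continuous indicator, then assigns the modal class. It assumes local independence of the indicators within each class and a normal distribution for the continuous indicator. All parameter values and names are hypothetical.

```python
import math

def normal_density(y, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance)

def posterior_membership(nominal_value, continuous_value, classes):
    """Bayes rule: class size times the class-specific likelihood, normalized."""
    joint = [
        c["size"]
        * c["nominal_probs"][nominal_value]
        * normal_density(continuous_value, c["mean"], c["variance"])
        for c in classes
    ]
    total = sum(joint)
    return [v / total for v in joint]

# Hypothetical two-class model.
classes = [
    {"size": 0.5, "nominal_probs": [0.8, 0.2], "mean": 0.0, "variance": 1.0},
    {"size": 0.5, "nominal_probs": [0.3, 0.7], "mean": 3.0, "variance": 1.0},
]

post = posterior_membership(nominal_value=0, continuous_value=0.5, classes=classes)
modal_class = max(range(len(post)), key=post.__getitem__)
print([round(p, 4) for p in post], "modal class:", modal_class)
```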

A special feature of Latent Class cluster models is the ability to obtain an equation for calculating these posterior membership probabilities directly from the observed variables (indicators). This equation is called the scoring equation. It can be used to score new cases based on a LC cluster model estimated previously. That is, the equation can be used to classify new cases into their most likely latent class as a function of the observed variables. This feature is unique to LC models – it is not available with any other clustering technique.
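One way to see why such an equation exists: with categorical indicators that are independent within classes, the logarithm of each class's joint probability is a sum of one intercept and one term per observed category, so the posterior probabilities take a multinomial logit form. The sketch below derives such terms from hypothetical model parameters and scores a new case. It does not reproduce the coding or the coefficients that XLSTAT-LG reports; an actual scoring equation would typically use an identifying constraint such as a reference category.

```python
import math

def scoring_coefficients(sizes, probs):
    """Intercept per class and one coefficient per indicator category (log scale)."""
    intercepts = [math.log(s) for s in sizes]
    coefs = [[[math.log(p) for p in item] for item in cls] for cls in probs]
    return intercepts, coefs

def score(row, intercepts, coefs):
    """Multinomial logit: softmax of the linear scores of each class."""
    linear = [b0 + sum(b[j][y] for j, y in enumerate(row))
              for b0, b in zip(intercepts, coefs)]
    top = max(linear)  # subtract the maximum for numerical stability
    expo = [math.exp(v - top) for v in linear]
    total = sum(expo)
    return [v / total for v in expo]

# Hypothetical model: probs[class][indicator][category].
sizes = [0.6, 0.4]
probs = [
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.1, 0.9], [0.4, 0.6]],
]
intercepts, coefs = scoring_coefficients(sizes, probs)

new_case = (0, 0)
scored = score(new_case, intercepts, coefs)
print([round(p, 4) for p in scored])
```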

Results

XLSTAT-LG provides one section per model (each model being represented by a specific number of classes):

Model Summary Statistics: number of cases used in model estimation, number of distinct parameters estimated, the seed, and the best seed, which can reproduce the current model more quickly when the number of starting sets is set to 0.

Estimation Summary: for each of the Expectation-Maximization and Newton-Raphson algorithms, XLSTAT reports the number of iterations used, the log-posterior value, the likelihood-ratio goodness-of-fit value, as well as the final convergence value.

Chi-Square Statistics:

  • Likelihood-ratio goodness-of-fit value (L²) for the current model and the associated bootstrap p-value.
  • X² and Cressie-Read. These are alternatives to L² that should yield a similar p-value according to large sample theory if the model specified is valid and the data are not sparse.
  • BIC, AIC, AIC3, CAIC and SABIC (based on L²). These statistics (information criteria) weight fit and parsimony by adjusting L² to account for the number of parameters in the model. The lower the value, the better the model.
  • Dissimilarity index: A descriptive measure indicating how much the observed and estimated cell frequencies differ from one another. It indicates the proportion of the sample that needs to be moved to another cell to get a perfect fit.
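The statistics in this list can be computed from the observed and estimated cell frequencies using their standard textbook definitions, as in the sketch below. The Cressie-Read power parameter of 2/3 and the cell counts are assumptions for the example, and the bootstrap p-value is not computed here.

```python
import math

def chi_square_statistics(observed, expected):
    """L-squared, Pearson X-squared, Cressie-Read and the dissimilarity index."""
    n = sum(observed)
    pairs = list(zip(observed, expected))
    # Likelihood-ratio statistic; empty observed cells contribute zero.
    l2 = 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    # Pearson chi-squared statistic.
    x2 = sum((o - e) ** 2 / e for o, e in pairs)
    # Cressie-Read power divergence with lambda = 2/3 (assumed default).
    lam = 2.0 / 3.0
    cr = 2.0 / (lam * (lam + 1.0)) * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)
    # Proportion of cases that would have to move to another cell for a perfect fit.
    di = sum(abs(o - e) for o, e in pairs) / (2.0 * n)
    return {"L2": l2, "X2": x2, "CR": cr, "DI": di}

# Hypothetical frequencies for a table with four cells.
stats = chi_square_statistics([30, 20, 25, 25], [25.0, 25.0, 25.0, 25.0])
print({name: round(value, 4) for name, value in stats.items()})
```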

Log-likelihood Statistics: 

  • Log-likelihood (LL), log-prior (associated with the Bayes constants) as well as the log-posterior.
  • BIC, AIC, AIC3, CAIC and SABIC (based on LL). These statistics (information criteria) weight fit and parsimony by adjusting the LL to account for the number of parameters in the model. The lower the value, the better the model.
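The LL-based information criteria follow their usual definitions: minus twice the log-likelihood plus a penalty per estimated parameter. The sketch below applies these standard formulas; the log-likelihood values, parameter counts and sample size are hypothetical and serve only to show how models with different numbers of classes would be compared.

```python
import math

def information_criteria(log_lik, n_par, n_cases):
    """Standard LL-based criteria; lower values indicate a better trade-off."""
    deviance = -2.0 * log_lik
    return {
        "BIC": deviance + math.log(n_cases) * n_par,
        "AIC": deviance + 2.0 * n_par,
        "AIC3": deviance + 3.0 * n_par,
        "CAIC": deviance + (math.log(n_cases) + 1.0) * n_par,
        "SABIC": deviance + math.log((n_cases + 2.0) / 24.0) * n_par,
    }

# Hypothetical results for 1-, 2- and 3-class models fitted to 200 cases:
# number of classes -> (log-likelihood, number of parameters).
candidates = {1: (-560.0, 4), 2: (-510.0, 9), 3: (-505.0, 14)}
criteria = {k: information_criteria(ll, npar, 200) for k, (ll, npar) in candidates.items()}
best_by_bic = min(criteria, key=lambda k: criteria[k]["BIC"])
print("Number of classes with the lowest BIC:", best_by_bic)
```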

Classification Statistics: 

  • Classification errors (based on modal assignment).
  • Reduction of errors (Lambda), entropy R², standard R². These pseudo R-squared statistics indicate how well one can predict class memberships based on the observed variables (indicators and covariates). The closer these values are to 1 the better the predictions.
  • Classification log-likelihood under the assumption that the true class membership is known.
  • AWE (similar to BIC, but also takes into account classification performance).
  • Entropy.
  • CLC.
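Some of these classification statistics can be derived directly from the posterior membership probabilities. The sketch below uses common definitions, which are assumed here rather than taken from XLSTAT-LG: the classification error is the average of one minus the largest posterior probability, Lambda compares it with the error made when every case is assigned to the largest class, and the entropy R² compares the posterior entropy with the entropy of the class sizes. The posterior values are hypothetical.

```python
import math

def classification_statistics(posteriors):
    """Classification error, reduction of errors (Lambda) and entropy R-squared."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    class_sizes = [sum(p[k] for p in posteriors) / n for k in range(n_classes)]
    # Expected proportion of misclassified cases under modal assignment.
    error = sum(1.0 - max(p) for p in posteriors) / n
    # Error when all cases are assigned to the largest class.
    baseline_error = 1.0 - max(class_sizes)
    entropy = sum(-v * math.log(v) for p in posteriors for v in p if v > 0)
    baseline_entropy = n * sum(-s * math.log(s) for s in class_sizes if s > 0)
    return {
        "classification_error": error,
        "lambda": 1.0 - error / baseline_error,
        "entropy_r2": 1.0 - entropy / baseline_entropy,
    }

# Hypothetical posterior membership probabilities for four cases and two classes.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
result = classification_statistics(posteriors)
print({name: round(value, 4) for name, value in result.items()})
```

The sketch assumes at least two classes with nonzero size; with a single class the baseline error and entropy are zero and the ratios are undefined.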

Classification Table:

  • Modal table: Cross-tabulates modal class assignments.
  • Proportional table: Cross-tabulates probabilistic class assignments.

Profile table, which includes:

  • Number of clusters
  • Indicators: The body of the table contains (marginal) conditional probabilities that show how the clusters are related to the Nominal or Ordinal indicator variables. These probabilities sum to 1 within each cluster (column). For indicators specified as Continuous, the body of the table contains means instead of probabilities. For indicators specified as Ordinal, means are displayed in addition to the conditional probabilities within each cluster.
  • Standard errors for the (marginal) conditional probabilities.

The probabilities and means that appear in the Profile table are displayed graphically in a Profile Plot.

Frequencies / Residuals:

Table of observed vs. estimated expected frequencies (and residuals). Note: Residuals having magnitude greater than 2 are statistically significant. This output is not reported in the case of 1 or more continuous indicators.

Bivariate Residuals: a table containing the bivariate residuals (BVRs) for a model. Large BVRs suggest violation of the local independence assumption.
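A bivariate residual can be thought of as a Pearson chi-squared statistic for the two-way table of a pair of indicators, comparing observed counts with the counts implied by the model, divided by the degrees of freedom of that table. The sketch below follows this description under the assumption of local independence; the exact computation in XLSTAT-LG may differ, and the model parameters and counts are hypothetical.

```python
def bivariate_residual(observed, sizes, probs_a, probs_b):
    """Pearson chi-squared of a two-way table divided by its degrees of freedom.

    observed[a][b] holds counts; probs_a[k][a] and probs_b[k][b] are the
    class-specific category probabilities of the two indicators.
    """
    n = sum(sum(row) for row in observed)
    n_a, n_b = len(observed), len(observed[0])
    x2 = 0.0
    for a in range(n_a):
        for b in range(n_b):
            # Expected count under local independence within each class.
            expected = n * sum(s * pa[a] * pb[b]
                               for s, pa, pb in zip(sizes, probs_a, probs_b))
            x2 += (observed[a][b] - expected) ** 2 / expected
    return x2 / ((n_a - 1) * (n_b - 1))

# Hypothetical 1-class model: the two indicators are assumed independent,
# while the observed table shows a strong association.
bvr = bivariate_residual(
    observed=[[40, 10], [10, 40]],
    sizes=[1.0],
    probs_a=[[0.5, 0.5]],
    probs_b=[[0.5, 0.5]],
)
print(round(bvr, 4))
```

A large value such as this one would suggest that the pair of indicators remains associated after accounting for the classes, for example because more classes are needed.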

Scoring equation: regression coefficients associated with the multinomial logit model.

Classification: for each observation, the posterior class membership probabilities and the modal assignment based on the current model.

References

Vermunt, J.K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18, 450-469. Link: http://members.home.nl/jeroenvermunt/lca_three_step.pdf

Vermunt, J.K., and Magidson, J. (2005). Latent GOLD 4.0 User's Guide. Belmont, MA: Statistical Innovations Inc.  http://www.statisticalinnovations.com/technicalsupport/LGusersguide.pdf

Vermunt, J.K., and Magidson, J. (2013). Technical Guide for Latent GOLD 5.0: Basic, Advanced, and Syntax. Belmont, MA: Statistical Innovations Inc.  http://www.statisticalinnovations.com/technicalsupport/LGtechnical.pdf

Vermunt, J.K., and Magidson, J. (2013). Latent GOLD 5.0 Upgrade Manual. Belmont, MA: Statistical Innovations Inc.  http://statisticalinnovations.com/technicalsupport/LG5manual.pdf

 

Latent Class regression models

What is Latent Class Analysis?

Latent class analysis (LCA) involves the construction of latent classes, which are unobserved (latent) subgroups or segments of cases. The latent classes are constructed based on the observed (manifest) responses of the cases on a set of indicator variables. Cases within the same latent class are homogeneous with respect to their responses on these indicators, while cases in different latent classes differ in their response patterns. Formally, latent classes are represented by K distinct categories of a nominal latent variable X. Because the latent variable is categorical, Latent Class modeling differs from more traditional latent variable approaches such as factor analysis, structural equation models, and random-effects regression models, which are based on continuous latent variables.

XLSTAT-LG is based on the Latent GOLD® software developed by Statistical Innovations Inc.

What is a Latent Class regression model?

Latent Class regression model:

  • Is used to predict a dependent variable as a function of predictor variables (Regression model).
  • Includes a K-category latent variable X to cluster cases (LC model).
  • Each category represents a homogeneous subpopulation (segment) having identical regression coefficients (LC Regression Model).
  • Each case may contain multiple records (Regression with repeated measurements).
  • The appropriate model is estimated according to the scale type of the dependent variable:
    1. Continuous: Linear regression model (with normally distributed residuals).
    2. Nominal (with more than 2 levels): Multinomial logistic regression.
    3. Ordinal (with more than 2 ordered levels): Adjacent-category ordinal logistic regression model.
    4. Count: Log-linear Poisson regression.
    5. Binomial Count: Binomial logistic regression model.
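For the continuous case, the combination of class-specific regression coefficients and repeated measurements can be sketched as follows: each class has its own linear regression, all records of a case are used jointly to compute that case's posterior class membership, and predictions are weighted by these posteriors. This is an illustration with hypothetical, already-estimated parameters and a single predictor, not XLSTAT-LG's estimation procedure.

```python
import math

def log_normal_density(y, mean, sd):
    """Log density of a normal distribution with the given mean and std. dev."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (y - mean) ** 2 / (2.0 * sd * sd)

def class_posteriors(records, classes):
    """Posterior class membership of one case from all its (x, y) records."""
    log_joint = [
        math.log(c["size"]) + sum(
            log_normal_density(y, c["intercept"] + c["slope"] * x, c["sd"])
            for x, y in records)
        for c in classes
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

def predict(x, posteriors, classes):
    """Posterior-weighted prediction of the dependent variable."""
    return sum(p * (c["intercept"] + c["slope"] * x)
               for p, c in zip(posteriors, classes))

# Hypothetical 2-class model with opposite slopes.
classes = [
    {"size": 0.5, "intercept": 0.0, "slope": 1.0, "sd": 1.0},
    {"size": 0.5, "intercept": 0.0, "slope": -1.0, "sd": 1.0},
]
# One case with three repeated measurements (x, y).
records = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]

post = class_posteriors(records, classes)
prediction = predict(4.0, post, classes)
print([round(p, 4) for p in post], round(prediction, 4))
```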

XLSTAT-LG can automatically launch computations for several models that differ in their number of classes. It is also possible to fine-tune the Bayes constants, the number of sets of random starting values, as well as the iteration parameters for both the Expectation-Maximization and Newton-Raphson algorithms, which are used for model estimation.

Results

XLSTAT-LG provides one section per model (each model being represented by a specific number of classes):

Model Summary Statistics: number of cases used in model estimation, number of distinct parameters estimated, the seed, and the best seed, which can reproduce the current model more quickly when the number of starting sets is set to 0.

Estimation Summary: for each of the Expectation-Maximization and Newton-Raphson algorithms, XLSTAT reports the number of iterations used, the log-posterior value, the likelihood-ratio goodness-of-fit value, as well as the final convergence value.

Chi-Square Statistics:

  • Likelihood-ratio goodness-of-fit value (L²) for the current model and the associated bootstrap p-value.
  • X² and Cressie-Read. These are alternatives to L² that should yield a similar p-value according to large sample theory if the model specified is valid and the data are not sparse.
  • BIC, AIC, AIC3, CAIC and SABIC (based on L²). These statistics (information criteria) weight fit and parsimony by adjusting L² to account for the number of parameters in the model. The lower the value, the better the model.
  • Dissimilarity index: A descriptive measure indicating how much the observed and estimated cell frequencies differ from one another. It indicates the proportion of the sample that needs to be moved to another cell to get a perfect fit.

Log-likelihood Statistics: 

  • Log-likelihood (LL), log-prior (associated with the Bayes constants) as well as the log-posterior.
  • BIC, AIC, AIC3, CAIC and SABIC (based on LL). These statistics (information criteria) weight fit and parsimony by adjusting the LL to account for the number of parameters in the model. The lower the value, the better the model.

Classification Statistics: 

  • Classification errors (based on modal assignment).
  • Reduction of errors (Lambda), entropy R², standard R². These pseudo R-squared statistics indicate how well one can predict class memberships based on the observed variables (indicators and covariates). The closer these values are to 1 the better the predictions.
  • Classification log-likelihood under the assumption that the true class membership is known.
  • AWE (similar to BIC, but also takes into account classification performance).
  • Entropy.
  • CLC.

Classification Table:

  • Modal table: Cross-tabulates modal class assignments.
  • Proportional table: Cross-tabulates probabilistic class assignments.

Prediction statistics table:

The columns in this table correspond to:

  • Baseline: the prediction error of the baseline model (also referred to as the null model).
  • Model: the prediction error of the estimated model.
  • R²: the proportional reduction of errors in the estimated model compared to the baseline model.

The rows in this table correspond to:

  • Squared Error: Average prediction error based on squared error.
  • Minus Log-likelihood: Average prediction error based on minus the log-likelihood.
  • Absolute Error: Average prediction error based on absolute error.
  • Prediction error: Average prediction error based on the proportion of prediction errors (for categorical variables only).
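The relationship between the three columns can be illustrated for the squared and absolute error rows: the R² value is one minus the ratio of the model's average prediction error to the baseline's. In the sketch below the baseline predicts the overall mean for every record, which is an assumption made for the example, and the observed and predicted values are hypothetical.

```python
def prediction_statistics(observed, predicted, baseline_value):
    """Baseline error, model error and R-squared for two error measures."""
    losses = {
        "squared_error": lambda y, f: (y - f) ** 2,
        "absolute_error": lambda y, f: abs(y - f),
    }
    n = len(observed)
    table = {}
    for name, loss in losses.items():
        baseline = sum(loss(y, baseline_value) for y in observed) / n
        model = sum(loss(y, f) for y, f in zip(observed, predicted)) / n
        # Proportional reduction of errors relative to the baseline model.
        table[name] = {"baseline": baseline, "model": model,
                       "r2": 1.0 - model / baseline}
    return table

# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
mean_value = sum(observed) / len(observed)  # assumed baseline prediction

table = prediction_statistics(observed, predicted, mean_value)
for row, values in table.items():
    print(row, {k: round(v, 4) for k, v in values.items()})
```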

Prediction Table: For nominal and ordinal dependent variables, a prediction table that cross-classifies observed against estimated values is provided.

Parameters table:

  • R²: class-specific and overall R² values. The overall R² indicates how well the dependent variable is predicted overall by the model (the same measure as in the Prediction Statistics table). For ordinal, continuous, and (binomial) count dependent variables, these are standard R² measures. For nominal dependent variables, they can be seen as weighted averages of separate R² measures for each category treated as a separate dichotomous response variable.
  • Intercept: intercept of the linear regression equation.
  • s.e.: standard errors of the parameters.
  • z-value: z-test statistics corresponding to the parameter tests.
  • Wald: Wald statistics are provided in the output to assess the statistical significance of the set of parameter estimates associated with a given variable. Specifically, for each variable, the Wald statistic tests the restriction that each of the parameter estimates in that set equals zero (for variables specified as Nominal, the set includes parameters for each category of the variable). For Regression models, by default, two Wald statistics (Wald and Wald(=)) are provided in the table when more than 1 class has been estimated. For each set of parameter estimates, the Wald(=) statistic considers the subset associated with each class and tests the restriction that each parameter in that subset equals the corresponding parameter in the subsets associated with each of the other classes. That is, the Wald(=) statistic tests the equality of each set of regression effects across classes.
  • p-value: measures of significance for the estimates.
  • Mean: means for the regression coefficients.
  • Std.Dev: standard deviations for the regression coefficients.
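For a single parameter, the z-value, the Wald statistic and its p-value are directly related, as the sketch below shows. The second function illustrates the idea behind Wald(=) for one coefficient in two classes under the simplifying assumption that the two estimates are uncorrelated; the actual statistic uses the full covariance matrix of the estimates and covers whole sets of parameters. All numbers are hypothetical.

```python
import math

def chi2_1df_p_value(statistic):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

def wald_test(estimate, std_error):
    """z-value, Wald statistic and p-value for the restriction 'parameter = 0'."""
    z = estimate / std_error
    wald = z * z
    return z, wald, chi2_1df_p_value(wald)

def wald_equal_two_classes(b1, se1, b2, se2):
    """Test of b1 = b2, assuming zero covariance between the two estimates."""
    wald = (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)
    return wald, chi2_1df_p_value(wald)

# Hypothetical estimate of 0.98 with a standard error of 0.5.
z, wald, p_value = wald_test(0.98, 0.5)
print(round(z, 2), round(wald, 4), round(p_value, 4))

# Hypothetical class-specific effects of the same predictor in two classes.
wald_eq, p_eq = wald_equal_two_classes(1.2, 0.2, 0.3, 0.3)
print(round(wald_eq, 4), round(p_eq, 4))
```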

Classification: for each observation, the posterior class membership probabilities and the modal assignment based on the current model.


About KCS

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.

Get in Touch

  • Email:
    sales@kovcomp.com
  • Address:
    85 Nant y Felin
    Pentraeth, Isle of Anglesey
    LL75 8UY
    United Kingdom
  • Phone:
    (UK): 01248-450414
    (Intl.): +44-1248-450414