Gaussian mixture models


What are Gaussian mixture models?

The first reference to mixture modeling dates back to Pearson in 1894, but the development of these models is mainly due to the EM (Expectation-Maximization) algorithm of Dempster et al. (1977).

These models are commonly used for clustering. By treating each component as a cluster, they provide a framework for assessing partitions of the data. They have two main advantages:

  • Mixture modeling is a probabilistic method that yields a fuzzy classification of the observations. The probability that each observation belongs to each cluster is computed, and a classification is usually obtained by assigning each observation to its most likely cluster. These probabilities can also be used to assess doubtful assignments.
  • Mixture modeling is very flexible: constraints on the component parameters, such as the covariance matrices, let the model adapt to clusters of different volumes, shapes and orientations.

The aim of mixture models is to structure a dataset into several clusters. XLSTAT uses a mixture of Gaussian distributions.
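To make the membership probabilities and the "most likely cluster" rule concrete, here is a minimal one-dimensional sketch in Python. It assumes the mixture parameters (proportions `weights`, `means`, `variances`) are already known; the function names are hypothetical illustrations, not part of XLSTAT.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Mixture density f(x) = sum_k pi_k * N(x; mu_k, sigma_k^2)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def posterior_probabilities(x, weights, means, variances):
    """t_k(x) = pi_k * N(x; mu_k, sigma_k^2) / f(x): probability that x belongs to component k."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def map_cluster(x, weights, means, variances):
    """MAP rule: index of the component with the largest posterior probability."""
    post = posterior_probabilities(x, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)
```

With two equally weighted, unit-variance components centered at 0 and 5, a point at 2.5 gets a posterior probability of 0.5 for each component, while a point at 0.3 is assigned to the first component and a point at 4.0 to the second.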

Mixture models in XLSTAT

By constraining the covariance matrices through the eigenvalue decomposition of Celeux et al., XLSTAT offers 14 different Gaussian mixture models. The mixing proportions can also be constrained to be equal.
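In this decomposition each covariance matrix is written Σ_k = λ_k D_k A_k D_kᵀ, where the scalar λ_k sets the volume of cluster k, the orthogonal matrix D_k its orientation, and the diagonal matrix A_k (with determinant 1) its shape. Letting each of these three terms be either common to all clusters or cluster-specific gives 8 general models; diagonal (4) and spherical (2) variants complete the 14. The two-dimensional Python sketch below builds one such matrix; the function name and the rotation-angle form of D are illustrative assumptions.

```python
import math


def covariance_from_decomposition(volume, angle, shape):
    """Build a 2x2 covariance Sigma = lambda * D * A * D^T (illustrative helper).

    volume: lambda > 0, so that det(Sigma) = lambda**2 in two dimensions.
    angle:  rotation angle in radians defining the orthogonal orientation matrix D.
    shape:  a > 0 giving A = diag(a, 1/a), so that det(A) = 1.
    """
    if volume <= 0.0 or shape <= 0.0:
        raise ValueError("volume and shape must be positive")
    c, s = math.cos(angle), math.sin(angle)
    D = [[c, -s], [s, c]]          # orientation (rotation) matrix
    A = [shape, 1.0 / shape]       # diagonal of the shape matrix
    # Sigma_ij = lambda * sum_k D_ik * A_k * D_jk
    return [[volume * sum(D[i][k] * A[k] * D[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Because D is orthogonal and det(A) = 1, changing only the angle rotates the ellipse, changing only the shape stretches it, and the determinant (hence the volume) depends on λ alone.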

Inference algorithms used in XLSTAT for mixture models

XLSTAT offers three inference algorithms to estimate the parameters of the 14 models:

  • EM: This is the standard algorithm used for inference in mixture models. It alternates an E step, which computes the posterior probabilities of cluster membership, and an M step, which updates the parameters from these probabilities.
  • SEM: This is a stochastic version of the EM algorithm. A stochastic step is added that randomly assigns each observation to a cluster according to its posterior probabilities. This algorithm can lead to empty clusters, which disrupts parameter estimation.
  • CEM: This is a classifying version of the EM algorithm. A classification step is added that assigns each observation to a cluster by the MAP (Maximum A Posteriori) rule. This algorithm can lead to empty clusters, which disrupts parameter estimation.
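The three algorithms differ only in what happens between the E step and the M step. The Python sketch below shows this for a one-dimensional mixture under simplifying assumptions: starting values are supplied by the caller, a fixed number of iterations replaces a convergence test, and there are no safeguards against numerical underflow. `fit_gmm_1d` is a hypothetical name, not an XLSTAT function.

```python
import math
import random


def _normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def _one_hot(j, k):
    """Hard assignment to cluster j, written as a row of k weights."""
    return [1.0 if i == j else 0.0 for i in range(k)]


def fit_gmm_1d(data, weights, means, variances, algorithm="EM", n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture with EM, SEM or CEM from the given starting values."""
    if algorithm not in ("EM", "SEM", "CEM"):
        raise ValueError("algorithm must be 'EM', 'SEM' or 'CEM'")
    rng = random.Random(seed)
    k, n = len(means), len(data)
    weights, means, variances = list(weights), list(means), list(variances)
    for _ in range(n_iter):
        # E step: posterior probability t[i][j] that observation i belongs to cluster j.
        t = []
        for x in data:
            joint = [weights[j] * _normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            t.append([p / total for p in joint])
        if algorithm == "CEM":
            # C step: MAP rule, each observation goes to its most probable cluster.
            t = [_one_hot(max(range(k), key=row.__getitem__), k) for row in t]
        elif algorithm == "SEM":
            # S step: each observation's cluster is drawn at random from its posteriors.
            t = [_one_hot(rng.choices(range(k), weights=row)[0], k) for row in t]
        # M step: weighted updates of proportions, means and variances.
        for j in range(k):
            nj = sum(row[j] for row in t)
            if nj == 0.0:
                raise ValueError(f"cluster {j} is empty")
            weights[j] = nj / n
            means[j] = sum(row[j] * x for row, x in zip(t, data)) / nj
            variances[j] = sum(row[j] * (x - means[j]) ** 2 for row, x in zip(t, data)) / nj
            if variances[j] <= 0.0:
                raise ValueError(f"cluster {j} has a degenerate variance")
    return weights, means, variances
```

Raising an error on an empty cluster mirrors the caveat above for SEM and CEM; a fuller implementation would need a strategy for that case, such as restarting from different initial values.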

Selecting the number of components in XLSTAT

In practice, the number of components is often unknown. XLSTAT offers four different criteria to estimate it:

  • BIC: the Bayesian Information Criterion is a penalized likelihood-based criterion. It is the criterion most commonly used for mixture models.
  • AIC: the Akaike Information Criterion is a penalized likelihood-based criterion. It tends to overestimate the number of components.
  • ICL: the Integrated Complete Likelihood is a penalized likelihood-based criterion; it is the BIC further penalized by the entropy of the classification. It favors models that provide well-separated clusters, so the number of components it selects is generally lower than the number selected by BIC.
  • NEC: the Normalized Entropy Criterion also looks for the model that provides well-separated clusters. NEC is not defined for a one-component model. It is used to select the number of components, not the form of the covariance matrices.
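These criteria can be computed from quantities a fitted model provides: its log-likelihood, its number of free parameters, the number of observations and the posterior probabilities. The Python sketch below assumes the "smaller is better" convention (-2 log L plus a penalty), a common form of ICL (BIC plus twice the classification entropy) and NEC(K) = E(K) / (L(K) - L(1)); XLSTAT may report these values with a different sign or scaling. The function names are hypothetical.

```python
import math


def n_free_params_1d(k, equal_proportions=False):
    """Free parameters of a 1-D K-component mixture: K means, K variances, K-1 proportions."""
    return 2 * k + (0 if equal_proportions else k - 1)


def information_criteria(loglik, n_params, n_obs, posteriors, loglik_one_component=None):
    """BIC, AIC, ICL and NEC for a fitted mixture (smaller is better).

    posteriors: one row (t_i1, ..., t_iK) of posterior membership probabilities per observation.
    loglik_one_component: log-likelihood of the 1-component model, needed for NEC.
    """
    # Entropy of the fuzzy classification: E = -sum_i sum_k t_ik * log(t_ik) >= 0.
    entropy = -sum(t * math.log(t) for row in posteriors for t in row if t > 0.0)
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    aic = -2.0 * loglik + 2.0 * n_params
    icl = bic + 2.0 * entropy  # BIC penalized by the entropy
    nec = None  # NEC is not defined for a single component
    if len(posteriors[0]) > 1 and loglik_one_component is not None:
        nec = entropy / (loglik - loglik_one_component)
    return {"BIC": bic, "AIC": aic, "ICL": icl, "NEC": nec, "entropy": entropy}
```

When the clusters are well separated, the posterior probabilities are close to 0 or 1, the entropy is close to 0 and ICL is close to BIC; overlapping clusters raise the entropy, which is why ICL tends to select fewer components.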

Results of the mixture models in XLSTAT

XLSTAT offers the following results for mixture models:

  • The values of the selection criterion for the selected set of models, for a number of components varying within a user-defined range.
  • The estimated model parameters: mixing proportions, means and variances for each cluster of the selected model.
  • Some characteristics of the selected model: BIC, AIC, ICL, log-likelihood, NEC, entropy and degrees of freedom (DF).
  • The probability of belonging to each cluster and the MAP classification.

In the one-dimensional case, XLSTAT offers two diagnostic plots:

  • Plot of the empirical cumulative distribution function against the estimated one.
  • Q-Q plot of the quantiles of the empirical distribution against those of the estimated mixture.
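Both diagnostic plots only need the CDF of the fitted mixture and its inverse. The Python sketch below computes the plotted points for a one-dimensional mixture. Since the mixture CDF has no closed-form inverse, it is inverted here by bisection, and the Q-Q plot uses (i - 0.5)/n plotting positions for the i-th smallest observation, which is one common choice among several. The function names are hypothetical.

```python
import math


def mixture_cdf(x, weights, means, variances):
    """F(x) = sum_k pi_k * Phi((x - mu_k) / sigma_k) for a 1-D Gaussian mixture."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * v)))
               for w, m, v in zip(weights, means, variances))


def mixture_quantile(p, weights, means, variances, n_iter=100):
    """Invert the continuous, increasing mixture CDF by bisection, for 0 < p < 1."""
    sds = [math.sqrt(v) for v in variances]
    lo = min(m - 10.0 * s for m, s in zip(means, sds))  # F(lo) is essentially 0
    hi = max(m + 10.0 * s for m, s in zip(means, sds))  # F(hi) is essentially 1
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, variances) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ecdf_vs_fitted(data, weights, means, variances):
    """Pairs (empirical CDF, fitted mixture CDF) at each sorted observation."""
    xs = sorted(data)
    n = len(xs)
    return [((i + 1) / n, mixture_cdf(x, weights, means, variances)) for i, x in enumerate(xs)]


def qq_points(data, weights, means, variances):
    """Pairs (fitted mixture quantile, empirical quantile) at plotting positions (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    return [(mixture_quantile((i + 0.5) / n, weights, means, variances), x)
            for i, x in enumerate(xs)]
```

If the model fits well, the points of both plots lie close to the diagonal.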

XLSTAT

This analysis is available in the XLStat-Base add-in for Microsoft Excel.

About KCS

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.


Get in Touch

  • Email:
    sales@kovcomp.com
  • Address:
    85 Nant y Felin
    Pentraeth, Isle of Anglesey
    LL75 8UY
    United Kingdom
  • Phone:
    (UK): 01248-450414
    (Intl.): +44-1248-450414