Running Gaussian mixture model clustering with XLSTAT

Gaussian mixture models for clustering

Gaussian mixture models are commonly used for clustering. They provide a framework for assessing partitions of the data by treating each mixture component as a cluster. These models have two main advantages:

  • They give a probabilistic method for obtaining a fuzzy classification of the observations. The probability that each observation belongs to each cluster is calculated, and a classification is usually obtained by assigning each observation to its most likely cluster. These probabilities can also be used to examine suspect classifications.
  • Mixture modeling is very flexible: both the number of components and the form of their covariance matrices can be varied.
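
As a reminder of what these probabilities are, the standard textbook form of a Gaussian mixture with K components can be written as follows. The symbols (π for proportions, μ for means, Σ for covariance matrices, t for posterior probabilities) are notation chosen here for illustration and are not taken from the XLSTAT output.

```latex
% Mixture density with K components
f(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Posterior probability that observation x_i belongs to cluster k
t_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% Assignment to the most likely cluster
\hat{z}_i = \arg\max_{k} \; t_{ik}
```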

Dataset for Gaussian mixture model

The data are Fisher's famous iris data set, presented in [Fisher, R. A. (1936), The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, Part II, 179–188].

The data give the measurements (in centimeters) of petal length and width for 150 flowers from 3 species of iris (setosa, versicolor, and virginica).

An Excel sheet containing both the data and the results for use in this tutorial can be downloaded by clicking here.

 The aim is to fit a Gaussian mixture model and recover the data structure in three clusters.

Setting up a Gaussian mixture model

After opening XLSTAT, select the XLSTAT / XLSTAT-MX / Gaussian mixture models command, or click on the corresponding button of the XLSTAT-MX toolbar.

[Figure: Mixture models menu]

 Once you've clicked on the button, the dialog box appears.

The data are presented in a table of 150 rows and 2 columns. It is assumed that the labels are unknown and that the weight of each row is the same. As the classification of the data is done according to the length and width of the iris petal, the option Multidimensional is chosen.

[Figure: Mixture models dialog box, General tab]

In the Options (1) tab, three inference algorithms, four selection criteria, and three initialization methods are offered. The user can also set the maximum number of iterations of the inference algorithm and its convergence threshold. Here, we choose a random initialization with two replicates and leave all the other options at their default values.
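
To make the roles of these settings concrete, here is a minimal sketch of an EM (expectation-maximization) fit in plain Python. It rests on several assumptions: EM is a standard inference algorithm for mixtures, but the tutorial does not name the three algorithms XLSTAT offers; the sketch is univariate for brevity, whereas the tutorial uses two variables; and the names `em_once`, `fit_mixture`, `max_iter`, `tol`, and `replicates` are hypothetical, not XLSTAT identifiers.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, props, means, variances):
    """Return the posterior probabilities and the log-likelihood of the data."""
    resp, log_lik = [], 0.0
    for x in data:
        w = [p * normal_pdf(x, m, v) for p, m, v in zip(props, means, variances)]
        total = max(sum(w), 1e-300)  # guard against numerical underflow
        resp.append([wi / total for wi in w])
        log_lik += math.log(total)
    return resp, log_lik


def em_once(data, k, rng, max_iter=500, tol=1e-6):
    """One EM run from a random start; returns (log_lik, props, means, variances)."""
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    props = [1.0 / k] * k
    means = rng.sample(data, k)  # random initialization: k observations as centers
    variances = [grand_var] * k
    resp, log_lik = e_step(data, props, means, variances)
    for _ in range(max_iter):  # maximum number of iterations
        for j in range(k):  # M step: re-estimate each component
            nj = max(sum(r[j] for r in resp), 1e-12)
            props[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a degenerate component
        resp, new_log_lik = e_step(data, props, means, variances)
        converged = abs(new_log_lik - log_lik) < tol  # convergence threshold
        log_lik = new_log_lik
        if converged:
            break
    return log_lik, props, means, variances


def fit_mixture(data, k, replicates=2, seed=0):
    """Run EM from several random starts and keep the run with the best log-likelihood."""
    rng = random.Random(seed)
    runs = [em_once(data, k, rng) for _ in range(replicates)]
    return max(runs, key=lambda run: run[0])
```

Calling `fit_mixture(values, k=3, replicates=2)` on a list of numbers returns the log-likelihood, proportions, means, and variances of the better of the two runs. Replicates matter because EM only finds a local optimum, so different random starts can end in different solutions.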

[Figure: Mixture models dialog box, Options (1) tab]

In the Options (2) tab, a list of all the Gaussian mixture models is available. The minimum and maximum number of classes can be modified, and the mixture proportions can be forced to be equal. Here, we choose to test the EEE and EEV models for a number of classes varying from 2 to 5.
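
The following sketch shows how the two candidate models differ in complexity and how a criterion such as BIC trades fit against the number of parameters. It assumes that the model names follow the usual volume/shape/orientation convention, in which EEE shares one full covariance matrix across classes and EEV shares volume and shape but lets the orientation vary by class. It also assumes the Schwarz form of BIC (smaller is better); some software reports the opposite sign, so check which convention XLSTAT uses before comparing values. The function names are hypothetical.

```python
import math


def n_free_parameters(model, k, d, equal_proportions=False):
    """Free parameters of a k-class Gaussian mixture on d variables."""
    means = k * d
    proportions = 0 if equal_proportions else k - 1
    if model == "EEE":    # one full covariance matrix shared by all classes
        covariance = d * (d + 1) // 2
    elif model == "EEV":  # shared volume and shape, one orientation per class
        covariance = 1 + (d - 1) + k * d * (d - 1) // 2
    else:
        raise ValueError("only EEE and EEV are handled in this sketch")
    return means + proportions + covariance


def bic(log_lik, n_params, n_obs):
    """Schwarz criterion: -2 log L + p log n (smaller is better in this form)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# The grid tested in this tutorial: 2 models x 4 numbers of classes, 2 variables.
candidates = [(model, k, n_free_parameters(model, k, d=2))
              for model in ("EEE", "EEV") for k in range(2, 6)]
```

Each of the eight candidates is fitted separately, and the one with the best criterion value is retained as the selected model.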

[Figure: Mixture models dialog box, Options (2) tab]

The computations begin once you have clicked on OK. The results will then be displayed in a new sheet.

Interpreting the results of a Gaussian mixture model clustering

The first results displayed are the statistics for the variables (length and width). Next, the values of the selection criterion are displayed for all models and for a number of classes varying from 2 to 5.

[Figure: BIC criterion for the mixture models]

Then the estimated parameters of the selected model are given (proportions, means and variances).

[Figure: Proportions and means of the selected mixture model]

[Figure: Covariances of the selected mixture model]

A table displaying the characteristics of the selected model is then presented (BIC, AIC, log likelihood, NEC, ...).

The next table shows the estimated probabilities and the resulting classification for the first observations of the data set. The classification is computed from the probabilities via the MAP (Maximum A Posteriori) rule. We can see that 3 classes have been selected.
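
The MAP rule itself is simple to state in code. The sketch below uses made-up posterior probabilities, not values from the XLSTAT output, and the function names are hypothetical. It also computes an uncertainty score (one minus the largest probability), which is one common way to flag observations whose classification deserves a second look.

```python
def map_classify(posteriors):
    """Return, for each observation, the 1-based index of its most probable class."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in posteriors]


def uncertainty(posteriors):
    """One minus the largest posterior probability; values near 0 are clear-cut."""
    return [1.0 - max(row) for row in posteriors]


# Hypothetical posterior probabilities: three observations, three classes.
posteriors = [
    [0.98, 0.02, 0.00],
    [0.10, 0.65, 0.25],
    [0.01, 0.44, 0.55],
]
labels = map_classify(posteriors)  # [1, 2, 3]
doubt = uncertainty(posteriors)    # the third observation is the least certain
```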

[Figure: Posterior probabilities and classes]

Finally, a graph of the clustered data is provided.

[Figure: MAP classification]

Many other features and options are available for mixture models in XLSTAT, including observation weights, partial labeling, and 14 inference algorithms.

