Running heat map analysis in XLSTAT
Dataset for running heat map analysis in XLSTAT
For this tutorial, we use a data table of 1847 proteins quantified in 19 samples. The samples were extracted from a single maize leaf using 4 extraction methods, and the quantification is based on label-free shotgun proteomics (Langella et al. 2013). We are very grateful to the PAPPSO platform (Gif-sur-Yvette, France), which provided the dataset and allowed us to use it for this tutorial.
Proteins are stored in rows and samples in columns.
An Excel sheet with both the data and the results can be downloaded by clicking here.
Goal of this tutorial
The aim of this tutorial is to use the heat map, an exploratory data analysis tool, to examine the clustering of features and the clustering of samples simultaneously in a single compact view. We will also be able to check whether clusters of similar features (proteins in our case) correspond to clusters of similar samples.
Heat map in XLSTAT: setting up the analysis
To run a heat map analysis in XLSTAT, click on XLSTAT-OMICs / Heat maps. In the General tab, select the data matrix in the Features/individuals table field. Here, the individuals are represented by our samples. You do not need to change the features in rows option, as proteins are stored in rows in the dataset.
In the Options tab, activate the Non-specific filtering option, select Interquartile range< and enter a threshold, for example 0.25. This eliminates all proteins with an interquartile range lower than 0.25 (i.e. proteins with low variability across samples), which makes the heat map easier to read.
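To make the filtering step concrete, here is a minimal Python sketch of an interquartile range filter. It is only an illustration under stated assumptions, not XLSTAT's implementation: the quartile definition (the "inclusive" method of Python's statistics module) may differ from the one XLSTAT uses, and the protein names and values are hypothetical toy data.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1), using inclusive quartiles.

    Assumption: XLSTAT may use a different quartile definition.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_low_iqr(rows, threshold=0.25):
    """Drop every feature whose IQR is lower than the threshold."""
    return {name: vals for name, vals in rows.items() if iqr(vals) >= threshold}


# Hypothetical toy data: protein name -> abundances across five samples.
proteins = {
    "P1": [1.0, 1.1, 1.0, 1.05, 1.1],  # nearly constant, low variability
    "P2": [0.2, 1.5, 2.8, 0.9, 3.1],   # clearly variable
}

kept = filter_low_iqr(proteins, threshold=0.25)
print(sorted(kept))
```

In this toy example only the variable protein survives the filter, which mirrors the purpose of the option: nearly constant rows add little contrast to the map.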
In the Missing data tab, we set up a missing data estimation using the Nearest neighbour algorithm.
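The idea behind nearest neighbour estimation can be sketched in a few lines of Python. This is a simplified single-neighbour version written for illustration only: the distance measure, the handling of ties and the function names are assumptions, and XLSTAT's own algorithm may differ in its details.

```python
import math


def distance(a, b):
    """Root mean squared difference over columns observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared columns: rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def impute_nearest_neighbour(rows):
    """Replace each None with the value of the closest row that has it."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with an observed value in column j.
            candidates = [
                (distance(row, other), k)
                for k, other in enumerate(rows)
                if k != i and other[j] is not None
            ]
            if candidates:
                _, nearest = min(candidates)
                filled[i][j] = rows[nearest][j]
    return filled


# Hypothetical toy data: rows are proteins, columns are samples.
data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [5.0, 6.0, 9.0],
]
completed = impute_nearest_neighbour(data)
print(completed[0])
```

Here the missing value in the first row is borrowed from the second row, whose observed values are much closer than those of the third row.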
In the Charts tab, choose the color scale of the heat map and play with the width and height options to optimize chart size.
Heat map in XLSTAT: analyzing the output
First of all, we see that non-specific filtering eliminated 1597 proteins prior to heat map computations.
Heat map chart: proteins are clustered in rows and samples in columns.
If we analyze sample and protein dendrograms individually, we clearly see that:
- Proteins are divided into two groups (left dendrogram) roughly corresponding to membrane-specific proteins and other proteins.
- Samples are divided into three groups (upper dendrogram). The large cluster on the left corresponds to samples extracted with the URZB and URNB methods. The middle cluster contains the samples extracted with the TCA1 method, and the cluster on the right contains those extracted with the TEAL method.
The map represents values in the dataset re-arranged according to the dendrograms.
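The re-arrangement described above can be illustrated with a short Python sketch: rows and columns are each clustered hierarchically, and the matrix is then re-indexed in the order of the dendrogram leaves. The linkage (average), the distance (Euclidean) and all function names are assumptions chosen for the example, not a description of XLSTAT's settings, and the matrix is hypothetical toy data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_order(vectors):
    """Average-linkage agglomerative clustering.

    Returns the item indices in the order of the dendrogram leaves.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the two clusters.
                d = sum(
                    euclidean(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def reorder(matrix):
    """Cluster rows and columns, then re-arrange the matrix accordingly."""
    rows = leaf_order(matrix)
    cols = leaf_order([list(col) for col in zip(*matrix)])
    return rows, cols, [[matrix[r][c] for c in cols] for r in rows]


# Hypothetical toy data: 4 features (rows) measured on 4 samples (columns).
matrix = [
    [9.0, 1.0, 8.5, 1.2],
    [1.0, 9.0, 1.5, 8.8],
    [8.8, 1.1, 9.1, 0.9],
    [1.2, 8.7, 0.8, 9.2],
]
row_order, col_order, heat = reorder(matrix)
for line in heat:
    print(line)
```

After reordering, similar rows and similar columns sit next to each other, so high and low values form contiguous blocks. These blocks are the rectangular patterns that a heat map makes visible.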
Let’s focus on rectangle / square patterns inside the map.
- The large green and red rectangles on the left show that, for samples extracted using the URZB and URNB methods, the top cluster of proteins is relatively highly expressed compared to the lower cluster.
- On the other hand, the TEAL samples (right part of the map) display an inverse pattern of protein quantities (relatively low for the top protein cluster and relatively high for the bottom cluster).
- Finally, the TCA1 samples (middle cluster) seem to exhibit intermediate quantities for most proteins (nevertheless, proteins at the bottom of the map are relatively more abundant than proteins at the top).
Langella O, Valot B, Jacob D, Balliau T, Flores R, Hoogland C, Joets J, Zivy M (2013). Management and dissemination of MS proteomic data with PROTICdb: example of a quantitative comparison between methods of protein extraction. Proteomics 13(9):1457-66.