XLSTAT-MX is a dedicated statistical software solution for Market Research analysis. It is a must-have complement for XLSTAT-Pro users who perform sensory data analysis and who use Preference Mapping or other related techniques to understand customers' perceptions. MX stands for Marketing analytics. The tools of the XLSTAT-MX module can be accessed from the XLSTAT menu or from the XLSTAT toolbar.
- Preference Mapping
- Semantic differential
- Generalized Procrustes Analysis (GPA)
- Multiple Factor Analysis
- Penalty analysis
- Product Characterization
- Design of Experiments for Sensory Data Analysis
- TURF analysis
The following video explains how to use XLSTAT-MX:
A trial version of XLSTAT-MX is included in the main XLSTAT-Pro download.
Prices and ordering
For prices, on-line ordering and other purchasing information please go to our ordering page.
What is preference mapping?
Preference Mapping allows you to build maps that are useful in a variety of domains. A preference map is a decision support tool in analyses where a configuration of objects has been obtained from a first analysis (PCA, MCA, MDS), and where a table with complementary data describing the objects is available (attributes or preference data).
What can I learn using preference mapping?
In the market research and consumer analytics domains (sensory data analysis), "Prefmap" is used to analyze products (the objects) and to answer questions such as:
- How is our product positioned compared with the competitors' products?
- Which product is the closest to ours?
- Which type of consumer prefers my product?
- Why are the competitors' products positioned as such?
- How can I reposition my product so that it fits better my target group?
- What success can I expect from my product?
- Which new products should I encourage the R&D department to create?
Preference mapping provides a powerful approach to optimizing product acceptability.
Preference mapping projection methods
XLSTAT-MX offers several regression models to project complementary data on the objects maps:
- Vector model,
- Circular ideal point model,
- Elliptical ideal point model,
- Quadratic ideal point model.
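Of the four models, the vector model is the simplest: each judge's preferences are regressed linearly on the object coordinates from the first analysis, and the fitted coefficient vector gives the direction of increasing preference on the map. The sketch below is a minimal illustration with numpy, not XLSTAT's implementation; the function name and data are hypothetical.

```python
import numpy as np

def fit_vector_model(coords, prefs):
    """Fit the vector model: regress one judge's preference scores on the
    2-D object coordinates obtained from a prior analysis (e.g. a PCA).
    Returns the unit vector of increasing preference and the R^2."""
    X = np.column_stack([np.ones(len(coords)), coords])  # intercept + 2 map axes
    beta, *_ = np.linalg.lstsq(X, prefs, rcond=None)
    vector = beta[1:]                                    # slope on each map axis
    fitted = X @ beta
    ss_res = np.sum((prefs - fitted) ** 2)
    ss_tot = np.sum((prefs - prefs.mean()) ** 2)
    return vector / np.linalg.norm(vector), 1 - ss_res / ss_tot

# 6 products positioned on a 2-D map, one judge's liking scores:
coords = np.array([[1.0, 0.2], [0.5, -1.0], [-0.8, 0.4],
                   [0.0, 1.1], [-1.2, -0.6], [0.6, 0.9]])
prefs = coords @ np.array([2.0, 1.0]) + 5   # preference grows along direction (2, 1)
direction, r2 = fit_vector_model(coords, prefs)
```

The ideal point models work the same way but add quadratic terms in the coordinates, so that preference can peak (ideal point) or dip (anti-ideal point) inside the map instead of growing forever in one direction.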
XLSTAT results for preference mapping
XLSTAT-MX displays detailed results in addition to the preference map to facilitate the interpretation of results.
The preference map is a summary view of three types of elements:
The judges (or groups of judges if a classification of judges has been carried out beforehand) represented in the corresponding model by a vector, an ideal point (labeled +), an anti-ideal point (labeled -), or a saddle point (labeled o);
The objects whose position on the map is determined by their coordinates;
The descriptors which correspond to the representation axes with which they are associated (when a PCA precedes the PREFMAP, a biplot from the PCA is studied to interpret the position of the objects as a function of the objective criteria).
The PREFMAP, with the interpretation given by the preference map, is a potentially very powerful aid to interpretation and decision-making, since it allows preference data to be linked to objective data. However, the models associated with the judges must be fitted correctly for the interpretation to be reliable.
The preference score for each object for a given judge, whose value is between 0 (minimum) and 1 (maximum), is calculated from the prediction of the model for the judge. The more the product is preferred, the higher the score. A preference order of the objects is deduced from the preference scores for each of the judges.
The contour plot shows the regions corresponding to the various preference consensus levels on a chart whose axes are the same as the preference map. At each point on the chart, the percentage of judges for whom the preference calculated from the model is greater than their mean preference is calculated. In the regions with cold colors (blue), a low proportion of models give high preferences. On the other hand, the regions with hot colors (red) indicate a high proportion of models with high preferences.
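The two quantities described above can be sketched as follows: a per-judge rescaling of model predictions to [0, 1], and, for each map point, the share of judges whose predicted preference exceeds their own mean (the quantity colour-coded on the contour plot). This is a minimal numpy illustration with hypothetical helper names, not XLSTAT's implementation.

```python
import numpy as np

def preference_scores(pred):
    """Rescale one judge's model predictions over all objects to [0, 1]."""
    lo, hi = pred.min(), pred.max()
    return (pred - lo) / (hi - lo)

def consensus_proportion(pred_by_judge):
    """For each map point: share of judges whose predicted preference at that
    point exceeds their own mean predicted preference.
    pred_by_judge: array of shape (n_judges, n_points)."""
    means = pred_by_judge.mean(axis=1, keepdims=True)
    return (pred_by_judge > means).mean(axis=0)

scores = preference_scores(np.array([2.0, 5.0, 8.0]))   # -> 0, 0.5, 1

preds = np.array([[2.0, 5.0, 8.0],    # judge 1 over 3 grid points
                  [7.0, 4.0, 1.0]])   # judge 2
prop = consensus_proportion(preds)    # fraction of judges "above average" per point
```

On the real contour plot this proportion is evaluated on a dense grid, and regions where it is high appear in hot colors.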
What is a semantic differential chart?
The semantic differential function is a visualization method that was developed by the psychologist Charles E. Osgood in order to plot the differences between individuals' connotations for a given word. When applying the method, Osgood asked survey participants to describe a word on a series of scales ranging from one extreme to the other (for example favorable/unfavorable). When patterns were significantly different from one individual to the other, or from one group of individuals to the other, Osgood could then interpret the semantic differential as a mapping of the psychological or even behavioral distance between the individuals or groups.
Use of semantic differential charts
This method can also be used for a variety of applications:
- Analysis of the experts’ perceptions of a product (for example a yogurt) described by a series of criteria (for example acidity, saltiness, sweetness, softness) on similar scales (either from one extreme to the other, or on a similar liking scale for each criterion). A semantic differential chart will allow you to quickly see which experts agree, and whether significantly different patterns are obtained.
- Survey analysis after a customer satisfaction survey.
- Profile analysis of candidates during a recruitment session.
When to use Generalized Procrustes Analysis
Generalized Procrustes Analysis is used in sensory data analysis before a Preference Mapping to reduce scale effects and to obtain a consensual configuration. It also allows you to compare the proximity between the terms that different experts use to describe products.
Principle of Generalized Procrustes Analysis
We define as a configuration an n x p matrix that corresponds to the description of n objects (or individuals/cases/products) on p dimensions (or attributes/variables/criteria/descriptors).
We call the consensus configuration the mean configuration computed from the m configurations. Procrustes Analysis is an iterative method that reduces the distance of the m configurations to the consensus configuration by applying transformations (rescaling, translations, rotations, reflections) to the configurations, the consensus being updated after each transformation.
Let us take the example of 5 experts rating 4 cheeses according to 3 criteria, with ratings from 1 to 10. One expert may tend to be harsher in his scoring, shifting his ratings downward, while another may tend to give ratings around the average, without daring to use extreme values. Working on an average configuration could then lead to false interpretations. One can easily see that translating the ratings of the first expert is necessary, and that rescaling the ratings of the second expert would bring his ratings closer to those of the other experts.
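The translation and rescaling steps of this example can be sketched in a few lines. This is a simplified single-configuration illustration with numpy (a full GPA also applies rotations/reflections and iterates over all configurations); the function name is hypothetical.

```python
import numpy as np

def translate_and_rescale(config, consensus):
    """One simplified GPA-style step for a single expert's n x p rating matrix:
    centre it (removes a systematically harsh or lenient rating level) and
    rescale it so its total variance matches the consensus configuration."""
    centred = config - config.mean(axis=0)        # translation
    target = consensus - consensus.mean(axis=0)
    scale = np.sqrt((target ** 2).sum() / (centred ** 2).sum())
    return centred * scale                        # isotropic rescaling

# A harsh expert who rates 2 points lower than the consensus everywhere:
consensus = np.array([[6., 7.], [4., 5.], [8., 6.]])
harsh = consensus - 2.0
aligned = translate_and_rescale(harsh, consensus)
```

After the translation, the harsh expert's ratings coincide with the centred consensus: his constant downward shift has been removed without distorting the relative positions of the products.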
Once the consensus configuration has been obtained, it is possible to run a PCA (Principal Components Analysis) on the consensus configuration in order to allow an optimal visualization in two or three dimensions.
There exist two cases:
- If the number and the designation of the p dimensions are identical for the m configurations, one speaks in sensory analysis about conventional profiles.
- If the number p and the designation of the dimensions varies from one configuration to the other, one speaks in sensory analysis about free profiles, and the data can then only be represented by a series of m matrices of size n x p(k), k=1,2, …, m.
Algorithms for Generalized Procrustes Analysis used in XLSTAT
XLSTAT is the only product offering a choice between the two main available algorithms: the one based on the work initiated by John Gower (1975), and the later one described in the thesis of Jacques Commandeur (1991). Which algorithm performs best (in terms of least squares) depends on the dataset, but the Commandeur algorithm is the only one that can take missing data into account; by missing data we mean that, for a given configuration and a given observation or row, the values were not recorded for all the dimensions of the configuration. This can happen in sensory data analysis if one of the judges has not evaluated a product.
Results for the Generalized Procrustes Analysis in XLSTAT
Inspired by the format of the analysis of variance table of the linear model, a first table allows you to evaluate the relative contribution of each transformation to the evolution of the variance. This table displays the residual variance before and after the transformations, and the contribution of the rescaling, rotation and translation steps to the evolution of the variance. Fisher's F statistic enables you to compare the relative contributions of the transformations, and the corresponding probabilities help you determine whether the contributions are significant.
Residuals by object: This table and the corresponding bar chart allow you to visualize the distribution of the residual variance by object. It is thus possible to identify the objects for which the GPA has been least effective, in other words, the objects that are farthest from the consensus configuration.
Residuals by configuration: This table and the corresponding bar chart allow you to visualize the distribution of the residual variance by configuration. It is thus possible to identify the configurations for which the GPA has been least effective, in other words, the configurations that are farthest from the consensus configuration.
Scaling factors for each configuration
Scaling factors for each configuration, presented either in a table or a plot, allow you to compare the scaling factors applied to the configurations. In sensory analysis this is used to understand how the experts use the rating scales.
Results of the consensus test
The number of permutations performed, the value of Rc (the proportion of the original variance explained by the consensus configuration), and the quantile corresponding to Rc, calculated using the distribution of Rc obtained from the permutations, are displayed to evaluate the effectiveness of the Generalized Procrustes Analysis. You need to set a confidence level (typically 95%); if the quantile exceeds it, one concludes that the Generalized Procrustes Analysis significantly reduced the variance.
Results of the dimensions test
For each factor retained at the end of the PCA step, the number of permutations performed, the F calculated after the Generalized Procrustes Analysis (F is here the ratio of the variance between the objects to the variance between the configurations), and the quantile corresponding to F, calculated using the distribution of F obtained from the permutations, are displayed to evaluate whether a dimension contributes significantly to the quality of the Generalized Procrustes Analysis.
You need to set a confidence level (typically 95%); if the quantile exceeds it, one concludes that the factor contributes significantly. As an indication, the critical values and the p-value corresponding to Fisher's F distribution for the selected alpha significance level are also displayed. The conclusions resulting from Fisher's F distribution may be very different from what the permutation test indicates: using Fisher's F distribution requires assuming the normality of the data, which is not necessarily the case.
Results for the consensus configuration
- Objects coordinates before the PCA: This table corresponds to the mean over the configurations of the objects coordinates, after the Generalized Procrustes Analysis transformations and before the PCA.
- Eigenvalues: If a PCA has been requested, the table of the eigenvalues and the corresponding scree-plot are displayed. The percentage of the total variability corresponding to each axis is computed from the eigenvalues.
- Correlations of the variables with the factors: These results correspond to the correlations between the variables of the consensus configuration before and after the transformations (Generalized Procrustes Analysis and PCA if the latter has been requested). These results are not displayed on the circle of correlations as they are not always interpretable.
- Objects coordinates: This table corresponds to the mean over the configurations of the objects coordinates, after the transformations (Generalized Procrustes Analysis and PCA if the latter has been requested). These results are displayed on the objects charts.
Results for the configurations after transformations
- Variance by configuration and by dimension: This table allows you to visualize how the percentage of total variability corresponding to each axis is divided up among the configurations.
- Correlations of the variables with the factors: These results, displayed for all the configurations, correspond to the correlations between the variables of the configurations before and after the transformations (GPA and PCA if the latter has been requested). These results are displayed on the circle of correlations.
- Objects coordinates (presentation by configuration): This series of tables corresponds to the objects coordinates for each configuration after the transformations (GPA and PCA if the latter has been requested). These results are displayed on the first series of objects charts.
- Objects coordinates (presentation by object): This series of tables corresponds to the objects coordinates for each configuration after the transformations (GPA and PCA if the latter has been requested). These results are displayed on the second series of objects charts.
When to use Multiple Factor Analysis
Multiple Factor Analysis (MFA) makes it possible to analyze several tables of variables simultaneously, and to obtain results, in particular charts, that allow studying the relationship between the observations, the variables and tables. Within a table the variables must be of the same type (quantitative or qualitative), but the tables can be of different types.
This method can be very useful to analyze surveys for which one can identify several groups of variables, or for which the same questions are asked at several time intervals.
Principles of Multiple Factor Analysis
Multiple Factor Analysis is a synthesis of PCA (Principal Component Analysis) and MCA (Multiple Correspondence Analysis), which it generalizes to enable the use of quantitative and qualitative variables. The MFA methodology breaks down into two phases:
- We successively carry out a PCA or an MCA for each table, according to the type of its variables. The first eigenvalue of each analysis is stored and used to weight the tables in the second phase of the analysis.
- We then carry out a weighted PCA on the columns of all the tables, the tables of qualitative variables being transformed into complete disjunctive tables, each indicator variable having a weight that is a function of the frequency of the corresponding category. The weighting prevents tables that include more variables from weighing too much in the analysis.
The originality of the method is that it allows you to visualize in a two- or three-dimensional space the tables (each table being represented by a point), the variables, the principal axes of the first-phase analyses, and the individuals. In addition, one can study the impact of the other tables on an observation by simultaneously visualizing the observation described by all the variables and the projected observations described by the variables of a single table.
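For quantitative tables, the two phases above amount to standardising each table, dividing it by the square root of its first eigenvalue, and running an ordinary PCA on the concatenated columns. The sketch below illustrates this weighting with numpy; it is a simplified illustration (qualitative tables and the disjunctive coding are omitted), and the function name is hypothetical.

```python
import numpy as np

def mfa_weighted_columns(tables):
    """Phase 1 + weighting of MFA for quantitative tables: standardise each
    table, compute the first eigenvalue of its own PCA, and divide the table
    by the square root of that eigenvalue so no table dominates the global PCA."""
    weighted = []
    for X in tables:
        Z = (X - X.mean(axis=0)) / X.std(axis=0)        # per-table standardisation
        eigvals = np.linalg.svd(Z, compute_uv=False) ** 2 / len(Z)
        weighted.append(Z / np.sqrt(eigvals[0]))        # weight = 1 / first eigenvalue
    return np.hstack(weighted)  # phase 2: run an ordinary PCA on this matrix

rng = np.random.default_rng(1)
t1, t2 = rng.normal(size=(10, 3)), rng.normal(size=(10, 5))
global_matrix = mfa_weighted_columns([t1, t2])
```

After weighting, the first eigenvalue of every block equals 1, so each table contributes at most one unit of inertia to the leading direction of the global analysis.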
Results for Multiple Factor Analysis
The correlation matrix shows the correlations between all the quantitative variables. The type of coefficient depends on what has been chosen in the dialog box.
Results on individual tables
The results of the analyses performed on each individual table (PCA or MCA) are then displayed. These results are identical to those you would obtain after running the PCA or MCA function of XLSTAT.
Multiple Factor Analysis
Afterwards, the results of the second phase of the MFA are displayed.
- Eigenvalues: The eigenvalues and the corresponding chart (scree plot) are displayed. The number of eigenvalues displayed is equal to the number of non-null eigenvalues.
- Eigenvectors: This table shows the eigenvectors obtained from the spectral decomposition. These vectors take into account the variable weights used in the Multiple Factor Analysis.
- Coordinates of the tables: The coordinates of the tables are then displayed and used to create the plots of the tables, which allow you to visualize the distance between the tables. The coordinates of the supplementary tables are displayed in the second part of the table.
- Contributions (%): Contributions are an interpretation aid. The tables which had the highest influence in building the axes are those whose contributions are highest.
- Squared cosines: As in other factor methods, squared cosine analysis is used to avoid interpretation errors due to projection effects. If the squared cosines associated with the axes used on a chart are low, the position of the observation or the variable in question should not be interpreted.
- Lg coefficients: The Lg coefficients of relationship between the tables measure to what extent the tables are related two by two. The more the variables of one table are related to the variables of another, the higher the Lg coefficient.
- RV coefficients: The RV coefficients of relationship between the tables are another measure derived from the Lg coefficients. The value of the RV coefficients varies between 0 and 1.
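The RV coefficient between two centred tables describing the same observations can be sketched directly from its standard definition (a multivariate generalisation of squared correlation). This is a minimal numpy illustration; the function name is hypothetical.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two centred tables with the same rows:
    trace(W_X W_Y) / sqrt(trace(W_X^2) trace(W_Y^2)), where W = X X'.
    Always lies between 0 and 1."""
    WX, WY = X @ X.T, Y @ Y.T
    return np.trace(WX @ WY) / np.sqrt(np.trace(WX @ WX) * np.trace(WY @ WY))

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3)); X -= X.mean(axis=0)
Y = 2.0 * X                     # a perfectly related second table
rv = rv_coefficient(X, Y)       # rescaling does not change RV
```

Because RV is computed from the cross-product matrices, it is invariant to rotations and global rescalings of either table, which is why it is a convenient measure of relationship between whole tables.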
Results for quantitative variables
The results that follow concern the quantitative variables. As for a PCA, the coordinates of the variables (factor loadings), their correlation with the axes, the contributions and the squared cosines are displayed.
The coordinates of the partial axes, and even more their correlations, allow you to visualize in the new space the link between the factors obtained from the first phase of the Multiple Factor Analysis and those obtained from the second phase.
The results that concern the observations are then displayed as they are after a PCA (coordinates, contributions in %, and squared cosines).
Last, the coordinates of the projected points in the space resulting from the Multiple Factor Analysis are displayed. The projected points correspond to projections of the observations in the spaces reduced to the dimensions of each table. The representation of the projected points superimposed with those of the complete observations makes it possible to visualize at the same time the diversity of the information brought by the various tables for a given observation, and to visualize the relative distances from two observations according to the various tables.
What is penalty analysis?
Penalty analysis is a method used in sensory data analysis to identify potential directions for the improvement of products, on the basis of surveys performed on consumers or experts.
Two types of data are used:
- Preference data (or liking scores) that correspond to a global satisfaction index for a product (for example, liking scores on a 9 point scale for a chocolate bar), or for a characteristic of a product (for example, the comfort of a car rated from 1 to 10).
- Data collected on a JAR (Just About Right) scale. These correspond to ratings ranging from 1 to 5 for one or more characteristics of the product of interest: 1 corresponds to "Not enough at all", 2 to "Not enough", 3 to "JAR" (Just About Right), an ideal for the consumer, 4 to "Too much" and 5 to "Far too much". For example, for a chocolate bar one can rate the bitterness, and for the comfort of a car, the sound volume of the engine.
The method, based on multiple comparisons such as those used in ANOVA, consists in identifying, for each characteristic studied on the JAR scale, whether the ratings on the JAR scale are related to significantly different liking scores.
For example, if a chocolate is too bitter, does that significantly impact the liking scores?
The word penalty comes from the fact that we are looking for the characteristics which can penalize the consumer satisfaction for a given product. The penalty is the difference between the mean of the liking scores for the JAR category, and the mean of the scores for the other categories.
Principle of penalty analysis
Penalty analysis is subdivided into three phases:
- The data of the JAR scale are aggregated: categories 1 and 2 are grouped on one hand, and categories 4 and 5 on the other, which leads to a three-point scale with levels "Not enough", "JAR", and "Too much".
- We then compute and compare the means of the liking scores for the three categories to identify significant differences. The differences between the means of the two non-JAR categories and the JAR category are called mean drops.
- We compute the penalty and test if it is significantly different from 0.
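The three phases above can be sketched for a single JAR attribute as follows. This is a minimal numpy illustration of the collapsing, mean drops, and penalty computations (the significance tests are omitted); the function name and data are hypothetical.

```python
import numpy as np

def penalty_analysis(jar, liking):
    """Collapse a 5-point JAR scale to 3 levels, then compute the mean drops
    and the overall penalty for one attribute.
    jar: JAR ratings 1..5; liking: liking scores of the same consumers."""
    jar, liking = np.asarray(jar), np.asarray(liking)
    groups = {"Not enough": jar <= 2, "JAR": jar == 3, "Too much": jar >= 4}
    means = {name: liking[mask].mean() for name, mask in groups.items()}
    # mean drops: JAR mean minus each non-JAR category mean
    drops = {name: means["JAR"] - means[name] for name in ("Not enough", "Too much")}
    # penalty: JAR mean minus the mean over all non-JAR consumers
    penalty = means["JAR"] - liking[jar != 3].mean()
    return means, drops, penalty

jar    = [1, 2, 3, 3, 3, 4, 5, 2, 3, 4]
liking = [4, 5, 8, 9, 8, 6, 3, 5, 9, 5]
means, drops, penalty = penalty_analysis(jar, liking)
```

Here the consumers who found the attribute just about right gave clearly higher liking scores, so the attribute carries a large penalty when it misses the ideal.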
Results for penalty analysis for XLSTAT
After the display of the basic statistics and the correlation matrix for the liking scores and the JAR data, XLSTAT displays a table that shows for each JAR dimension the frequencies for the 5 levels (or 7 or 9 depending on the selected scale). The corresponding stacked bar diagram is then displayed.
The table of the collapsed data on three levels is then displayed, followed by the corresponding relative frequencies table and the stacked bar diagram.
The penalty table allows you to visualize the statistics for the 3 point scale JAR data, including the means, the mean drops, the penalties and the results of the multiple comparisons tests.
Last, the summary charts enable you to quickly identify the JAR dimensions for which the differences between the JAR category and the 2 non-JAR categories ("Not enough", "Too much") are significant: when the difference is significant, the bars are displayed in red, whereas they are displayed in green when the difference is not significant. The bars are displayed in grey when the size of a group is lower than the selected threshold (see the Options tab of the dialog box).
The mean drop vs % chart displays the mean drops as a function of the corresponding % of the population of testers. The threshold % of the population over which the results are considered significant is displayed with a dotted line.
When to use product characterization
Product characterization provides XLSTAT users with a user-friendly analytical method that helps find, in a sensory study, which descriptors discriminate well between a set of products. You can also identify the most important characteristics of each product.
Computation for product characterization
All computations are based on the analysis of variance (ANOVA) model.
The data table must have a given format. Each row should concern a given product and, optionally, a given session, and should gather the scores given by a judge for one or more descriptors associated with the designated product. The dataset must contain the following columns: one identifying the judge, one identifying the product, optionally one identifying the session, and as many columns as there are descriptors or characteristics.
For each descriptor an ANOVA model is applied to check if the scores given by the judges are significantly different. The simplest model is:
Score = product effect + judge effect
If different sessions have been organized (each judge has evaluated at least twice each product), the session factor can be added and the model becomes:
Score = product effect + judge effect + session effect
An interaction factor can also be included. We can then test whether some combinations of judges and products yield higher or lower scores on the descriptors. The model is:
Score = product effect + judge effect + product effect * judge effect
The judge effect is always assumed to be random: we consider that each judge has his or her own way of giving scores to the products (on the score scale).
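The simplest model above (Score = product effect + judge effect) can be sketched as a standard two-way ANOVA computed per descriptor. This is a simplified fixed-effects numpy illustration (XLSTAT treats the judge effect as random, which this sketch does not); the function name and scores are hypothetical.

```python
import numpy as np

def product_f_statistic(scores):
    """Two-way ANOVA for one descriptor, model Score = product + judge.
    scores: array (n_products, n_judges), one score per product-judge cell.
    Returns the F statistic for the product effect."""
    grand = scores.mean()
    p, j = scores.shape
    ss_prod = j * ((scores.mean(axis=1) - grand) ** 2).sum()   # product effect
    ss_judge = p * ((scores.mean(axis=0) - grand) ** 2).sum()  # judge effect
    ss_res = ((scores - grand) ** 2).sum() - ss_prod - ss_judge
    df_prod, df_res = p - 1, (p - 1) * (j - 1)
    return (ss_prod / df_prod) / (ss_res / df_res)

# 3 products x 4 judges on one descriptor; product 3 clearly stands out
scores = np.array([[5., 6., 5., 6.],
                   [5., 5., 6., 5.],
                   [9., 8., 9., 9.]])
f_product = product_f_statistic(scores)   # large F -> discriminating descriptor
```

A descriptor with a large product F (small p-value) is one on which the judges separate the products well, which is exactly how the discriminating power table ranks the descriptors.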
Product characterization is a very efficient tool to characterize products using judges’ preferences.
Results of XLSTAT for product characterization
Discriminating power by descriptor
This table shows the ordered descriptors from the most discriminating on the products to the least discriminating. Associated V-test and p-values are also displayed.
A second table displays the various coefficients of the chosen model for each product-descriptor combination. The adjusted mean, t test, p-value and confidence interval for each combination are also displayed, followed by charts of the coefficients for each product.
Adjusted means by product
This table shows the adjusted mean for each product-descriptor combination. Blue indicates a significant positive effect, red a significant negative effect.
Why do we use Design of Experiments in Sensory data analysis?
Designing an experiment is a fundamental step for anyone who wants to ensure that the collected data will be statistically usable in the best possible way. There is no point in having a panel of judges evaluate products if the products cannot be compared under statistically reliable conditions. Nor is it necessary for each judge to evaluate all products in order to compare the products with one another.
This tool is designed to provide specialists in sensory analysis with a simple and powerful way to prepare a sensory evaluation in which judges (experts and/or consumers) evaluate a set of products.
What does the XLSTAT Design of Experiments in Sensory data analysis tool take into account?
When you want a panel of consumers to evaluate a set of products, the first issue that arises is what is the appropriate number of consumers that should be involved, knowing that there may be technical constraints (a limited number of trained consumers is available), or budgetary constraints.
Once the number of consumers is defined, the question arises of the maximum number of products that a consumer can evaluate during each session.
It then remains to determine which products will be evaluated by each consumer in each session, and in what order. The order may have an influence: to avoid penalizing certain products, we should ensure that each product is seen as often as possible in the different positions during each session.
Furthermore, some sequences of products may also have a bearing on the sensory assessments. We restrict ourselves here to pairs of products (carry-over of order 2). As for the order, we also ensure that the different ordered pairs are present at a frequency as homogeneous as possible in the design.
When generating the plan we therefore try to reconcile the following three requirements:
- Products must be seen by as many judges as possible and with an overall frequency of the different products as homogeneous as possible,
- Each product must be seen in the different orders during each session, with an overall frequency for each pair (order, product) as homogeneous as possible
- The different ordered pairs of products must be present in the design of experiments with a frequency as homogeneous as possible.
XLSTAT Design of Experiments in Sensory data analysis quality criteria
XLSTAT allows users to search for an optimal design in the sense of the A-efficiency or the D-efficiency, whether for complete designs or for incomplete block designs, balanced or not.
Order of the products
Once the design is found (the matrix N is known), the products need to be ordered to optimize the design in terms of column frequency and carry-over (Périnel and Pagès, 2004). We want each product to be present the same number of times at a given position, and each ordered pair of products to be present the same number of times. To achieve this, XLSTAT uses two matrices: the matrix of column frequencies and the matrix of carry-over.
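The two matrices can be sketched directly from a design. This is a minimal numpy illustration with hypothetical names, using a small Latin-square-like design in which both matrices are perfectly balanced.

```python
import numpy as np

def design_matrices(design, n_products):
    """Compute the two matrices balanced when ordering products:
    column frequencies (how often each product appears at each tasting
    position) and carry-over (how often product i immediately precedes j).
    design: one row per judge, products coded 0..n_products-1 in tasting order."""
    n_pos = design.shape[1]
    col_freq = np.zeros((n_products, n_pos), dtype=int)
    carry = np.zeros((n_products, n_products), dtype=int)
    for row in design:
        for pos, prod in enumerate(row):
            col_freq[prod, pos] += 1
        for a, b in zip(row[:-1], row[1:]):
            carry[a, b] += 1
    return col_freq, carry

# 3 judges each tasting 3 products, in Latin-square order
design = np.array([[0, 1, 2],
                   [1, 2, 0],
                   [2, 0, 1]])
col_freq, carry = design_matrices(design, 3)
```

Here every product appears exactly once at every position, so the column frequencies are perfectly even; the carry-over matrix, however, is not (pair 0→1 occurs twice while 1→0 never does), which is exactly the kind of imbalance the row and column permutations try to reduce.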
XLSTAT algorithm for Design of Experiments in Sensory data analysis
The optimization algorithm is iterative. It is sometimes necessary to split sensory evaluations into sessions. To generate a design that takes sessions into account, XLSTAT uses the same initial design for each session and then applies permutations to both rows and columns, while trying to keep the column frequencies and carry-over as even as possible. When the designs are resolvable or near-resolvable, the same judge will not test the same product twice during two different sessions.
Why do we use TURF analysis?
The TURF (Total Unduplicated Reach and Frequency) method is used in marketing to highlight, from a complete range of products, the line of products that will obtain the highest market share. From all the products of a brand, one can thus obtain the subset, or line, of products with the maximum reach.
For example, let’s consider an ice cream manufacturer producing 30 different flavors and who wants to put forward a line of six flavors that will reach as many consumers as possible. Thus, he submitted a questionnaire to a panel of 500 consumers who scored each flavor on a scale from 1 to 10. The manufacturer believes that the consumer will be satisfied and inclined to choose the flavor if he gives a score above 8. TURF analysis will look for the combination of 6 flavors with greatest reach and frequency.
Principles of TURF analysis
This method is a simple statistical method based on a questionnaire (with scores on a fixed scale). The analysis runs through every possible combination of products and records, for each combination, (1) the percentage of respondents who desire at least one product in the combination (the reach), and (2) the total number of times products in the combination are desired (the frequency).
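The enumeration version of this search can be sketched in a few lines with `itertools.combinations`. This is a minimal illustration under the assumptions of the ice-cream example (a score above the threshold means the consumer would choose the product); the function name and data are hypothetical.

```python
from itertools import combinations
import numpy as np

def turf(scores, line_size, threshold=8):
    """Exhaustive TURF: find the product line of a given size that maximises
    reach (consumers satisfied by at least one product in the line), using
    total frequency of desired products as a tie-breaker.
    scores: array (n_consumers, n_products) of questionnaire ratings."""
    desired = scores > threshold              # who would choose which product
    best = None
    for combo in combinations(range(scores.shape[1]), line_size):
        block = desired[:, combo]
        reach = block.any(axis=1).mean()      # share reached by >= 1 product
        freq = int(block.sum())               # total desired mentions
        if best is None or (reach, freq) > (best[0], best[1]):
            best = (reach, freq, combo)
    return best

# 4 consumers x 3 flavours: flavour 0 pleases two consumers, 1 and 2 one each
scores = np.array([[9, 2, 2],
                   [9, 2, 2],
                   [2, 9, 2],
                   [2, 2, 9]])
reach, freq, line = turf(scores, line_size=2)
```

With 30 flavours and lines of 6 there are already over 590,000 combinations, which is why the greedy and fast search algorithms described below exist as alternatives to full enumeration.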
XLSTAT algorithms for TURF analysis
XLSTAT offers a variety of techniques to find the best combination of products:
- the enumeration method tests all the combinations but may be time consuming;
- the greedy algorithm is very fast but can stop at a local optimum;
- the fast search algorithm is close to the enumeration method; it is faster but does not guarantee the optimal solution.
Results of TURF analysis in XLSTAT
Frequencies by product
This table displays the frequency with which the objective has been reached for each product.
Product lines obtained with the TURF analysis
This table displays for each selected combination: the reach, the frequency, and the name of each product.
Product lines obtained with the TURF analysis (%)
This table displays for each selected combination: the percentage of observations for which the objective has been reached, the frequency in percentage, and the frequency in percentage for each product in each combination.
Copyright © 2013 Kovach Computing Services, Anglesey, Wales. All Rights Reserved. Portions copyright Addinsoft and Provalis Research.
Last modified 5 April, 2013