XLSTAT - Naive Bayes classifier

What is the Naive Bayes classifier?

The Naive Bayes classifier is a supervised machine learning algorithm that classifies observations according to rules it learns from data. The classifier must first be trained on a training dataset in which the expected class is given for each set of inputs. During the training phase, the algorithm derives classification rules from this dataset; in the prediction phase, it applies those rules to classify the observations of the prediction dataset. Because the classes of the training observations must be known and supplied, the technique is supervised.

Historically, the Naive Bayes classifier has been used in document classification and spam filtering. Today it is a well-established classifier with applications in numerous areas. It requires only a limited amount of training data to estimate the necessary parameters, and it can be extremely fast compared to some other techniques. Finally, in spite of its strong simplifying assumption of independence between variables (see description below), the naive Bayes classifier performs quite well in many real-world situations, which makes it a common choice among supervised machine learning methods.

At the root of the Naive Bayes classifier is Bayes' theorem, combined with the naive assumption that all pairs of variables (features) are independent given the class.
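Written out, the idea can be sketched as follows. The notation is an assumption made here for illustration (y for a class, x_1, ..., x_n for the feature values of one observation); it is not taken from the XLSTAT documentation.

```latex
% Bayes' theorem: posterior probability of class y given the features
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

% Naive assumption: features are independent given the class
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)

% Resulting decision rule (the denominator does not depend on y)
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Each factor P(x_i | y) is a one-dimensional distribution, which is why only a few parameters per variable and per class need to be estimated.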

Naive Bayes classifier options in XLSTAT

Distribution of the quantitative variables

  • Same parametric/Empirical Distribution for all quantitative variables: this option allows you to choose the same parametric/empirical distribution for all quantitative variables.
  • Select a specific distribution for each quantitative variable: this option allows you to select for each quantitative variable a specific parametric distribution or to consider it as an empirical distribution. The parametric distribution can be selected from the following set of distributions: Normal, log-Normal, Gamma, exponential, logistic, Poisson, Binomial, Bernoulli, Uniform.

The qualitative variables are implicitly drawn from independent empirical distributions. The parameters of the selected parametric distributions are estimated using the method of moments.
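As an illustration of the method of moments, the sketch below estimates Normal and Gamma parameters by matching the sample mean and variance, one set of parameters per class. The function names are hypothetical, and the use of the population variance (dividing by n) is an assumption; the source does not say which variant XLSTAT uses.

```python
import math
from statistics import fmean, pvariance


def fit_normal_moments(values):
    """Method-of-moments estimates (mean, standard deviation) for a Normal distribution."""
    mu = fmean(values)
    # Assumption: population variance (divide by n), not the sample variance.
    sigma = math.sqrt(pvariance(values, mu))
    return mu, sigma


def fit_gamma_moments(values):
    """Method-of-moments estimates (shape, scale) for a Gamma distribution.

    Solves mean = shape * scale and variance = shape * scale**2.
    """
    mean = fmean(values)
    var = pvariance(values, mean)
    return mean ** 2 / var, var / mean


def fit_per_class(values, labels, fit):
    """Apply an estimator separately to the values observed in each class."""
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    return {label: fit(vals) for label, vals in by_class.items()}


params = fit_per_class([1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
                       ["a", "a", "a", "b", "b", "b"],
                       fit_gamma_moments)
```

Here `params["b"]` holds the Gamma shape and scale matched to the three values observed in class "b".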

Breaking ties

Prediction using the naive Bayes approach can end up in a case where several classes are tied for the highest posterior probability. There are several ways to break such ties for a given prediction. The following options are available:

  • Random breaker: chooses a class at random from the set of tied classes.

  • Smallest Index: chooses the first class encountered in the set of tied classes.
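The two tie-breaking options can be sketched as below. This is a minimal illustration, not XLSTAT's implementation: the function name, the `method` strings and the tolerance `tol` used to decide that two probabilities are equal are all assumptions.

```python
import random


def break_tie(probabilities, method="smallest_index", rng=None, tol=1e-12):
    """Return the index of the predicted class, resolving ties for the top probability.

    probabilities: one probability per class, in class order.
    method: "smallest_index" (first tied class) or "random" (uniform pick among ties).
    """
    best = max(probabilities)
    # Indices of every class whose probability equals the maximum (within tol).
    tied = [i for i, p in enumerate(probabilities) if abs(p - best) <= tol]
    if method == "smallest_index":
        return tied[0]
    if method == "random":
        rng = rng or random.Random()
        return rng.choice(tied)
    raise ValueError(f"unknown tie-breaking method: {method!r}")


first = break_tie([0.1, 0.45, 0.45])                   # deterministic choice
either = break_tie([0.1, 0.45, 0.45], method="random")  # one of the tied classes
```

The smallest-index rule is reproducible from run to run, while the random rule avoids systematically favouring classes that happen to come first.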

Laplace smoothing parameter

Laplace smoothing prevents the estimated probabilities from being exactly zero or one.
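A minimal sketch of the usual additive form of Laplace smoothing for a qualitative variable follows. The exact formula applied by XLSTAT is not given in the source, so the function name and the parameter `alpha` (standing in for the smoothing parameter) are assumptions.

```python
from collections import Counter


def smoothed_probabilities(observed, categories, alpha=1.0):
    """Estimate category probabilities with additive (Laplace) smoothing.

    Each category count is increased by alpha, so a category never seen in
    the training data still receives a non-zero probability when alpha > 0.
    """
    counts = Counter(observed)
    total = len(observed) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


# "c" never occurs in the observations.
unsmoothed = smoothed_probabilities(["a", "a", "b"], ["a", "b", "c"], alpha=0.0)
smoothed = smoothed_probabilities(["a", "a", "b"], ["a", "b", "c"], alpha=1.0)
```

Without smoothing, the unseen category gets probability zero, which would force the whole product of probabilities for a class to zero; with smoothing it stays small but positive.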

Naive Bayes classifier results in XLSTAT

Results corresponding to the parameters involved in the classification process

The type of probability distribution used is reported.

The qualitative variables are implicitly treated as following an empirical distribution.

The nature of the a priori distribution of the classes (uniform, not uniform) is also reported.

Results regarding the classifier

To evaluate the naive Bayes classifier, a confusion matrix computed using the leave-one-out method is displayed, together with an accuracy index.
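The leave-one-out procedure can be sketched as follows: each observation is held out in turn, the classifier is trained on the rest, and the held-out observation is predicted. All names are hypothetical, and the nearest-class-mean classifier is only a stand-in used to keep the example short; any classifier exposing a train and a predict function, including a naive Bayes one, could be plugged in.

```python
from collections import defaultdict


def leave_one_out_confusion(X, y, train, predict):
    """Confusion matrix as {(actual, predicted): count}, holding out one observation at a time."""
    confusion = defaultdict(int)
    for i in range(len(X)):
        # Train on every observation except the i-th one.
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        confusion[(y[i], predict(model, X[i]))] += 1
    return dict(confusion)


def accuracy(confusion):
    """Share of observations on the diagonal of the confusion matrix."""
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / sum(confusion.values())


# Stand-in classifier (NOT naive Bayes): assign to the class with the nearest mean.
def train_nearest_mean(X, y):
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in zip(X, y):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(model, x):
    return min(model, key=lambda label: abs(x - model[label]))


X = [1.0, 1.2, 0.8, 3.0, 3.2, 1.9]
y = ["a", "a", "a", "b", "b", "b"]
cm = leave_one_out_confusion(X, y, train_nearest_mean, predict_nearest_mean)
acc = accuracy(cm)
```

On this toy data, five of the six held-out observations are classified correctly; the value 1.9 from class "b" is assigned to class "a".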

Results regarding the validation method

The error rate of the naive Bayes model obtained using K-fold cross-validation is reported, along with the number of folds.

The cross-validation results help in selecting adequate model parameters.
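A K-fold cross-validated error rate can be sketched as below. The fold assignment (observation i goes to fold i mod K, without shuffling) and all names are assumptions made for illustration, and the majority-class classifier is only a stand-in for a real model.

```python
from collections import Counter


def k_fold_error_rate(X, y, k, train, predict):
    """Share of misclassified observations over k folds.

    Assumption: observation i is assigned to fold i % k (no shuffling).
    """
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / n


# Stand-in classifier (NOT naive Bayes): always predict the most frequent training class.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(6))
y = ["a", "a", "a", "a", "a", "b"]
error = k_fold_error_rate(X, y, 3, train_majority, predict_majority)
```

To compare candidate settings (for example, different distributions or smoothing values), the same function would be called once per setting and the setting with the lowest error rate retained.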

Results corresponding to the predicted classes

The predicted classes obtained using the naive Bayes classifier are displayed. In addition to the predicted classes, the a posteriori probabilities used to predict each observation are also reported.
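To show how a predicted class and its a posteriori probabilities relate, here is a minimal sketch of a naive Bayes classifier in which every quantitative variable is assumed to be Normal. The function names and the toy data are hypothetical, and the code is not XLSTAT's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate, for each class, its prior and the mean and variance of each feature."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumes every feature varies within each class (variance > 0).
        variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def posteriors(model, row):
    """A posteriori probability of each class for one observation."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        # log P(y) + sum of log Normal densities (independence assumption).
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = score
    # Normalise in a numerically stable way so the probabilities sum to one.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.9], [3.2, 1.1], [2.8, 1.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
probs = posteriors(model, [1.1, 2.0])
predicted = max(probs, key=probs.get)
```

The predicted class is simply the one with the largest a posteriori probability, so reporting both lets the reader see how clear-cut each prediction is.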

XLSTAT

This analysis is available in the XLStat-Base add-in for Microsoft Excel.

About KCS

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.
