02 Jul

Naive Bayes Classifiers

Naive Bayes is a classification algorithm (supervised learning) for binary (two-class) and multi-class classification problems. The technique is easiest to understand when described using binary or categorical input values.

It is called naive Bayes or idiot Bayes because the calculation of the probabilities for each hypothesis is simplified to make it tractable. Rather than attempting to calculate the joint probability of the attribute values, P(d1, d2, d3|h), the attributes are assumed to be conditionally independent given the target value, so the joint probability is calculated as P(d1|h) * P(d2|h) and so on.

The assumption that the attributes do not interact is a very strong one and is unlikely to hold in real data. Nevertheless, the approach performs surprisingly well on data where this assumption does not hold.
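As a rough illustration of the factorisation described above, the sketch below multiplies per-attribute conditional probabilities to get P(d1, d2|h) for each class. The probability values, the second attribute "car", and the function name `factorised_likelihood` are all made up for the example; they do not come from any real data set.

```python
from math import prod

def factorised_likelihood(conditionals, observation):
    """Return prod_i P(d_i | h) for each class h under the naive assumption."""
    result = {}
    for h, per_attribute in conditionals.items():
        # Each attribute contributes its own P(d_i | h); the terms are simply
        # multiplied, which is exactly the conditional-independence shortcut.
        result[h] = prod(per_attribute[attr][value]
                         for attr, value in observation.items())
    return result

# Hypothetical conditional probabilities, for illustration only.
conditionals = {
    "go-out": {
        "weather": {"sunny": 0.8, "rainy": 0.2},
        "car": {"working": 0.9, "broken": 0.1},
    },
    "stay-home": {
        "weather": {"sunny": 0.3, "rainy": 0.7},
        "car": {"working": 0.5, "broken": 0.5},
    },
}

observation = {"weather": "sunny", "car": "working"}
likelihoods = factorised_likelihood(conditionals, observation)
most_likely = max(likelihoods, key=likelihoods.get)
```

With these made-up numbers the likelihood for "go-out" is 0.8 * 0.9 and for "stay-home" is 0.3 * 0.5, so "go-out" has the larger likelihood. A full classifier would also multiply each product by the probability of the class itself.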

A Naive Bayes model has two parts:
Class Probabilities: The probabilities of each class in the training dataset.
For example, in a binary classification problem the probability of an instance belonging to class 1 would be calculated as:

P(class=1) = count(class=1) / (count(class=0) + count(class=1))
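The class probability formula can be sketched in a few lines of Python. The label list below is hypothetical, and `class_probabilities` is just an illustrative helper name.

```python
from collections import Counter

def class_probabilities(labels):
    """Return P(class=c) for every class c, estimated from label counts."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

# Hypothetical training labels: three instances of class 1, two of class 0.
labels = [1, 0, 1, 1, 0]
priors = class_probabilities(labels)
# priors[1] is count(class=1) / (count(class=0) + count(class=1)) = 3 / 5
```

The same function works unchanged for more than two classes, since it simply divides each class count by the total number of instances.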

Conditional Probabilities: The conditional probabilities of each input value given each class value.

For example, if a “weather” attribute had the values “sunny” and “rainy” and the class attribute had the class values “go-out” and “stay-home”, then the conditional probabilities of each weather value for each class value could be calculated as:

P(weather=sunny|class=go-out) = count(instances with weather=sunny and class=go-out) / count(instances with class=go-out)
P(weather=sunny|class=stay-home) = count(instances with weather=sunny and class=stay-home) / count(instances with class=stay-home)
P(weather=rainy|class=go-out) = count(instances with weather=rainy and class=go-out) / count(instances with class=go-out)
P(weather=rainy|class=stay-home) = count(instances with weather=rainy and class=stay-home) / count(instances with class=stay-home)
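The four formulas above follow the same pattern, so they can be computed together by counting. This is a minimal sketch on a small, made-up data set of (weather, class) pairs; `conditional_probabilities` is an illustrative name.

```python
from collections import Counter

def conditional_probabilities(rows):
    """rows is a list of (weather, cls) pairs.

    Returns a dict mapping (weather, cls) to P(weather | cls).
    Pairs that never occur in rows are absent, i.e. implicitly zero.
    """
    class_counts = Counter(cls for _, cls in rows)
    pair_counts = Counter(rows)
    return {
        (weather, cls): n / class_counts[cls]
        for (weather, cls), n in pair_counts.items()
    }

# Hypothetical training data: 3 "go-out" instances and 4 "stay-home" instances.
rows = [
    ("sunny", "go-out"),
    ("sunny", "go-out"),
    ("rainy", "go-out"),
    ("sunny", "stay-home"),
    ("rainy", "stay-home"),
    ("rainy", "stay-home"),
    ("rainy", "stay-home"),
]
cond = conditional_probabilities(rows)
# cond[("sunny", "go-out")] is 2 / 3 and cond[("rainy", "stay-home")] is 3 / 4
```

For each class, the conditional probabilities of all weather values sum to 1, which is a quick sanity check on the counts.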

Pros and cons of Naive Bayes Classifiers

Pros:
- Computationally fast
- Simple to implement
- Works well with small datasets
- Works well with high dimensions
- Performs well even if the naive assumption is not perfectly met. In many cases, the approximation is enough to build a good classifier.

Cons:
- Correlated features should be removed, because their evidence is effectively counted twice in the model, which over-inflates their importance.
- If a categorical variable takes a value in the test data set that was not observed in the training data set, the model assigns that value a zero probability, and it will not be able to make a meaningful prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation. Scikit-learn's discrete Naive Bayes classifiers, such as MultinomialNB and CategoricalNB, apply Laplace smoothing by default (alpha=1.0).
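To make the zero-frequency fix concrete, here is a small sketch of Laplace (add-one) estimation for a single conditional probability. The counts and the third weather value "snowy" are hypothetical, and `smoothed_probability` is an illustrative helper, not a scikit-learn function.

```python
def smoothed_probability(pair_count, class_count, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | class).

    pair_count:  instances with this value and this class
    class_count: instances with this class
    n_values:    number of distinct values the attribute can take
    alpha:       smoothing strength (1.0 gives add-one smoothing)
    """
    return (pair_count + alpha) / (class_count + alpha * n_values)

# Hypothetical counts for class "go-out" (3 instances in total),
# with "snowy" never observed together with this class.
counts = {"sunny": 2, "rainy": 1, "snowy": 0}
class_count = sum(counts.values())

smoothed = {
    value: smoothed_probability(n, class_count, n_values=len(counts))
    for value, n in counts.items()
}
# Without smoothing, P(snowy | go-out) would be 0 / 3 = 0.
# With add-one smoothing it becomes (0 + 1) / (3 + 3) = 1 / 6.
```

The smoothed values still sum to 1 across the attribute's values, so they remain a valid probability distribution while no value is ever assigned exactly zero.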
