Lesson 4

Implementing the Naive Bayes Classifier from Scratch in Python


Welcome to our tour of the Naive Bayes Classifier! This classification algorithm is known for its simplicity and effectiveness. We will implement it from scratch in Python, using pandas only for data handling rather than relying on a prebuilt machine learning library. Let's get started!


Let's quickly review some probability theory.

$P(A)$ denotes the probability of an event $A$ occurring. $P(A|B)$, on the other hand, is the probability of event $A$ occurring given that event $B$ has already happened.

For instance, imagine a bag holding three marbles: one red and two blue. Let $A$ be the event that a red marble is drawn and $B$ the event that a blue marble is drawn. The probability of $A$, $P(A)$, is $1/3$ in this case.

Now suppose a blue marble has already been drawn and not returned to the bag. This leaves one red and one blue marble. The probability of drawing a red marble (event $A$), given that a blue marble has already been removed (event $B$), is denoted $P(A|B)$. Here $P(A|B) = 1/2$, which is higher than $P(A) = 1/3$: removing a blue marble makes drawing the red one more likely.
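As a quick check, the two probabilities in the marble example can be reproduced by enumerating every equally likely outcome. This is a minimal sketch; the marble labels (`blue1`, `blue2`) and the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["red", "blue1", "blue2"]

# P(A): a single draw from the full bag is red.
p_a = Fraction(sum(m == "red" for m in marbles), len(marbles))

# All equally likely ordered pairs (first draw, second draw) without replacement.
draws = list(permutations(marbles, 2))

# Keep only the outcomes where the first marble is blue (event B),
# then count how many of those have a red marble second (event A).
first_blue = [d for d in draws if d[0].startswith("blue")]
red_after_blue = [d for d in first_blue if d[1] == "red"]
p_a_given_b = Fraction(len(red_after_blue), len(first_blue))

print(p_a)          # 1/3
print(p_a_given_b)  # 1/2
```

Conditioning on $B$ simply means restricting the count to the outcomes in which $B$ happened.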

The Principle of Naive Bayes

Naive Bayes algorithms rely on Bayes' theorem. Let's recall it quickly. This theorem calculates the probability of an event based on prior knowledge of potentially related events. It is written as:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

Here $P(A|B)$ is the posterior probability of class $A$ given predictor $B$; it is what we are trying to calculate. $P(B|A)$ is the likelihood, which is the probability of the predictor given the class. $P(B)$ is the marginal probability of the predictor, and $P(A)$ is the prior probability of the class. This formula forms the backbone of the Naive Bayes classifier.
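To see the formula at work, it can be applied to the marble example from earlier. The sketch below assumes $A$ means "the second marble drawn is red" and $B$ means "the first marble drawn is blue"; the helper name `bayes_posterior` is hypothetical.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Drawing two marbles without replacement from one red and two blue.
p_a = Fraction(1, 3)          # P(A): by symmetry, any draw position is red with probability 1/3
p_b = Fraction(2, 3)          # P(B): two of the three marbles are blue
p_b_given_a = Fraction(1, 1)  # P(B|A): if the only red marble comes second, the first must be blue

print(bayes_posterior(p_b_given_a, p_a, p_b))  # 1/2
```

The result matches the value of $1/2$ obtained by counting the remaining marbles directly.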

The term 'naive' refers to the assumption that the features in a dataset are independent of each other given the class, which is often not the case in real-life data. Nonetheless, the classifier often performs well in practice and is easy to implement.

Deriving the Naive Bayes Classifier Algorithm

In the context of machine learning, the Naive Bayes Classifier uses Bayes' theorem to compute the posterior probability of each class given a set of features, and then predicts the class with the highest posterior probability.

Assume a binary class variable $Y$ (binary means it equals either 0 or 1) and features $X_1, X_2, \dots, X_n$. Our task is to compute the posterior probability $P(Y=1 | X_1=x_1, X_2=x_2, \dots, X_n=x_n)$. The denominator of Bayes' theorem does not depend on $Y$ and is the same for every class, so we can drop it. We are left with maximizing the joint probability of $Y$ and $X$, $P(Y, X) = P(X|Y)P(Y)$, which forms the basis of Naive Bayes classification.
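The step that makes the classifier 'naive' can be written out explicitly. Assuming the features are conditionally independent given the class, the likelihood $P(X|Y)$ factorizes into one term per feature. The symbol $\hat{y}$ for the predicted class is introduced here only for illustration:

```latex
\begin{aligned}
P(Y = y \mid X_1 = x_1, \dots, X_n = x_n)
  &= \frac{P(X_1 = x_1, \dots, X_n = x_n \mid Y = y)\, P(Y = y)}{P(X_1 = x_1, \dots, X_n = x_n)} \\
  &\propto P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y), \\
\hat{y} &= \arg\max_{y} \; P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y).
\end{aligned}
```

Each factor $P(X_i = x_i \mid Y = y)$ can be estimated from simple counts, which is what keeps the algorithm cheap to train.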

Implementing Naive Bayes Classifier

We approach the implementation of the Naive Bayes Classifier by first calculating the prior probabilities of each class, and then the likelihood of each feature given a class:

```python
# X is expected to be a pandas DataFrame of categorical features, y a pandas Series of class labels.

def calculate_prior_probabilities(y):
    # Proportion of each class in the training labels
    return y.value_counts(normalize=True)

def calculate_likelihoods(X, y):
    likelihoods = {}
    for class_ in y.unique():
        for column in X.columns:
            # P(feature value | class), stored under the key "<feature>|<class>"
            likelihoods[column + "|" + str(class_)] = X[y == class_][column].value_counts(normalize=True)
    return likelihoods  # dict with the likelihood of each feature value given a class
```
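To make the two quantities concrete without pandas, here is a rough plain-Python sketch of the same counting idea for a single feature. It uses the Temperature and Weather columns from the weather example later in this lesson; the variable names are illustrative, and it is not meant to replace the functions above.

```python
from collections import Counter, defaultdict

temperature = ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot']
weather = ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']

# Prior P(class): share of rows that carry each label.
class_counts = Counter(weather)
priors = {label: count / len(weather) for label, count in class_counts.items()}

# Likelihood P(value | class): share of a class's rows that have each feature value.
value_counts = defaultdict(Counter)
for value, label in zip(temperature, weather):
    value_counts[label][value] += 1

likelihoods = {
    label: {value: count / class_counts[label] for value, count in counts.items()}
    for label, counts in value_counts.items()
}

print(priors)
print(likelihoods)
```

Note that `likelihoods['Snowy']` has no entry for `'Hot'`, because no Snowy row is Hot. A value that never occurs within a class is simply absent from the counts.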

Armed with these utility functions, we can implement the Naive Bayes Classifier function:

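```python
def naive_bayes_classifier(X_test, priors, likelihoods):
    # Expects X_test to hold a single observation (one row): scores are
    # summed over rows, so classify one data point per call.
    class_probabilities = {}

    for class_ in priors.index:
        class_probabilities[class_] = 0

    for index, data_point in X_test.iterrows():
        for class_ in priors.index:
            class_likelihood = 1
            for feature in X_test.columns:  # features come from the data being classified
                # Values missing from the likelihood table fall back to probability 0
                class_likelihood *= likelihoods[feature + "|" + str(class_)].get(data_point[feature], 0)

            # Unnormalized posterior: prior * product of the feature likelihoods
            class_probabilities[class_] += priors[class_] * class_likelihood

    return max(class_probabilities, key=class_probabilities.get)
```

Understanding and Handling Data Issues in Naive Bayes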

A recurring challenge in Naive Bayes is the handling of zero probabilities, i.e., when a category does not appear in the training data for a given class, resulting in a zero probability for that category. Because the class score is a product of likelihoods, a single zero wipes out the score for that class regardless of the other features. A known fix for this problem is Laplace (add-1) smoothing, which adds 1 to each category count so that no probability is exactly zero.

You can integrate Laplace smoothing into the calculate_likelihoods function as follows:

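```python
def calculate_likelihoods_with_smoothing(X, y):
    likelihoods = {}
    for class_ in y.unique():
        for column in X.columns:
            categories = X[column].unique()
            # Count every category of the feature, including those unseen in this class (count 0)
            counts = X[y == class_][column].value_counts().reindex(categories, fill_value=0)
            # Add-1 smoothing: (count + 1) / (class size + number of categories)
            likelihoods[column + "|" + str(class_)] = (counts + 1) / (counts.sum() + len(categories))
    return likelihoods  # dict with the smoothed likelihood of each feature value given a class
```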

Each category count in the numerator is increased by 1, and the denominator is increased by the number of unique categories, so that the smoothed probabilities for a feature still sum to 1.
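The arithmetic can be checked by hand on one case from the weather example in the next section: within the Snowy class, Temperature is Cold in all three rows and never Hot. The helper name `laplace_smoothed` below is hypothetical and exists only for this illustration.

```python
from fractions import Fraction

def laplace_smoothed(count, class_total, num_categories):
    """Add-1 estimate of P(value | class)."""
    return Fraction(count + 1, class_total + num_categories)

# Snowy class: 3 rows in total, Cold appears 3 times, Hot appears 0 times.
# Temperature has 2 categories overall (Hot, Cold).
p_cold = laplace_smoothed(3, 3, 2)
p_hot = laplace_smoothed(0, 3, 2)

print(p_cold, p_hot)   # 4/5 1/5
print(p_cold + p_hot)  # 1
```

Without smoothing, the estimate for Hot given Snowy would be 0/3 = 0; with smoothing it becomes 1/5 while the two estimates still add up to 1.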

Using Naive Bayes Classifier

Here is a short example of predicting weather with our classifier:

```python
import pandas as pd

data = {
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
}
df = pd.DataFrame(data)

# Split features and labels
X = df[['Temperature', 'Humidity']]
y = df['Weather']

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)

# Calculate likelihoods with smoothing
likelihoods = calculate_likelihoods_with_smoothing(X, y)

# New observation
X_test = pd.DataFrame([{'Temperature': 'Hot', 'Humidity': 'Normal'}])

# Make prediction
prediction = naive_bayes_classifier(X_test, priors, likelihoods)
print("Predicted Weather:", prediction)  # Output: Predicted Weather: Sunny
```
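To see why Sunny wins for this observation, the class scores can be worked out by hand from the seven training rows. The sketch below hard-codes the counts read off the table and assumes add-1 smoothing over both categories of each feature; the variable names are illustrative.

```python
from fractions import Fraction as F

# Priors from the seven training rows.
priors = {'Sunny': F(3, 7), 'Snowy': F(3, 7), 'Rainy': F(1, 7)}

# Add-1 smoothed likelihoods for the observation (Temperature='Hot', Humidity='Normal').
# Each feature has 2 categories, so the denominator is class size + 2.
likelihoods = {
    'Sunny': [F(3 + 1, 3 + 2), F(1 + 1, 3 + 2)],  # P(Hot|Sunny), P(Normal|Sunny)
    'Snowy': [F(0 + 1, 3 + 2), F(2 + 1, 3 + 2)],  # P(Hot|Snowy), P(Normal|Snowy)
    'Rainy': [F(1 + 1, 1 + 2), F(1 + 1, 1 + 2)],  # P(Hot|Rainy), P(Normal|Rainy)
}

# Score = prior * product of the feature likelihoods (unnormalized posterior).
scores = {}
for label, prior in priors.items():
    score = prior
    for p in likelihoods[label]:
        score *= p
    scores[label] = score

# Dividing by the total turns the scores into posterior probabilities.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

print(scores)
print(max(scores, key=scores.get))  # Sunny
print(float(posteriors['Sunny']))
```

The scores come out as 24/175 for Sunny, 9/175 for Snowy, and 4/63 for Rainy, so Sunny has the largest score. Normalizing is optional for prediction, since it does not change which class is largest.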

The Naive Bayes Classifier predicts a class label based on the observed features. Owing to its simplicity, power, and speed, this classifier lends itself to challenging scenarios, including text classification, spam detection, and sentiment analysis.

Lesson Summary and Practice

Superb work! You've mastered the essentials of the Naive Bayes Classifier, from understanding its theory to crafting a Naive Bayes Classifier from scratch. The next phase is practice, which will consolidate your newly acquired skills. Enjoy the hands-on exercises lined up next. Delve deeper into your machine learning journey with the forthcoming lessons!
