Lesson 4
Implementing the Naive Bayes Classifier from Scratch in Python
Introduction

Welcome to our exploration of the Naive Bayes Classifier! This robust classification algorithm is renowned for its simplicity and effectiveness. We will implement it from scratch in Python, so you can understand exactly how it works without relying on any prebuilt libraries. Let's get started!

Recall

Let's do a quick recall of probability theory.

P(A) denotes the probability of a certain event A occurring. P(A|B), on the other hand, denotes the probability of event A occurring, given that event B has already happened.

For instance, imagine a bag containing three marbles - one red and two blue. Denote A as the event where a red marble is picked, and B as the event where a blue one is drawn. The probability of A, P(A), is 1/3 in this case.

Now, suppose a blue marble has already been drawn from the bag. This leaves one red and one blue marble inside. The probability of drawing a red marble (event A), given that a blue marble has already been removed (event B), is denoted by P(A|B). In this case, P(A|B) is 1/2, reflecting the higher chance of drawing a red marble once a blue one has been taken out.
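If you would like to double-check these numbers in code, here is a minimal sketch (assuming we interpret A as drawing the red marble and B as drawing a blue marble first, as in the example above) that simply enumerates every ordered two-marble draw:

Python
from itertools import permutations

# A bag with one red and two blue marbles; enumerate every ordered pair
# of draws without replacement (6 equally likely outcomes).
bag = ['red', 'blue', 'blue']
draws = list(permutations(bag, 2))

# P(A): the red marble comes out on the (first) draw
p_a = sum(first == 'red' for first, _ in draws) / len(draws)

# P(A|B): the red marble comes out, given a blue marble was drawn first
blue_first = [pair for pair in draws if pair[0] == 'blue']
p_a_given_b = sum(second == 'red' for _, second in blue_first) / len(blue_first)

print(p_a, p_a_given_b)  # prints 0.333... and 0.5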

The Principle of Naive Bayes

The Naive Bayes algorithm relies on Bayes' theorem, so let's recall it quickly. The theorem calculates the probability of an event based on prior knowledge of potentially related events. It is represented mathematically as:

P(A|B) = \frac{P(B|A)P(A)}{P(B)}

Here, P(A|B) is the posterior probability of the class (A) given the predictor (B); it is what we are trying to calculate. P(B|A) is the likelihood, which is the probability of the predictor given the class. P(B) is the marginal probability of the predictor, and P(A) is the prior probability of the class. This formula forms the backbone of the Naive Bayes classifier.
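To make the formula concrete, here is a small helper (a hypothetical function for illustration, not part of the classifier we build below) that plugs in the marble numbers. Treating A as "red on the second draw" and B as "blue on the first draw" is one way to read the earlier example:

Python
def bayes_posterior(likelihood, prior, marginal):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return likelihood * prior / marginal

# P(B|A) = 1.0: if the red marble comes out second, the first draw must have been blue.
# P(A) = 1/3: the red marble is equally likely to occupy any draw position.
# P(B) = 2/3: two of the three marbles are blue.
print(bayes_posterior(likelihood=1.0, prior=1/3, marginal=2/3))  # 0.5, matching the direct calculation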

The term 'naive' refers to the assumption that all features are independent of one another given the class, which may not always hold in real-life data. Nonetheless, the classifier still offers robust performance and is easy to implement.

Deriving the Naive Bayes Classifier Algorithm

In the context of machine learning, the Naive Bayes Classifier uses Bayes' theorem to compute the posterior probability of each class given a set of features, and then assigns the class with the highest posterior probability.

Assuming a binary class variable Y (binary means it can be equal to either 0 or 1) and features X_1, X_2, ..., X_n, our task is to compute the posterior probability P(Y=1 | X_1=x_1, X_2=x_2, ..., X_n=x_n). By dropping the denominator from Bayes' theorem (since it doesn't depend on Y and is constant for all classes), we are left with the task of maximizing the joint probability of Y and X, P(Y, X) = P(X|Y)P(Y), which forms the basis for Naive Bayes classification.
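Under the naive assumption that the features are conditionally independent given the class, the likelihood factorizes into a product of per-feature terms, and the classifier simply picks the class with the largest score:

\hat{y} = \arg\max_{y} P(Y=y) \prod_{i=1}^{n} P(X_i = x_i | Y=y)

This is exactly the computation we will implement below: the prior of each class multiplied by one likelihood per feature.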

Example: Calculating Prior Probabilities

Consider a dataset whose class, Weather, takes the values Sunny (3 instances), Rainy (1 instance), and Snowy (3 instances):

Temperature | Humidity | Weather
Hot         | High     | Sunny
Hot         | High     | Sunny
Cold        | Normal   | Snowy
Hot         | Normal   | Rainy
Cold        | High     | Snowy
Cold        | Normal   | Snowy
Hot         | Normal   | Sunny

The prior probabilities are calculated as follows:

  • P(\text{Sunny}) = \frac{3}{7} \approx 0.43
  • P(\text{Rainy}) = \frac{1}{7} \approx 0.14
  • P(\text{Snowy}) = \frac{3}{7} \approx 0.43
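If you want to verify these by hand with pandas, value_counts does the work in one line (this previews the helper function we build later; the labels come straight from the table above):

Python
import pandas as pd

weather = pd.Series(['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny'])
print(weather.value_counts(normalize=True))
# Sunny and Snowy come out at roughly 0.43 each, Rainy at roughly 0.14
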
Example: Calculating Likelihoods

For features Temperature (Hot, Cold) and Humidity (High, Normal), calculate likelihoods for each class.

For Sunny:

  • P(\text{Hot | Sunny}) = \frac{3}{3} = 1.00, P(\text{Cold | Sunny}) = \frac{0}{3} = 0.00
  • P(\text{High | Sunny}) = \frac{2}{3} \approx 0.67, P(\text{Normal | Sunny}) = \frac{1}{3} \approx 0.33

For Rainy:

  • P(\text{Hot | Rainy}) = 1.00, P(\text{Cold | Rainy}) = 0.00
  • P(\text{High | Rainy}) = 0.00, P(\text{Normal | Rainy}) = 1.00

For Snowy:

  • P(\text{Hot | Snowy}) = 0.00, P(\text{Cold | Snowy}) = 1.00
  • P(\text{High | Snowy}) = \frac{1}{3} \approx 0.33, P(\text{Normal | Snowy}) = \frac{2}{3} \approx 0.67

These calculations illustrate how to derive prior probabilities and likelihoods for Naive Bayes Classification.
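To see how these pieces combine, consider classifying a new day with Cold temperature and High humidity. Multiplying each prior by the corresponding likelihoods from above gives:

  • Sunny: \frac{3}{7} \cdot 0.00 \cdot 0.67 = 0
  • Rainy: \frac{1}{7} \cdot 0.00 \cdot 0.00 = 0
  • Snowy: \frac{3}{7} \cdot 1.00 \cdot 0.33 \approx 0.14

Snowy has the highest score, so it becomes the prediction. Notice how a single zero likelihood wipes out a class's score entirely; we address this issue with smoothing later in this lesson.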

Implementing Naive Bayes Classifier

We approach the implementation of the Naive Bayes Classifier by first calculating the prior probabilities of each class, and then the likelihood of each feature given a class:

Python
import pandas as pd

def calculate_prior_probabilities(y):
    # Calculate prior probabilities for each class
    return y.value_counts(normalize=True)

def calculate_likelihoods(X, y):
    likelihoods = {}
    for column in X.columns:
        likelihoods[column] = {}
        for class_ in y.unique():
            # Filter feature column data for each class
            class_data = X[y == class_][column]
            counts = class_data.value_counts()
            total_count = len(class_data)  # Total count of instances for current class
            likelihoods[column][class_] = counts / total_count  # Direct likelihoods without smoothing
    return likelihoods

Armed with these utility functions, we can implement the Naive Bayes Classifier function:

Python
def naive_bayes_classifier(X_test, priors, likelihoods):
    predictions = []
    for _, data_point in X_test.iterrows():
        class_probabilities = {}
        for class_ in priors.index:
            class_probabilities[class_] = priors[class_]
            for feature in X_test.columns:
                # Use .get to safely retrieve the probability, falling back to a small default for unseen values
                feature_probs = likelihoods[feature][class_]
                class_probabilities[class_] *= feature_probs.get(data_point[feature], 1 / (len(feature_probs) + 1))

        # Predict class with maximum posterior probability
        predictions.append(max(class_probabilities, key=class_probabilities.get))

    return predictions
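One practical note on this design: with many features, multiplying lots of small probabilities can underflow toward zero. A common remedy, shown here as an optional sketch rather than part of the lesson's main implementation, is to add log-probabilities instead of multiplying raw ones (this assumes all likelihoods are strictly positive, for example after smoothing):

Python
import math

def naive_bayes_classifier_log(X_test, priors, likelihoods):
    predictions = []
    for _, data_point in X_test.iterrows():
        class_scores = {}
        for class_ in priors.index:
            # Work in log space: a sum of logs replaces the product of probabilities
            score = math.log(priors[class_])
            for feature in X_test.columns:
                feature_probs = likelihoods[feature][class_]
                prob = feature_probs.get(data_point[feature], 1 / (len(feature_probs) + 1))
                score += math.log(prob)
            class_scores[class_] = score
        # The class with the largest log-score also has the largest posterior probability
        predictions.append(max(class_scores, key=class_scores.get))
    return predictions
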
Understanding and Handling Data Issues in Naive Bayes

A recurring challenge in Naive Bayes is the handling of zero probabilities, i.e., when a category does not appear in the training data for a given class, resulting in a zero probability for that category. A known fix for this problem is applying Laplace or Add-1 smoothing, which adds a '1' to each category count to circumvent zero probabilities.
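For example, Hot never appears among the three Snowy rows in our weather table, so the unsmoothed likelihood is P(\text{Hot | Snowy}) = 0. With add-1 smoothing over the two temperature categories (Hot and Cold), it becomes:

P(\text{Hot | Snowy}) = \frac{0 + 1}{3 + 2} = 0.2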

You can integrate Laplace smoothing into the calculate_likelihoods function as follows:

Python
def calculate_likelihoods_with_smoothing(X, y):
    likelihoods = {}
    for column in X.columns:
        # Every category this feature can take, so unseen ones still receive a smoothed count
        categories = X[column].unique()
        likelihoods[column] = {}
        for class_ in y.unique():
            # Calculate normalized counts with smoothing
            class_data = X[y == class_][column]
            counts = class_data.value_counts().reindex(categories, fill_value=0)
            total_count = len(class_data) + len(categories)  # total count with smoothing
            likelihoods[column][class_] = (counts + 1) / total_count  # add-1 smoothing
    return likelihoods

The numerator is increased by 1, and the denominator by the number of unique categories of the feature, so the smoothed probabilities still sum to 1 across categories.
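In general, for a feature value x_i, a class y, and a feature with K distinct categories, the smoothed likelihood estimate is:

P(x_i | y) = \frac{\text{count}(x_i, y) + 1}{\text{count}(y) + K}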

Using Naive Bayes Classifier

Here is a short example of predicting weather with our classifier:

Python
data = {
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
}
df = pd.DataFrame(data)

# Split features and labels
X = df[['Temperature', 'Humidity']]
y = df['Weather']

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)

# Calculate likelihoods with smoothing
likelihoods = calculate_likelihoods_with_smoothing(X, y)

# New observation
X_test = pd.DataFrame([{'Temperature': 'Hot', 'Humidity': 'Normal'}])

# Make prediction
prediction = naive_bayes_classifier(X_test, priors, likelihoods)
print("Predicted Weather: ", prediction[0])  # Output: Predicted Weather: Sunny
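As a quick follow-up, you can run the classifier back over the training rows themselves. This is only a rough sanity check (a real evaluation would use held-out data), but it confirms the pieces work together:

Python
# Predict on the training data itself as a sanity check
train_predictions = naive_bayes_classifier(X, priors, likelihoods)
train_accuracy = sum(pred == actual for pred, actual in zip(train_predictions, y)) / len(y)
print("Training accuracy:", round(train_accuracy, 2))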

The Naive Bayes Classifier predicts a class label based on the observed features. Owing to its simplicity and speed, it is widely used in practice, including for text classification, spam detection, and sentiment analysis.

Lesson Summary and Practice

Superb work! You've mastered the essentials of the Naive Bayes Classifier, from understanding its theory to crafting one from scratch in Python. The next phase is practice, which will consolidate your newly acquired skills. Enjoy the hands-on exercises lined up next, and delve deeper into your machine learning journey with the forthcoming lessons!
