Welcome to our exploration of the Naive Bayes Classifier! This classification algorithm is renowned for its simplicity and effectiveness. We will implement it from scratch in Python, allowing you to harness its power without the need for any prebuilt libraries. Let's get started!
Let's do a quick recall of probability theory.
$P(A)$ usually denotes the probability of a certain event A occurring. $P(A|B)$, on the other hand, indicates the probability of event A taking place, assuming event B has already happened.
For instance, let's imagine there's a bag housing three marbles - one red and two blue. Denote A as the event where a red marble is picked, and B when a blue one is drawn.
The probability of A, $P(A)$, is 1/3 in this case.
Now, let's consider a scenario where a blue marble has already been drawn from the bag. This leaves us with one red and one blue marble in the bag. The probability of drawing a red marble (event A), given that a blue marble has already been extracted (event B), is denoted by $P(A|B)$. In this case, $P(A|B)$ would be 1/2, highlighting a higher likelihood of drawing a red marble following the initial removal of a blue one.
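If you want to check this arithmetic in code, here is a minimal sketch using Python's built-in `fractions` module; the bag contents are just the ones from the example above:

```python
from fractions import Fraction

# Bag contents from the example: 1 red marble, 2 blue marbles
red, blue = 1, 2
total = red + blue

# P(A): drawing a red marble from the full bag
p_a = Fraction(red, total)

# P(A|B): drawing a red marble after one blue marble has already been removed
p_a_given_b = Fraction(red, total - 1)

print(p_a, p_a_given_b)  # 1/3 1/2
```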
The Naive Bayes algorithm relies on Bayes' theorem. Let's recall it quickly. This theorem calculates the probability of an event based on prior knowledge of conditions potentially related to that event. It is represented mathematically as:

$$P(c|x) = \frac{P(x|c)\,P(c)}{P(x)}$$
where $P(c|x)$ is the posterior probability of class ($c$) given predictor ($x$); it's what we are trying to calculate. $P(x|c)$ is the likelihood, which is the probability of the predictor given a class. $P(x)$ is the marginal probability of the predictor, and $P(c)$ is the prior probability of the class. This formula forms the backbone of the Naive Bayes classifier.
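As a quick sanity check of the formula, here is a minimal sketch that plugs in hypothetical numbers (chosen purely for illustration, not taken from any dataset in this lesson):

```python
# Hypothetical numbers, purely for illustration
prior = 0.3       # P(c): prior probability of the class
likelihood = 0.8  # P(x|c): probability of the predictor given the class
marginal = 0.5    # P(x): marginal probability of the predictor

# Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)
posterior = likelihood * prior / marginal
print(posterior)  # ≈ 0.48
```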
The term 'naive' refers to the assumption that all features in a dataset are independent of each other given the class, which may not always hold in real-life data. Nonetheless, the classifier still offers robust performance and can be easily implemented.
In the context of machine learning, the Naive Bayes Classifier uses the Bayes theorem to compute the posterior probability of a class given a set of features and then classifies the outcome based on the highest posterior probability.
Assuming a binary class variable $y$ (binary means it can be equal to either 0 or 1) and features $x_1, x_2, \ldots, x_n$, our task is to compute the posterior probability $P(y|x_1, \ldots, x_n)$. By shedding the denominator $P(x_1, \ldots, x_n)$ from Bayes' theorem (since it doesn't depend on $y$ and is constant for all classes), we are left with the task of maximizing the probability of $y$ and $x_1, \ldots, x_n$ happening together, $P(y)\,P(x_1, \ldots, x_n|y)$. Under the naive independence assumption this becomes $P(y)\prod_{i=1}^{n} P(x_i|y)$, which forms the basis for Naive Bayes classification.
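To make this concrete, here is a minimal sketch that scores two classes with hand-written, hypothetical priors and likelihoods and picks the larger product; the numbers are made up purely for illustration:

```python
# Hypothetical priors and per-feature likelihoods for two classes, 0 and 1
priors = {0: 0.6, 1: 0.4}
likelihoods = {
    0: {'x1': 0.2, 'x2': 0.7},  # P(x1|y=0), P(x2|y=0)
    1: {'x1': 0.5, 'x2': 0.3},  # P(x1|y=1), P(x2|y=1)
}

# Score each class by P(y) * P(x1|y) * P(x2|y), then pick the maximum
scores = {}
for y, prior in priors.items():
    score = prior
    for feature_prob in likelihoods[y].values():
        score *= feature_prob
    scores[y] = score

# Class 0 scores 0.6 * 0.2 * 0.7 ≈ 0.084; class 1 scores 0.4 * 0.5 * 0.3 ≈ 0.060
print(max(scores, key=scores.get))  # 0
```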
Consider a dataset with the class `Weather` having `Sunny` (3 instances), `Rainy` (1 instance), and `Snowy` (3 instances).
| Temperature | Humidity | Weather |
|---|---|---|
| Hot | High | Sunny |
| Hot | High | Sunny |
| Cold | Normal | Snowy |
| Hot | Normal | Rainy |
| Cold | High | Snowy |
| Cold | Normal | Snowy |
| Hot | Normal | Sunny |
The prior probabilities are calculated as follows:

- $P(\text{Sunny}) = 3/7$
- $P(\text{Rainy}) = 1/7$
- $P(\text{Snowy}) = 3/7$
For features `Temperature` (`Hot`, `Cold`) and `Humidity` (`High`, `Normal`), calculate likelihoods for each class.
For `Sunny`:

- $P(\text{Hot}|\text{Sunny}) = 3/3$, $P(\text{Cold}|\text{Sunny}) = 0/3$
- $P(\text{High}|\text{Sunny}) = 2/3$, $P(\text{Normal}|\text{Sunny}) = 1/3$

For `Rainy`:

- $P(\text{Hot}|\text{Rainy}) = 1/1$, $P(\text{Cold}|\text{Rainy}) = 0/1$
- $P(\text{High}|\text{Rainy}) = 0/1$, $P(\text{Normal}|\text{Rainy}) = 1/1$

For `Snowy`:

- $P(\text{Hot}|\text{Snowy}) = 0/3$, $P(\text{Cold}|\text{Snowy}) = 3/3$
- $P(\text{High}|\text{Snowy}) = 1/3$, $P(\text{Normal}|\text{Snowy}) = 2/3$
These calculations illustrate how to derive prior probabilities and likelihoods for Naive Bayes Classification.
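If you'd like to double-check these numbers before writing any classifier code, one quick way is to let pandas tabulate them; the sketch below simply loads the table above into a DataFrame and prints the priors and per-class likelihoods:

```python
import pandas as pd

# The weather table from above, loaded into a DataFrame
df = pd.DataFrame({
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
})

# Priors: P(Sunny) = 3/7, P(Rainy) = 1/7, P(Snowy) = 3/7
print(df['Weather'].value_counts(normalize=True))

# Likelihoods per class, e.g. P(Hot|Sunny), P(Cold|Snowy), ...
print(pd.crosstab(df['Weather'], df['Temperature'], normalize='index'))
print(pd.crosstab(df['Weather'], df['Humidity'], normalize='index'))
```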
We approach the implementation of the Naive Bayes Classifier by first calculating the prior probabilities of each class, and then the likelihood of each feature given a class:
```python
import pandas as pd

def calculate_prior_probabilities(y):
    # Calculate prior probabilities for each class
    return y.value_counts(normalize=True)

def calculate_likelihoods(X, y):
    likelihoods = {}
    for column in X.columns:
        likelihoods[column] = {}
        for class_ in y.unique():
            # Filter feature column data for each class
            class_data = X[y == class_][column]
            counts = class_data.value_counts()
            total_count = len(class_data)  # Total count of instances for the current class
            likelihoods[column][class_] = counts / total_count  # Direct likelihoods without smoothing
    return likelihoods
```
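As a quick illustration of what these helpers return, you could run them on the weather table (assuming it has been loaded into a DataFrame `df`, as in the verification sketch above and the final example below):

```python
# Split features and labels from the weather DataFrame
X = df[['Temperature', 'Humidity']]
y = df['Weather']

priors = calculate_prior_probabilities(y)
print(priors['Sunny'])  # ≈ 0.4286, i.e. 3/7

likelihoods = calculate_likelihoods(X, y)
# Without smoothing, Cold never occurs for Sunny, so it is simply absent here
print(likelihoods['Temperature']['Sunny'])  # Hot 1.0
```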
Armed with these utility functions, we can implement the Naive Bayes Classifier function:
```python
def naive_bayes_classifier(X_test, priors, likelihoods):
    predictions = []
    for _, data_point in X_test.iterrows():
        class_probabilities = {}
        for class_ in priors.index:
            class_probabilities[class_] = priors[class_]
            for feature in X_test.columns:
                # Use .get to safely retrieve the probability; fall back to a small
                # default value for feature values never seen in training for this class
                feature_probs = likelihoods[feature][class_]
                class_probabilities[class_] *= feature_probs.get(data_point[feature], 1 / (len(feature_probs) + 1))

        # Predict the class with the maximum posterior probability
        predictions.append(max(class_probabilities, key=class_probabilities.get))

    return predictions
```
A recurring challenge in Naive Bayes is the handling of zero probabilities: when a category never appears in the training data for a given class, its likelihood is zero, which wipes out the entire product of probabilities for that class. A known fix for this problem is Laplace (add-1) smoothing, which adds 1 to each category count to circumvent zero probabilities.
You can integrate Laplace smoothing into the `calculate_likelihoods` function as follows:
```python
def calculate_likelihoods_with_smoothing(X, y):
    likelihoods = {}
    for column in X.columns:
        likelihoods[column] = {}
        categories = X[column].unique()
        for class_ in y.unique():
            # Count feature values for the current class, keeping zero-count categories
            class_data = X[y == class_][column]
            counts = class_data.value_counts().reindex(categories, fill_value=0)
            total_count = len(class_data) + len(categories)  # total count with smoothing
            likelihoods[column][class_] = (counts + 1) / total_count  # add-1 smoothing
    return likelihoods
```
The numerator is increased by 1 and the denominator by the number of unique categories in the feature, so the smoothed probabilities for each class still sum to 1.
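For example, `Cold` never appears alongside `Sunny` in the table, so its unsmoothed likelihood would be 0/3 = 0; with add-1 smoothing it becomes (0 + 1) / (3 + 2) = 0.2, since `Temperature` has 2 unique categories. A quick check, reusing `X` and `y` from the earlier sketch:

```python
smoothed = calculate_likelihoods_with_smoothing(X, y)

# P(Hot|Sunny) = (3 + 1) / (3 + 2) = 0.8, P(Cold|Sunny) = (0 + 1) / (3 + 2) = 0.2
print(smoothed['Temperature']['Sunny'])
```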
Here is a short example of predicting weather with our classifier:
```python
data = {
    'Temperature': ['Hot', 'Hot', 'Cold', 'Hot', 'Cold', 'Cold', 'Hot'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'Normal', 'Normal'],
    'Weather': ['Sunny', 'Sunny', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Sunny']
}
df = pd.DataFrame(data)

# Split features and labels
X = df[['Temperature', 'Humidity']]
y = df['Weather']

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)

# Calculate likelihoods with smoothing
likelihoods = calculate_likelihoods_with_smoothing(X, y)

# New observation
X_test = pd.DataFrame([{'Temperature': 'Hot', 'Humidity': 'Normal'}])

# Make prediction
prediction = naive_bayes_classifier(X_test, priors, likelihoods)
print("Predicted Weather: ", prediction[0])  # Output: Predicted Weather: Sunny
```
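To see why `Sunny` wins here, you can multiply things out by hand. With add-1 smoothing (each feature has 2 unique categories, so each class's denominator is its instance count plus 2), the unnormalized posterior scores for the test point (`Hot`, `Normal`) are:

- `Sunny`: $3/7 \times 4/5 \times 2/5 \approx 0.137$
- `Rainy`: $1/7 \times 2/3 \times 2/3 \approx 0.063$
- `Snowy`: $3/7 \times 1/5 \times 3/5 \approx 0.051$

`Sunny` has the highest score, so it is returned as the prediction.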
The Naive Bayes Classifier predicts a class label based on the observed features. Owing to its simplicity, power, and speed, this classifier lends itself to challenging scenarios, including text classification, spam detection, and sentiment analysis.
Superb work! You've mastered the essentials of the Naive Bayes Classifier, from understanding its theory to crafting a Naive Bayes Classifier from scratch. The next phase is practice, which will consolidate your newly acquired skills. Enjoy the hands-on exercises lined up next. Delve deeper into your machine learning journey with the forthcoming lessons!