Lesson 4

Hey there! Today we are going to explore an exciting topic in machine learning called *Naive Bayes*. By the end of this lesson, you'll understand what `Naive Bayes` is and how to implement it using Python's `Scikit-Learn` library. Let’s dive in!

`Naive Bayes` is a classification algorithm based on *Bayes' Theorem*. Imagine you’re a detective using clues (features) to decide who the culprit is (class). `Naive Bayes` helps by calculating probabilities.

Bayes' Theorem is stated as:

$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$

Where:

- $P(C|X)$ is the posterior probability of class $C$ given predictor $X$.
- $P(X|C)$ is the likelihood, which is the probability of predictor $X$ given class $C$.
- $P(C)$ is the prior probability of class $C$.
- $P(X)$ is the prior probability of predictor $X$ (also called the evidence).
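To make the theorem concrete, here is a tiny worked example in Python. The numbers are made up purely for illustration (a 7% spam rate and hypothetical word likelihoods): we compute the posterior probability that an email is spam given that it contains the word "free".

```python
# Hypothetical numbers, chosen only to illustrate Bayes' Theorem
p_spam = 0.07              # prior P(C): 7% of all emails are spam
p_free_given_spam = 0.50   # likelihood P(X|C): "free" appears in half of spam emails
p_free_given_ham = 0.01    # likelihood P(X|not C): "free" is rare in normal emails

# Evidence P(X) via the law of total probability
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(C|X) from Bayes' Theorem
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(f"P(spam | 'free') = {p_spam_given_free:.3f}")  # ≈ 0.790
```

Even though only 7% of emails are spam, seeing the word "free" pushes the posterior up to roughly 79%.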

How Naive Bayes Works

- **Prior Probability**: The algorithm starts by calculating the prior probability of each class from the training data. This is simply the probability that a sample belongs to class $C$ before we know anything else about it. For example, imagine we are predicting whether an email is spam. If 93% of the emails in the data are not spam, then it is reasonable to assume that a given email is not spam with a probability of 93%. This is the prior probability.
- **Likelihood**: For each feature, the likelihood (the probability of the feature given the class) is calculated. This is the probability of observing a particular feature value among samples of a given class.
- **Independent Features Assumption (Naive Assumption)**: The algorithm assumes that the features are independent of one another, which greatly simplifies the calculations.
- **Posterior Probability**: Using Bayes' Theorem, the posterior probability of each class is computed given the feature values. The class with the highest posterior probability is chosen as the prediction (see the sketch after this list).
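Here is a minimal sketch of how these steps combine, again with hypothetical priors and likelihoods: under the naive assumption, each class's unnormalized posterior is its prior multiplied by the per-feature likelihoods, and we predict the class with the highest score.

```python
import math

# Hypothetical priors and per-word likelihoods for a two-word email
priors = {"spam": 0.07, "ham": 0.93}
likelihoods = {
    "spam": {"free": 0.50, "meeting": 0.05},
    "ham":  {"free": 0.01, "meeting": 0.20},
}
features = ["free", "meeting"]

# Naive assumption: treat features as independent and multiply their likelihoods
scores = {c: priors[c] * math.prod(likelihoods[c][f] for f in features) for c in priors}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)  # ham wins narrowly (≈0.00186 vs ≈0.00175)
```

Notice how the word "meeting" outweighs "free" here: both the prior and every likelihood pull the final score up or down.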

`Naive Bayes` learns its likelihoods and priors from the training data. When the model encounters new data, it breaks each sample into its constituent features and applies Bayes' Theorem to calculate the class probabilities. The class with the highest probability is the predicted class.

We will focus on `GaussianNB`, commonly used when features are continuous and assumed to follow a normal (Gaussian) distribution.
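Concretely, `GaussianNB` estimates a mean $\mu_C$ and variance $\sigma_C^2$ for each feature in each class, and evaluates the likelihood with the normal density $P(x|C) = \frac{1}{\sqrt{2\pi\sigma_C^2}} \exp\left(-\frac{(x-\mu_C)^2}{2\sigma_C^2}\right)$. Here is a minimal sketch of that density, using made-up means and variances:

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density: the likelihood of a continuous feature value."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class estimates for one feature (values are illustrative only)
print(gaussian_pdf(1.4, mean=1.5, var=0.03))  # ≈ 1.95: high likelihood near this class's mean
print(gaussian_pdf(1.4, mean=4.3, var=0.22))  # ≈ 4e-9: tiny likelihood far from the mean
```

A feature value close to a class's mean yields a high likelihood for that class; a value far away yields a vanishingly small one.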

Before training our `Naive Bayes` classifier, we need data. Consider it like needing mystery stories before solving them! We'll use the **Iris dataset**, which includes features of iris flowers to classify them into species.

Let’s quickly remind ourselves how to load the dataset using `Scikit-Learn`:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
```
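As a quick optional sanity check, you can confirm the split: the Iris dataset has 150 samples with 4 features each, so `test_size=0.4` leaves 90 samples for training and 60 for testing.

```python
# Quick sanity check on the 60/40 split of the 150 Iris samples
print(X_train.shape, X_test.shape)  # (90, 4) (60, 4)
print(y_train.shape, y_test.shape)  # (90,) (60,)
```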

Now that we have our data split, it’s time to train our `Naive Bayes` classifier using `GaussianNB`:

```python
from sklearn.naive_bayes import GaussianNB

# Initialize the Naive Bayes classifier
nb_clf = GaussianNB()

# Train the classifier with training data
nb_clf.fit(X_train, y_train)
```

Here, `fit` trains the model using the training data, much like a student learning from textbooks.
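After fitting, you can inspect the quantities the theory section described, since the model stores them as attributes (in recent scikit-learn versions; `var_` was called `sigma_` before version 1.0):

```python
# The fitted model stores exactly the quantities from the theory above
print(nb_clf.class_prior_)  # priors P(C): the fraction of training samples per species
print(nb_clf.theta_)        # per-class mean of each feature
print(nb_clf.var_)          # per-class variance of each feature
```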

After training the model, let’s make predictions on the test data and calculate the accuracy:

```python
from sklearn.metrics import accuracy_score

# Make predictions on the testing set
y_pred = nb_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Bayes model accuracy: {accuracy * 100:.2f}%")
# Bayes model accuracy: 96.67%
```

Here, `y_pred` contains the predicted class labels for the test set, and `accuracy_score` compares these predictions to the true labels (`y_test`) to calculate the model's accuracy.
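If you want the posterior probabilities themselves rather than only the winning class, `predict_proba` returns $P(C|X)$ for every class:

```python
# Posterior probabilities for the first three test samples
proba = nb_clf.predict_proba(X_test[:3])
print(proba.round(3))  # one row per sample, one column per species; each row sums to 1
```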

Great job! You've learned how to use the `Naive Bayes` classifier for machine learning tasks. Here’s a quick recap:

- **Naive Bayes**: A probabilistic classifier based on Bayes' Theorem.
- **Dataset Loading**: Used `Scikit-Learn`'s `load_iris` to load the **Iris dataset**.
- **Train-Test Split**: Used `train_test_split` to split data into training and testing sets.
- **Model Training**: Used `GaussianNB` to train the `Naive Bayes` classifier.
- **Making Predictions and Calculating Accuracy**: Predicted test set labels and calculated the model's accuracy.

Now it’s time to roll up your sleeves and get hands-on practice! In the next section, you'll implement what you’ve learned and see the `Naive Bayes` classifier in action on your own. Excited? Let’s get started!