Hey there! Today we are going to explore an exciting topic in machine learning called Naive Bayes. By the end of this lesson, you'll understand what Naive Bayes is and how to implement it using Python's Scikit-Learn library. Let's dive in!
Naive Bayes is a classification algorithm based on Bayes' Theorem. Imagine you're a detective using clues (features) to decide who the culprit is (class). Naive Bayes helps by calculating probabilities.
Bayes' Theorem is stated as:

P(A|B) = P(B|A) × P(A) / P(B)

Where:

- P(A|B) is the posterior probability: the probability of class A given the observed features B.
- P(B|A) is the likelihood: the probability of observing features B given class A.
- P(A) is the prior probability of class A, before seeing any features.
- P(B) is the evidence: the overall probability of observing features B.
How Naive Bayes Works
For example, if 93% of the cases you have seen so far belong to a particular class, then, before examining any features, the probability that a new case belongs to that class is 93%. This is what the prior probability is. Naive Bayes updates its likelihoods and priors using the training data. When the model encounters new data, it breaks the data into its constituent features and applies Bayes' Theorem to calculate the class probabilities. The class with the highest probability is the predicted class.
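To make this concrete, here is a tiny sketch of that "pick the highest posterior" step. The class names and numbers are made up purely for illustration; they are not learned from any data:

```python
# Made-up priors P(class) and likelihoods P(features | class) for two classes
priors = {"setosa": 0.5, "versicolor": 0.5}
likelihoods = {"setosa": 0.9, "versicolor": 0.2}

# Unnormalized posteriors: P(class | features) is proportional to
# P(features | class) * P(class)
posteriors = {c: likelihoods[c] * priors[c] for c in priors}

# Divide by the evidence P(features) so the posteriors sum to 1
evidence = sum(posteriors.values())
posteriors = {c: p / evidence for c, p in posteriors.items()}

# The predicted class is the one with the highest posterior probability
predicted = max(posteriors, key=posteriors.get)
print(predicted)  # setosa
```

Scikit-Learn does exactly this bookkeeping for you, estimating the priors and likelihoods from the training data.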
We will focus on GaussianNB, commonly used when features are continuous and assumed to follow a normal (Gaussian) distribution.
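Under this assumption, the likelihood of a feature value given a class is the value of a normal density with that class's mean and variance. A minimal sketch (the mean and variance below are made-up numbers for illustration, not values estimated from real data):

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density: how GaussianNB models P(feature value | class)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# E.g. a petal length of 1.5 cm under a class with mean 1.4 and variance 0.03
print(gaussian_pdf(1.5, 1.4, 0.03))  # ≈ 1.95
```

Values closer to the class mean get higher likelihoods, which in turn raise that class's posterior probability.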
Before training our Naive Bayes classifier, we need data. Consider it like needing mystery stories before solving them! We'll use the Iris dataset, which includes features of iris flowers to classify them into species.
Let's quickly remind ourselves how to load the dataset using Scikit-Learn:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
```
Now that we have our data split, it's time to train our Naive Bayes classifier using GaussianNB:
```python
from sklearn.naive_bayes import GaussianNB

# Initialize the Naive Bayes classifier
nb_clf = GaussianNB()

# Train the classifier with training data
nb_clf.fit(X_train, y_train)
```
Here, fit trains the model using the training data, much like a student learning from textbooks.
After training the model, let’s make predictions on the test data and calculate the accuracy:
```python
from sklearn.metrics import accuracy_score

# Make predictions on the testing set
y_pred = nb_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Bayes model accuracy: {accuracy * 100:.2f}%")
# Bayes model accuracy: 96.67%
```
Here, y_pred contains the predicted class labels for the test set, and accuracy_score compares these predictions to the true labels (y_test) to calculate the model's accuracy.
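Because Naive Bayes is a probabilistic model, you can also inspect the full posterior distribution behind each prediction with predict_proba. A self-contained sketch using the same dataset and split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

clf = GaussianNB().fit(X_train, y_train)

# predict_proba returns P(class | features) for each sample;
# predict simply picks the column with the highest probability
proba = clf.predict_proba(X_test[:3])
print(proba.round(3))     # one row per sample, one column per species
print(proba.sum(axis=1))  # each row sums to 1
```

This is handy when you want a confidence estimate rather than just a hard label.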
Great job! You've learned how to use the Naive Bayes classifier for machine learning tasks. Here's a quick recap:
- Scikit-Learn's load_iris to load the Iris dataset.
- train_test_split to split data into training and testing sets.
- GaussianNB to train the Naive Bayes classifier.

Now it's time to roll up your sleeves and get hands-on practice! In the next section, you'll implement what you've learned and see the Naive Bayes classifier in action on your own. Excited? Let's get started!