Lesson 2

Welcome! Today, we're going to learn about an exciting and powerful tool in machine learning called **Grid Search**. Imagine trying to find the perfect pair of shoes that fit just right. `Grid Search` does something similar, but for tuning machine learning models. By the end of this lesson, you'll understand how to use `Grid Search` to find the best settings (parameters) for your models.

Imagine baking the perfect cake: you need to find the right proportions of sugar, flour, and baking soda. `Grid Search` does the same for machine learning models by trying different combinations of parameters to find the best one. **Parameters** are settings you can adjust to improve your model's performance, and the right parameters can make your model more accurate.

The parameters we set when initializing the model are called **hyperparameters**. Finding the best combination of them is called hypertuning.

We have already done some hypertuning before in this course path using `for` loops. But writing a `for` loop each time can be laborious, especially if you must check multiple models with multiple hyperparameters each. So, it is time for us to learn about a special tool that automates this process!
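As a refresher, here is a small sketch of what such a manual `for`-loop approach might look like, tuning a single hyperparameter of a decision tree on the Wine dataset (the candidate values here are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Wine dataset and split it into training and test sets
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

best_depth, best_score = None, 0.0
# Manually try each candidate value of one hyperparameter
for depth in [3, 5, 7, 10]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    if score > best_score:
        best_depth, best_score = depth, score

print(f"Best max_depth: {best_depth} (accuracy: {best_score:.3f})")
```

This works for one hyperparameter, but with two or more you need nested loops, and the code grows quickly. That is exactly the bookkeeping `Grid Search` automates.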

Let's implement `Grid Search` using **Scikit-Learn**.

First, load the libraries and the **Wine** dataset:

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine

# Load the Wine dataset
X, y = load_wine(return_X_y=True)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Here, we're loading a dataset of different wines: each sample's attributes can be used to classify which wine category it belongs to. We also split our data into a training set and a test set.

Next, let's define which parameters to test, similar to adjusting ingredients in a recipe. For `DecisionTreeClassifier`, we'll try different values of `max_depth` (the maximum depth of the tree) and `min_samples_split` (the minimum number of samples required to split an internal node).

Grid Search requires a parameter grid, defined as a dictionary whose keys are the model's hyperparameter names and whose values are lists of candidate options. Let's define it:

```python
# Define the parameter grid
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10]
}
```

Here, we say that `max_depth` can be 3, 5, 7, or 10, and `min_samples_split` can be 2, 5, or 10.
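To see how many models Grid Search will need to train, you can enumerate the grid yourself. Here is a small sketch using `itertools.product`:

```python
from itertools import product

param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10]
}

# Every pairing of values from the two lists will be tried: 4 * 3 = 12
combinations = list(product(param_grid['max_depth'],
                            param_grid['min_samples_split']))
print(len(combinations))  # 12
print(combinations[:3])   # [(3, 2), (3, 5), (3, 10)]
```

The grid grows multiplicatively: adding a third hyperparameter with five options would already mean 60 combinations, which is why automating the search pays off.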

The `GridSearchCV` class tests all parameter combinations, evaluating each one with 5-fold cross-validation (`cv=5`). We'll also set the `scoring` parameter to use accuracy as the evaluation metric.

```python
# Perform the grid search
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
```

We're not training just one model but several, one for each parameter combination. The `fit` function handles this: the last line fits our `DecisionTreeClassifier` with every combination in the parameter grid, and `Grid Search` selects the best model.

After `Grid Search` finishes, check the best parameters and the model's performance:

```python
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_}")
# Best parameters: {'max_depth': 3, 'min_samples_split': 2}
# Best cross-validation score: 0.9224137931034484
```

`grid_search.best_params_` shows the best parameter combination, and `grid_search.best_score_` gives that combination's mean cross-validation score, i.e., the best score achieved across all combinations tested.
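If you want to see how every combination performed, not just the winner, the fitted `GridSearchCV` object also exposes a `cv_results_` attribute. Here is a self-contained sketch that repeats the lesson's search and inspects the results (it assumes `pandas` is available for the tabular view):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {'max_depth': [3, 5, 7, 10], 'min_samples_split': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid,
                           cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# cv_results_ records the mean cross-validation score and rank of
# every parameter combination that was tried
results = pd.DataFrame(grid_search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))
```

Inspecting the full table is useful for spotting whether several combinations perform nearly as well as the best one, which can hint that your model is not very sensitive to those hyperparameters.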

After finding the best parameters, use the best estimator to predict on the testing set and calculate the accuracy.

```python
from sklearn.metrics import accuracy_score

# Make predictions on the test set with the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Calculate the accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test set accuracy: {test_accuracy}")
# Test set accuracy: 0.9444444444444444
```

We make predictions on the test set using the model with the best parameters found by `Grid Search`, and then calculate the accuracy of these predictions.

We've made great progress! Here's a quick summary:

**What is Grid Search?** It's a method to find the best parameters for your machine learning model.

**Why use it?** Because the right parameters can make your model more accurate.

**How to use it with Scikit-Learn?** Load a real dataset, split it, define a parameter grid, perform `Grid Search` to train and compare models, evaluate the results, make predictions, and calculate the final accuracy.

Now, you're ready to move on to some practice exercises, where you'll apply `Grid Search` to find the best parameters for your own machine learning models. Let's get started with the practice session!