Lesson 2

Welcome! Today, we're going to learn about an exciting and powerful tool in machine learning called **Grid Search**. Imagine trying to find the perfect pair of shoes that fit just right. `Grid Search` does something similar, but for tuning machine learning models. By the end of this lesson, you'll understand how to use `Grid Search` to find the best settings (parameters) for your models.

Imagine baking the perfect cake: you need to find the right proportions of sugar, flour, and baking soda. `Grid Search` does the same for machine learning models by trying different combinations of parameters to find the best one. **Parameters** are settings you can adjust to improve your model's performance, and the right parameters can make your model more accurate.

The parameters we set when initializing the model are called **hyperparameters**. Finding the best combination of them is called hypertuning.

We have already done some hypertuning before in this course path using `for` loops. But writing a `for` loop each time can be laborious, especially if you must check multiple models with multiple hyperparameters each. So, it is time for us to learn about a special tool that automates this process!
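As a refresher, here is a small sketch of what such a manual `for`-loop approach might look like, tuning a single hyperparameter of a decision tree on the Wine dataset (the candidate values here are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Wine dataset and split it into training and test sets
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

best_depth, best_score = None, 0.0
# Manually try each candidate value of one hyperparameter
for depth in [3, 5, 7, 10]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    if score > best_score:
        best_depth, best_score = depth, score

print(f"Best max_depth: {best_depth} (accuracy: {best_score:.3f})")
```

This works for one hyperparameter, but with two or more you need nested loops, and the code grows quickly. That is exactly the bookkeeping `Grid Search` automates.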

Let's implement `Grid Search` using **Scikit-Learn**.

First, load the libraries and the **Wine** dataset:

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine

# Load the Wine dataset
X, y = load_wine(return_X_y=True)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Here, we're loading a dataset of different wines: each sample's attributes can be used to classify which wine category it belongs to. We also split our data into a training set and a test set.

Next, let's define which parameters to test, similar to adjusting ingredients in a recipe. For `DecisionTreeClassifier`, we'll try different values of `max_depth` (the maximum depth of the tree) and `min_samples_split` (the minimum number of samples required to split an internal node).

Grid Search requires a parameter grid, defined as a dictionary whose keys are the model's hyperparameter names and whose values are lists of candidate options. Let's define it:

```python
# Define the parameter grid
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10]
}
```

Here, we say that `max_depth` can be 3, 5, 7, or 10, and `min_samples_split` can be 2, 5, or 10.
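To see how many models Grid Search will need to train, you can enumerate the grid yourself. Here is a small sketch using `itertools.product`:

```python
from itertools import product

param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10]
}

# Every pairing of values from the two lists will be tried: 4 * 3 = 12
combinations = list(product(param_grid['max_depth'],
                            param_grid['min_samples_split']))
print(len(combinations))  # 12
print(combinations[:3])   # [(3, 2), (3, 5), (3, 10)]
```

The grid grows multiplicatively: adding a third hyperparameter with five options would already mean 60 combinations, which is why automating the search pays off.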

The `GridSearchCV` class tests all parameter combinations, evaluating each one with 5-fold cross-validation (`cv=5`). We'll also set the `scoring` parameter to use accuracy as the evaluation metric.

```python
# Perform the grid search
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
```

We're not training just one model but several, one for each parameter combination. The `fit` function handles this: the last line fits our `DecisionTreeClassifier` with every combination in the parameter grid, and `Grid Search` selects the best model.

After `Grid Search` finishes, check the best parameters and the model's performance:

```python
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_}")
# Best parameters: {'max_depth': 3, 'min_samples_split': 2}
# Best cross-validation score: 0.9224137931034484
```

`grid_search.best_params_` shows the best parameter combination, and `grid_search.best_score_` gives that combination's mean cross-validation score, i.e., the best score achieved across all combinations tested.
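If you want to see how every combination performed, not just the winner, the fitted `GridSearchCV` object also exposes a `cv_results_` attribute. Here is a self-contained sketch that repeats the lesson's search and inspects the results (it assumes `pandas` is available for the tabular view):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {'max_depth': [3, 5, 7, 10], 'min_samples_split': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid,
                           cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# cv_results_ records the mean cross-validation score and rank of
# every parameter combination that was tried
results = pd.DataFrame(grid_search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))
```

Inspecting the full table is useful for spotting whether several combinations perform nearly as well as the best one, which can hint that your model is not very sensitive to those hyperparameters.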

After finding the best parameters, use the best estimator to predict on the testing set and calculate the accuracy.

```python
from sklearn.metrics import accuracy_score

# Make predictions on the test set with the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Calculate the accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test set accuracy: {test_accuracy}")
# Test set accuracy: 0.9444444444444444
```

We make predictions on the test set using the model with the best parameters found by `Grid Search`, and then calculate the accuracy of these predictions.

We've made great progress! Here's a quick summary:

**What is Grid Search?** It's a method to find the best parameters for your machine learning model.

**Why use it?** Because the right parameters can make your model more accurate.

**How to use it with Scikit-Learn?** Load a real dataset, split it, define a parameter grid, perform `Grid Search` to train and compare models, evaluate the results, make predictions, and calculate the final accuracy.

Now, you're ready to move on to some practice exercises, where you'll apply `Grid Search` to find the best parameters for your own machine learning models. Let's get started with the practice session!