Lesson 3
Random Search in Machine Learning
Lesson Introduction and Goals

Choosing the right parameters in machine learning models can greatly affect their success. Imagine these parameters as cake ingredients: the right amount makes your cake delicious. Similarly, the right parameter settings make your model accurate. Random Search helps find these “right ingredients” by trying random combinations. By the end of this lesson, you will:

  • Understand what Random Search is
  • Learn how to implement it using Scikit-Learn
  • Interpret the results to improve models
What is Random Search?

Random Search is a technique for tuning parameters by randomly sampling combinations from a given range, like randomly picking recipes to see which cake tastes best. Unlike Grid Search, which tries every possible combination, Random Search evaluates only a fixed number of randomly chosen combinations, so it usually finishes faster, at the cost of possibly missing the single best combination. It’s like flipping through a recipe book and picking random recipes instead of trying every single one.
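To make the difference concrete, the sketch below counts the candidates each strategy would evaluate for the search space used later in this lesson. It relies on Scikit-Learn's ParameterGrid and ParameterSampler helpers; the variable names and the choice of n_iter=5 are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same search space as the lesson's param_distributions
search_space = {
    "C": [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    "solver": ["liblinear", "saga"],
}

# Grid Search would evaluate every combination: 10 values of C x 2 solvers
grid_candidates = list(ParameterGrid(search_space))

# Random Search evaluates only n_iter randomly drawn combinations
# (n_iter=5 is an arbitrary value chosen for this illustration)
random_candidates = list(ParameterSampler(search_space, n_iter=5, random_state=0))

print(len(grid_candidates))    # 20
print(len(random_candidates))  # 5
```

Each fit is repeated once per cross-validation split, so the saving grows with the size of the search space and the number of folds.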

Loading and Preparing the Dataset

We’ll use the wine dataset from Scikit-Learn. Let's load it and scale features:

Python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load real dataset
X, y = load_wine(return_X_y=True)

# Standardize every feature to zero mean and unit variance
X = StandardScaler().fit_transform(X)

To evaluate our model, we split the dataset into a training set (70%) and a testing set (30%).

Python
from sklearn.model_selection import train_test_split

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
Defining the Parameter Distribution

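A parameter distribution defines the candidate values for each parameter you want to tune. Here each parameter gets a list of values, and Random Search picks from those lists at random. For Logistic Regression, we’ll tune C and solver.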

Python
# Defining the parameter distributions
param_distributions = {
    'C': [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    'solver': ['liblinear', 'saga']
}
  • C: Controls the strength of regularization. Smaller values specify stronger regularization.
  • solver: Algorithm used in the optimization problem.
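The lists above restrict C to ten fixed values. As a hedged alternative, a continuous distribution can stand in for the list, because the sampler accepts any object with an rvs method. The sketch below assumes SciPy is installed and uses scipy.stats.loguniform over the same range (0.1 to 100); the name continuous_space and the choice of n_iter=5 are illustrative assumptions, not part of the lesson's code.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Hypothetical variant of the lesson's search space:
# C is drawn from a log-uniform distribution instead of a fixed list
continuous_space = {
    "C": loguniform(1e-1, 1e2),
    "solver": ["liblinear", "saga"],
}

# Draw five candidate settings, as RandomizedSearchCV would do internally
samples = list(ParameterSampler(continuous_space, n_iter=5, random_state=42))

for candidate in samples:
    print(candidate)
```

A log-uniform distribution spreads samples evenly across orders of magnitude, which tends to suit parameters such as C whose useful values span several decades.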
Performing Random Search

RandomizedSearchCV is a Scikit-Learn tool for Random Search. It randomly selects parameter combinations and evaluates their performance.

  • n_iter: Number of settings sampled.
  • cv: Number of cross-validation splits.
Python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression

# Performing randomized search
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,
    cv=5,
    random_state=42
)
random_search.fit(X_train, y_train)
Interpreting the Results

After running the search, find the best parameters and view the best score achieved during cross-validation.

Python
print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-validation score: {random_search.best_score_}")
# Best parameters: {'solver': 'liblinear', 'C': 5}
# Best cross-validation score: 0.992
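Beyond the single best setting, the fitted search object stores the score of every sampled combination in its cv_results_ attribute. The sketch below is a self-contained illustration that prints the three top-ranked candidates. It assumes a solver list of lbfgs and saga (swapped in so the example does not depend on how liblinear handles multiclass targets), and the names search, results, and order are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the lesson
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumption: lbfgs and saga replace the lesson's solver list for this sketch
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100], "solver": ["lbfgs", "saga"]},
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

# cv_results_ holds one entry per sampled combination
results = search.cv_results_
order = np.argsort(results["rank_test_score"])  # best-ranked candidates first

for i in order[:3]:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{results['params'][i]}  mean={mean:.3f}  std={std:.3f}")
```

When several candidates score within one standard deviation of each other, the difference between them is probably not meaningful, and a simpler or more strongly regularized setting may be a reasonable pick.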
Calculating the Final Metric on the Testing Dataset

After identifying the best parameters from the Random Search, it’s crucial to evaluate the model on the testing dataset to see how well it generalizes to new, unseen data.

Python
from sklearn.metrics import accuracy_score

# Best model with best parameters from random search
best_model = random_search.best_estimator_

# Predicting on the testing set
y_pred = best_model.predict(X_test)

# Calculating the accuracy on the testing set
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {test_accuracy}")
# Test Accuracy: 0.981

In this example, the accuracy on the testing set is calculated using the best model obtained from RandomizedSearchCV. This final evaluation metric gives an indication of the model's performance on new data.
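One caveat about the workflow above: the scaler was fit on the full dataset before splitting, so statistics from the testing set influence the training features. A possible alternative, sketched below under stated assumptions, places the scaler inside a Pipeline so that it is refit on the training portion of every cross-validation split. The step names scaler and clf, the variable names, and the lbfgs/saga solver list are illustrative choices, not the lesson's own code.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Split the raw (unscaled) data first
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling becomes part of the model, so it only ever sees training data
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
pipeline_distributions = {
    "clf__C": [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    "clf__solver": ["lbfgs", "saga"],  # assumption: multiclass-capable solvers
}

pipeline_search = RandomizedSearchCV(
    pipeline, pipeline_distributions, n_iter=10, cv=5, random_state=42
)
pipeline_search.fit(X_train, y_train)

# With the default refit=True, the search object predicts with the best pipeline
pipeline_accuracy = accuracy_score(y_test, pipeline_search.predict(X_test))
print(f"Best parameters: {pipeline_search.best_params_}")
print(f"Test accuracy: {pipeline_accuracy:.3f}")
```

Calling pipeline_search.score(X_test, y_test) returns the same accuracy, since the default scoring for a classifier is accuracy.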

Lesson Summary and Practice Introduction

In this lesson, you learned:

  • What Random Search is
  • How to load and split a dataset
  • How to define parameter ranges
  • How to implement Random Search with RandomizedSearchCV
  • How to interpret the best parameters and scores

Now it’s your turn to practice! Apply Random Search to different models and datasets. This will help solidify your understanding. Let’s move on to the practice session!
