Choosing the right hyperparameters (the settings you fix before training, such as regularization strength) can greatly affect a machine learning model's success. Imagine these settings as cake ingredients: the right amounts make your cake delicious. Similarly, the right settings make your model accurate. Random Search helps find these “right ingredients” by trying random combinations. By the end of this lesson, you will:
- Understand what Random Search is
- Learn how to implement it using Scikit-Learn
- Interpret the results to improve models
Random Search is a technique for tuning parameters by randomly sampling combinations from a given set of values or ranges, like randomly picking recipes to see which cake tastes best. Unlike Grid Search, which evaluates every possible combination, Random Search evaluates only a fixed number of randomly chosen ones. That usually makes it faster, at the cost of possibly missing the single best combination. It’s like flipping through a recipe book and picking random recipes instead of trying every single one.
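To make the difference in cost concrete, here is a minimal sketch that counts the candidates each strategy would evaluate. It uses Scikit-Learn's `ParameterGrid` and `ParameterSampler` helpers; the small `space` dictionary is a hypothetical search space chosen only for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, used only for illustration
space = {
    "C": [0.1, 1, 10, 100],
    "solver": ["liblinear", "saga"],
}

# Grid Search enumerates every combination: 4 values of C * 2 solvers = 8
grid_candidates = list(ParameterGrid(space))

# Random Search draws a fixed number of combinations (here 3)
random_candidates = list(ParameterSampler(space, n_iter=3, random_state=0))

print(len(grid_candidates))    # 8
print(len(random_candidates))  # 3
for params in random_candidates:
    print(params)
```

The grid grows multiplicatively with every parameter you add, while the number of random draws stays whatever you set it to.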
We’ll use the wine dataset from Scikit-Learn. Let's load it and scale the features:
```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load real dataset
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
```
To evaluate our model, we split the dataset into a training set (70%) and a testing set (30%).
```python
from sklearn.model_selection import train_test_split

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
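The lesson scales the full dataset before splitting, which lets the scaler see the test rows. A common alternative, sketched here as an assumption rather than as the lesson's method, is to split the raw data first and wrap the scaler and model in a pipeline, so the scaler is fitted on the training rows only. The names `X_raw`, `X_train_raw`, `X_test_raw`, `pipeline`, and `test_score` are introduced for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Split the unscaled data first
X_raw, y = load_wine(return_X_y=True)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.3, random_state=42
)

# Fitting the pipeline fits the scaler on the training rows only;
# the same fitted scaler is then applied to the test rows at predict time
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train_raw, y_train)

test_score = pipeline.score(X_test_raw, y_test)
print(f"Test accuracy: {test_score:.3f}")
```

On a small, clean dataset like this one the difference is likely to be minor, but the pipeline pattern keeps the test set fully unseen.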
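A parameter grid is the set of candidate values you want to try for each parameter; Random Search samples combinations from it rather than trying them all. For Logistic Regression, we’ll tune `C` and `solver`. With 10 values of `C` and 2 solvers, there are 20 possible combinations.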
```python
# Defining the parameter grid
param_distributions = {
    'C': [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    'solver': ['liblinear', 'saga']
}
```
- C: Controls the strength of regularization. Smaller values specify stronger regularization.
- solver: Algorithm used in the optimization problem.
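The lesson samples `C` from a fixed list, but Random Search can also draw from continuous distributions. The sketch below is an assumed variant, not the lesson's configuration: it uses `scipy.stats.loguniform` so that `C` is sampled evenly across orders of magnitude between 0.1 and 100, and previews a few draws with `ParameterSampler`. The name `param_distributions_continuous` is hypothetical.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Assumed variant of the search space: C comes from a log-uniform
# distribution over [0.1, 100] instead of a fixed list of values
param_distributions_continuous = {
    "C": loguniform(1e-1, 1e2),
    "solver": ["liblinear", "saga"],
}

# Preview five draws to see what a search over this space would try
samples = list(
    ParameterSampler(param_distributions_continuous, n_iter=5, random_state=42)
)
for params in samples:
    print(params)
```

A dictionary like this can be passed to `RandomizedSearchCV` in place of the list-based one, since any object with an `rvs` method is accepted as a distribution.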
`RandomizedSearchCV` is a Scikit-Learn tool for Random Search. It randomly selects parameter combinations and evaluates each one with cross-validation. Two of its key arguments are:
- n_iter: Number of settings sampled.
- cv: Number of cross-validation splits.
```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression

# Performing randomized search
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000), param_distributions,
    n_iter=10, cv=5, random_state=42
)
random_search.fit(X_train, y_train)
```
After running the search, find the best parameters and view the best score achieved during cross-validation.
```python
print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-validation score: {random_search.best_score_}")
# Best parameters: {'solver': 'liblinear', 'C': 5}
# Best cross-validation score: 0.992
```
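Beyond the single best result, the fitted search object stores the score of every sampled combination in its `cv_results_` attribute. The sketch below assumes pandas is installed and rebuilds the lesson's setup so that it runs on its own; it then lists all ten candidates from best to worst. The variable names `results` and `columns` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the lesson's setup so the snippet is self-contained
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

param_distributions = {
    "C": [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    "solver": ["liblinear", "saga"],
}
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000), param_distributions,
    n_iter=10, cv=5, random_state=42,
)
random_search.fit(X_train, y_train)

# One row per sampled combination, best rank first
results = pd.DataFrame(random_search.cv_results_)
columns = ["param_C", "param_solver", "mean_test_score",
           "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score").to_string(index=False))
```

Comparing `mean_test_score` with `std_test_score` shows whether the top candidates are meaningfully different or within noise of each other.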
After identifying the best parameters from the Random Search, it’s crucial to evaluate the model on the testing dataset to see how well it generalizes to new, unseen data.
```python
from sklearn.metrics import accuracy_score

# Best model with best parameters from random search
best_model = random_search.best_estimator_

# Predicting on the testing set
y_pred = best_model.predict(X_test)

# Calculating the accuracy on the testing set
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {test_accuracy}")
# Test Accuracy: 0.981
```
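In this example, the accuracy on the testing set is calculated using the best model obtained from `RandomizedSearchCV`. By default, this model is refit on the full training set with the best parameters once the search finishes. This final evaluation metric gives an indication of the model's performance on new data.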
In this lesson, you learned:
- What Random Search is
- How to load and split a dataset
- How to define parameter ranges
- How to implement Random Search with `RandomizedSearchCV`
- How to interpret the best parameters and scores
Now it’s your turn to practice! Apply Random Search to different models and datasets. This will help solidify your understanding. Let’s move on to the practice session!