Lesson 3

Choosing the right parameters in **machine learning models** can greatly affect their success. Imagine these parameters as cake ingredients: the right amount makes your cake delicious. Similarly, the right parameter settings make your model accurate. **Random Search** helps find these “right ingredients” by trying random combinations. By the end of this lesson, you will:

- Understand what Random Search is
- Learn how to implement it using `Scikit-Learn`
- Interpret the results to improve models

**Random Search** is a technique for tuning parameters by randomly sampling combinations from a given range, like randomly picking recipes to see which cake tastes best. Unlike Grid Search, which evaluates every possible combination, Random Search evaluates only a fixed number of randomly chosen combinations. That usually makes it faster, at the cost of possibly missing the single best combination. It’s like flipping through a recipe book and picking random recipes instead of trying every single one.
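
To make the difference concrete, here is a minimal sketch that counts the candidates each approach would evaluate. It uses the `C` and `solver` values tuned later in this lesson, and it relies on `ParameterGrid` and `ParameterSampler` from `Scikit-Learn` purely for illustration. The variable names `all_combinations` and `sampled_combinations` are made up for this example.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same candidate values as the parameter grid used later in the lesson
param_distributions = {
    'C': [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    'solver': ['liblinear', 'saga'],
}

# Grid Search would evaluate every combination (10 values of C x 2 solvers)
all_combinations = list(ParameterGrid(param_distributions))

# Random Search evaluates only n_iter randomly drawn combinations
sampled_combinations = list(
    ParameterSampler(param_distributions, n_iter=10, random_state=42)
)

print(f"Grid Search candidates: {len(all_combinations)}")       # 20
print(f"Random Search candidates: {len(sampled_combinations)}")  # 10
```

With `n_iter=10`, Random Search here does half the work of a full grid. The saving grows quickly as more parameters or more values are added.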

We’ll use the *wine dataset* from `Scikit-Learn`. Let's load it and scale features:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load real dataset
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
```

To evaluate our model, we split the dataset into a training set (70%) and a testing set (30%).

```python
from sklearn.model_selection import train_test_split

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
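
As a quick sanity check, the sketch below prints the sizes produced by the `test_size=0.3` setting used in the code above. It reloads the data so that it runs on its own. Scaling is skipped because it does not change the shapes.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Reload the wine data so this check is self-contained
X, y = load_wine(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The wine dataset has 178 samples and 13 features
print(X_train.shape, X_test.shape)  # (124, 13) (54, 13)
print(f"Test fraction: {len(X_test) / len(X):.2f}")
```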

A **parameter grid** is a set of parameters you want to try. For *Logistic Regression*, we’ll tune `C` and `solver`.

```python
# Defining the parameter grid
param_distributions = {
    'C': [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    'solver': ['liblinear', 'saga']
}
```

- **C**: Controls the strength of regularization. Smaller values specify stronger regularization.
- **solver**: Algorithm used in the optimization problem.
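
Random Search is not limited to fixed lists of values: a parameter can also be drawn from a continuous distribution. The sketch below is a variation on the lesson's setup, not part of it. It assumes `scipy.stats.loguniform` is available and picks the range 0.1 to 100 to match the span of the `C` values above. The name `continuous_distributions` is hypothetical.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Assumption: a log-uniform range covering the same span as the lesson's C list
continuous_distributions = {
    'C': loguniform(0.1, 100),        # sampled on a log scale between 0.1 and 100
    'solver': ['liblinear', 'saga'],  # still chosen from a fixed list
}

# Draw a few settings to see what Random Search would try
samples = list(ParameterSampler(continuous_distributions, n_iter=5, random_state=42))
for setting in samples:
    print(setting)
```

A dictionary like this could be passed to `RandomizedSearchCV` in place of `param_distributions`, since it accepts any distribution object that provides an `rvs` method.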

`RandomizedSearchCV` is a `Scikit-Learn` tool for Random Search. It randomly selects parameter combinations and evaluates their performance.

- **n_iter**: Number of parameter settings sampled.
- **cv**: Number of cross-validation folds.

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression

# Performing randomized search
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,
    cv=5,
    random_state=42,
)
random_search.fit(X_train, y_train)
```

After running the search, find the best parameters and view the best score achieved during cross-validation.

```python
print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-validation score: {random_search.best_score_}")
# Best parameters: {'solver': 'liblinear', 'C': 5}
# Best cross-validation score: 0.992
```
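
Beyond the single best result, it can help to see how every sampled combination performed. The sketch below rebuilds the lesson's setup so that it runs on its own, then reads the `cv_results_` attribute of the fitted search. With `n_iter=10` and `cv=5`, the search trains 50 models, plus one final refit on the full training set. The variable names `results` and `order` are made up for this example, and no expected output is shown because the scores can vary between library versions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the lesson's data and search so this block is self-contained
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

param_distributions = {
    'C': [0.1, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100],
    'solver': ['liblinear', 'saga'],
}

random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,
    cv=5,
    random_state=42,
)
random_search.fit(X_train, y_train)

# cv_results_ holds one entry per sampled combination
results = random_search.cv_results_
order = np.argsort(results['rank_test_score'])  # best-ranked first

for i in order:
    print(
        f"rank {results['rank_test_score'][i]}: "
        f"mean={results['mean_test_score'][i]:.3f} "
        f"std={results['std_test_score'][i]:.3f} "
        f"params={results['params'][i]}"
    )
```

If several combinations score almost the same, the best parameters are not a clear winner, and a simpler or more strongly regularized setting may be just as good.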

After identifying the best parameters from the Random Search, it’s crucial to evaluate the model on the testing dataset to see how well it generalizes to new, unseen data.

```python
from sklearn.metrics import accuracy_score

# Best model with best parameters from random search
best_model = random_search.best_estimator_

# Predicting on the testing set
y_pred = best_model.predict(X_test)

# Calculating the accuracy on the testing set
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {test_accuracy}")
# Test Accuracy: 0.981
```

In this example, the accuracy on the testing set is calculated using the best model obtained from `RandomizedSearchCV`. This final evaluation metric gives an indication of the model's performance on new data.

In this lesson, you learned:

- What **Random Search** is
- How to load and split a dataset
- How to define parameter ranges
- How to implement Random Search with `RandomizedSearchCV`
- How to interpret the best parameters and scores

Now it’s your turn to practice! Apply Random Search to different models and datasets. This will help solidify your understanding. Let’s move on to the practice session!