Lesson 4

Welcome to our lesson on **regularization**, a pivotal concept in machine learning. Regularization is a technique used to prevent **overfitting**, a common issue that arises when a model learns too much detail from the training data and performs poorly on unseen data. In this lesson, we will learn L1 and L2 regularization, apply them to Logistic Regression, and see why Decision Trees control their complexity by other means.

In this section, we'll explore how to tackle overfitting through regularization. Overfitting is like memorizing the answers to a test rather than understanding the subject. It happens when a model learns the training data too well, including its noise and outliers, which hampers its performance on new, unseen data. Regularization helps to prevent this by simplifying the model in a controlled way.

There are two main types of regularization techniques we will focus on: `L1 (Lasso)` and `L2 (Ridge)` regularization. Both methods add a penalty to the model, but they do so in different ways, leading to different outcomes.

Imagine you're painting a picture but decide to use only the essential colors. This is what `L1` regularization does. It simplifies the model by driving some feature weights to exactly zero, effectively removing those features from the model. This can lead to a model that's easier to interpret and less prone to overfitting. In technical terms, `L1` adds a penalty proportional to the sum of the absolute values of the coefficients.

Now, think of tuning a musical instrument to ensure no single note overpowers the others. `L2` regularization works similarly. It reduces the model's complexity by penalizing large coefficients but doesn't zero them out. This method is useful when many features contribute small effects, and you don't want to eliminate them entirely. `L2` adds a penalty proportional to the sum of the squared coefficients.
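In symbols, the two penalties can be written as below. This is a generic sketch, not any particular library's exact objective: the notation is assumed here, with $\mathcal{L}(w)$ standing for the unregularized training loss, $w_1, \dots, w_p$ for the model coefficients, and $\lambda \ge 0$ for the regularization strength.

```latex
\text{L1:}\quad \min_{w}\; \mathcal{L}(w) + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
\qquad\qquad
\text{L2:}\quad \min_{w}\; \mathcal{L}(w) + \lambda \sum_{j=1}^{p} w_j^{2}
```

A larger $\lambda$ puts more weight on the penalty and so pulls the coefficients harder toward zero; with $\lambda = 0$ both objectives reduce to the unregularized loss.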

Here is a chart that illustrates these concepts. In the chart, you can see how `L1` regularization can completely remove some features (by setting their importance to zero), while `L2` regularization shrinks the importance of all features without removing any of them.

Regularization is a powerful tool in machine learning, striking a balance between simplicity and predictive power in models. By applying `L1` or `L2` regularization, we can create models that generalize better to new data, avoiding the pitfalls of overfitting.

Next, we'll see how these regularization techniques apply to Logistic Regression, and why Decision Trees rely on different controls.

Now, let's apply L1 and L2 regularization in the context of **Logistic Regression** using the popular Python library scikit-learn (`sklearn`). For this, we'll use the same **Breast Cancer Wisconsin Dataset** we've been using throughout this course. After loading the data and splitting it into a training set and a test set, we'll use scikit-learn's `LogisticRegression` class, which has a `penalty` parameter for applying regularization.

Let's see it in action:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Loading dataset
data = load_breast_cancer()

# Splitting the dataset into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Applying L1 regularization
# Solver 'liblinear' supports both L1 and L2 regularization
logistic_l1 = LogisticRegression(penalty='l1', C=0.1, solver='liblinear', max_iter=10000)
logistic_l1.fit(X_train, Y_train)

# Applying L2 regularization
# The same solver as above, the only difference is the penalty.
logistic_l2 = LogisticRegression(penalty='l2', C=0.1, solver='liblinear', max_iter=10000)
logistic_l2.fit(X_train, Y_train)
```

The hyperparameter `C` is the inverse of the regularization strength: smaller values mean stronger regularization.
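To make the effect of `C` concrete, here is a minimal sketch that refits the L2-regularized model at a few values of `C` and prints the size of the learned coefficient vector. The specific values in `c_values` and the name `coef_norms` are illustrative choices, not part of the lesson's original code, and the example assumes a scikit-learn version that accepts the `penalty` argument as used above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Illustrative values of C, chosen only to span weak to strong regularization
c_values = [100.0, 1.0, 0.01]
coef_norms = {}

for c in c_values:
    model = LogisticRegression(penalty='l2', C=c, solver='liblinear', max_iter=10000)
    model.fit(X_train, Y_train)
    # Euclidean norm of the learned coefficient vector
    coef_norms[c] = float(np.linalg.norm(model.coef_))
    print(f"C={c:>6}: coefficient norm = {coef_norms[c]:.3f}")
```

The smallest `C` should yield the smallest coefficient norm, since a stronger penalty pulls the weights closer to zero.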

After fitting the Logistic Regression models with L1 and L2 regularization, it's important to evaluate their performance. We'll do this by computing the accuracy of each model on the test set. The accuracy score is the fraction of test samples the model classifies correctly. In scikit-learn, it is computed with the `accuracy_score` function from the `metrics` module. Let's calculate and compare the accuracies of our regularized models:

```python
from sklearn.metrics import accuracy_score

# Predicting the test set results for L1 regularized model
Y_pred_l1 = logistic_l1.predict(X_test)
accuracy_l1 = accuracy_score(Y_test, Y_pred_l1)
print(f"Accuracy of L1 Regularized Model: {accuracy_l1:.2f}")

# Predicting the test set results for L2 regularized model
Y_pred_l2 = logistic_l2.predict(X_test)
accuracy_l2 = accuracy_score(Y_test, Y_pred_l2)
print(f"Accuracy of L2 Regularized Model: {accuracy_l2:.2f}")
```

Output:

```
Accuracy of L1 Regularized Model: 0.96
Accuracy of L2 Regularized Model: 0.98
```

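With these settings, the L2-regularized model scores slightly higher on the test set than the L1-regularized one. The gap is small, so treat it as a result for this dataset, split, and value of `C` rather than a general rule.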

In machine learning, choosing between L1 (Lasso) and L2 (Ridge) regularization depends on your model and data. L1 is beneficial for models with numerous features, as it helps in feature selection by shrinking some coefficients to zero. This is particularly useful in Logistic and Linear Regression when you want a simpler, more interpretable model. On the other hand, L2 regularization, which reduces the impact of all features more uniformly without eliminating them, is suitable when dealing with correlated features. It's commonly used in Logistic Regression, Linear Regression, and Neural Networks to prevent overfitting, especially when the dataset has fewer samples than features.
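The feature-selection effect of L1 can be checked directly by inspecting the fitted coefficients. The sketch below refits the two models from this lesson and counts how many coefficients each one sets to exactly zero; the variable names `n_zero_l1`, `n_zero_l2`, and `kept_features` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Same settings as the models fitted earlier in the lesson
logistic_l1 = LogisticRegression(penalty='l1', C=0.1, solver='liblinear', max_iter=10000)
logistic_l1.fit(X_train, Y_train)
logistic_l2 = LogisticRegression(penalty='l2', C=0.1, solver='liblinear', max_iter=10000)
logistic_l2.fit(X_train, Y_train)

# Count coefficients that are exactly zero in each model
n_zero_l1 = int(np.sum(logistic_l1.coef_ == 0))
n_zero_l2 = int(np.sum(logistic_l2.coef_ == 0))

# Names of the features the L1 model still uses
kept_features = [
    name for name, weight in zip(data.feature_names, logistic_l1.coef_[0]) if weight != 0
]

print(f"Zero coefficients with L1: {n_zero_l1} of {logistic_l1.coef_.size}")
print(f"Zero coefficients with L2: {n_zero_l2} of {logistic_l2.coef_.size}")
print("Features kept by L1:", kept_features)
```

The L1 model should drop a number of features outright, while the L2 model keeps every feature with a shrunken weight.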

Importantly, L1 and L2 penalties are not applied to models such as Decision Trees, which have no coefficients to penalize. These models have their own ways of controlling complexity and preventing overfitting, such as limiting tree depth and pruning.
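As a rough illustration of those tree-specific controls, the sketch below fits three trees on the same split: an unconstrained one, one with limited depth, and one with cost-complexity pruning. It uses scikit-learn's `DecisionTreeClassifier` with its `max_depth` and `ccp_alpha` parameters; the specific values `max_depth=3` and `ccp_alpha=0.01` are arbitrary choices for this example, not tuned recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Unconstrained tree: grows until the training data is fit as closely as possible
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, Y_train)

# Depth-limited tree (max_depth=3 is an illustrative value)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, Y_train)

# Cost-complexity pruned tree (ccp_alpha=0.01 is an illustrative value)
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=42).fit(X_train, Y_train)

for name, tree in [("full", full_tree), ("max_depth=3", shallow_tree), ("ccp_alpha=0.01", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"train accuracy={tree.score(X_train, Y_train):.2f}, "
        f"test accuracy={tree.score(X_test, Y_test):.2f}"
    )
```

Comparing train and test accuracy across the three trees gives a feel for how much each control trades training fit for simplicity.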

Great job making it through this lesson! You've learned a fundamental technique that helps prevent your machine learning models from overfitting to the training data. You are now equipped to reduce the risk of overfitting by carefully applying L1 and L2 regularization to Logistic Regression and other coefficient-based models.

Now, it's time to cement these concepts into your practice. Up next, we have some hands-on exercises designed to help you apply what you've just learned. It's time to level up your machine learning models, and remember: practice makes perfect!