Ridge Regression

Lesson 3

Lesson Introduction

Hello! Today, we're going to talk about Ridge Regression. Ridge Regression is a variant of linear regression that helps when our data has many features (or variables), especially when some of them overlap. Imagine you have a lot of different ingredients for a recipe but don't know which ones are essential. Ridge Regression does not throw any ingredient out; instead, it keeps every ingredient (or feature) in measured amounts so that no single one overpowers the recipe.

In this lesson, we'll learn:

What Ridge Regression is.
How to use Ridge Regression in Python.
How to interpret the results.
How Ridge Regression compares to regular linear regression.

Ready to dive in? Let's go!

What is Ridge Regression?

Ridge Regression is like normal linear regression but with a regularization term added. Why do we need this?

Think about building a sandcastle. If you pile up too much sand without structure, it might collapse. Similarly, in regression, too many variables can make our model too complex and perform poorly on new data. This is known as overfitting.

Ridge Regression helps by adding a "penalty" to the equation that keeps the coefficients (weights assigned to each feature) smaller. This penalty term is controlled by a parameter called $\alpha$.

This penalty works by adding the sum of the squared values of the coefficients to the cost function. In mathematical terms, the Ridge Regression cost function is:

$J(\theta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^p \theta_j^2$

Here:

$J(\theta)$ is the cost function: the sum of squared prediction errors plus the penalty. Lower values are better.
$y_i$ are the actual values.
$\hat{y}_i$ are the predicted values.
$\theta_j$ are the coefficients.
$\alpha$ is the regularization parameter.
$n$ is the number of observations and $p$ is the number of features.

The term $\alpha \sum_{j=1}^p \theta_j^2$ is the regularization term, which penalizes large coefficients to reduce model complexity and prevent overfitting. The higher the value of $\alpha$, the stronger the penalty on large coefficients; with $\alpha = 0$, the penalty vanishes and we are back to regular linear regression.
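To make the cost function concrete, here is a minimal sketch that evaluates $J(\theta)$ with NumPy and checks it against Scikit-Learn. The dataset is made up for illustration, and the names `ridge_cost`, `X_demo`, `y_demo`, and `alpha_demo` are hypothetical. The sketch also assumes a model without an intercept, in which case the cost is minimized by $\theta = (X^T X + \alpha I)^{-1} X^T y$.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_cost(X, y, theta, alpha):
    """Sum of squared residuals plus alpha times the sum of squared coefficients."""
    residuals = y - X @ theta
    return np.sum(residuals ** 2) + alpha * np.sum(theta ** 2)

# Tiny made-up dataset (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=20)
alpha_demo = 1.0

# Closed-form minimizer of the cost when there is no intercept:
# theta = (X^T X + alpha * I)^(-1) X^T y
theta_closed = np.linalg.solve(
    X_demo.T @ X_demo + alpha_demo * np.eye(X_demo.shape[1]),
    X_demo.T @ y_demo,
)

# Scikit-Learn's Ridge without an intercept should agree with the closed form
sk_model = Ridge(alpha=alpha_demo, fit_intercept=False).fit(X_demo, y_demo)

print("Closed-form coefficients: ", theta_closed)
print("Scikit-Learn coefficients:", sk_model.coef_)
print("Cost at the minimizer:    ", ridge_cost(X_demo, y_demo, theta_closed, alpha_demo))
```

Nudging any coefficient away from `theta_closed` and calling `ridge_cost` again should give a larger value, which is one way to convince yourself that the fitted coefficients really minimize this cost.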

Example of Ridge Regression: Part 1

Let's see Ridge Regression in action using Python and the Scikit-Learn library. We'll use a real dataset to demonstrate this.

First, we load and split our dataset. We’ll use the diabetes dataset included in Scikit-Learn.

Python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load real dataset
X, y = load_diabetes(return_X_y=True)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Here:

We import the necessary libraries.
We load the diabetes dataset using load_diabetes().
We split the dataset into training and testing sets using train_test_split(), with 80% for training and 20% for testing. Setting random_state=42 makes the split reproducible.
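As a quick sanity check on the split, the sketch below prints the shapes of the resulting arrays. The diabetes dataset has 442 samples and 10 features, so a 20% test split should leave 353 rows for training and 89 for testing.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each shape is (number of rows, number of features)
print("Full dataset:", X.shape)
print("Training set:", X_train.shape)
print("Testing set: ", X_test.shape)
```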

Example of Ridge Regression: Part 2

Now, let's train our Ridge Regression model using the training data.

Python
# Train a ridge regression model
ridge_model = Ridge(alpha=0.35)
ridge_model.fit(X_train, y_train)

# Make predictions
y_pred_ridge = ridge_model.predict(X_test)

# Calculate Mean Squared Error
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge}")
# Ridge Regression MSE: 2878.4563201253923

Here:

We create a Ridge Regression model with $\alpha$ set to 0.35. This $\alpha$ value controls the strength of the regularization. Higher values mean stronger regularization.
We train (fit) the model using the fit() method with our training data (X_train and y_train).
We make predictions on the test set with predict() and evaluate them using Mean Squared Error (MSE), where lower values mean better predictions.
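To see what "stronger regularization" means in practice, here is a small sketch that refits the model for several values of $\alpha$ and prints the overall size (L2 norm) of the coefficient vector. The list of alpha values is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

alphas = [0.01, 0.35, 1.0, 10.0]  # arbitrary values chosen for illustration
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    norm = np.linalg.norm(model.coef_)  # overall size of the coefficient vector
    coef_norms.append(norm)
    print(f"alpha={alpha:<5} coefficient L2 norm: {norm:.2f}")
```

The norm should shrink as $\alpha$ grows, which is the penalty doing its job.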

Interpreting the Coefficients

Once trained, we can look at the coefficients (weights) and the intercept to understand the model better.

Python
# Print the coefficients
print(f"Coefficients: {ridge_model.coef_}, Intercept: {ridge_model.intercept_}")
# Coefficients: [  44.97986989 -146.87318828  414.52388235  269.57882622  -42.27871117
# -73.50772192 -182.81323752  136.63207571  316.39992559  106.88080884], Intercept: 151.75943045447815

Here:

We print the coefficients using ridge_model.coef_ and the intercept using ridge_model.intercept_.

As with regular linear regression, each coefficient shows how much the prediction changes when that feature increases by one unit while the others stay fixed. The intercept is the predicted value when all the features are zero.
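A bare array of numbers is hard to read, so one option is to pair each coefficient with its feature name. The sketch below assumes the same split and the same $\alpha$ as above; it loads the dataset as an object (without return_X_y) so that its feature_names attribute is available. The name `ranked` is just for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

data = load_diabetes()  # object with .data, .target and .feature_names
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
ridge_model = Ridge(alpha=0.35).fit(X_train, y_train)

# Sort features by the absolute size of their coefficient, largest first
ranked = sorted(
    zip(data.feature_names, ridge_model.coef_),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, coef in ranked:
    print(f"{name:>4}: {coef:9.2f}")
```

Comparing magnitudes like this is reasonable here because load_diabetes() returns features that are already centered and scaled. With unscaled features, coefficient sizes depend on each feature's units and are not directly comparable.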

Comparing Performance: Part 1

Ridge Regression is often better than regular linear regression when:

Multicollinearity: It handles highly correlated features by reducing the variance of coefficient estimates, leading to better generalization.
Overfitting: It prevents overfitting by adding regularization, improving model performance on new data.
High-Dimensional Data: It works well when the number of features is high relative to the number of observations, stabilizing coefficient estimates.

Let's compare the performance of the regular Linear Regression model and the Ridge Regression model using their Mean Squared Error values. For this purpose, we will generate highly correlated data, where Ridge Regression is expected to perform better:

Python
import pandas as pd
import numpy as np

# Step 1: Generate five strongly correlated features
n_samples = 100
X1 = np.random.rand(n_samples)
X2 = X1 + np.random.normal(0, 0.05, n_samples)  # X1 plus a little noise
X3 = X1 + X2 + np.random.normal(0, 0.05, n_samples)  # Combination of X1 and X2
X4 = X1 + 2*X2 + 0.5*X3 + np.random.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + np.random.normal(0, 0.05, n_samples)
X = np.vstack([X1, X2, X3, X4, X5]).T

# Step 2: Generate a target that depends only on X1 and X2, with larger noise
y = 3 * X1 + 5 * X2 + np.random.normal(0, 1.0, n_samples)

# Convert to DataFrame for easier display (optional)
df = pd.DataFrame(X, columns=['X1', 'X2', 'X3', 'X4', 'X5'])
df['y'] = y

Features X2 through X5 are linear combinations of the other features plus a small amount of noise, which means the data is multicollinear.
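One way to confirm the multicollinearity is to look at the pairwise correlations between the features. The sketch below rebuilds the same features with a seeded generator (the seed and the use of np.random.default_rng are assumptions made here for reproducibility; the lesson code uses unseeded np.random calls) and prints the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
n_samples = 100
X1 = rng.random(n_samples)
X2 = X1 + rng.normal(0, 0.05, n_samples)
X3 = X1 + X2 + rng.normal(0, 0.05, n_samples)
X4 = X1 + 2 * X2 + 0.5 * X3 + rng.normal(0, 0.05, n_samples)
X5 = X2 + 3 * X3 - 0.5 * X4 + rng.normal(0, 0.05, n_samples)
X = np.vstack([X1, X2, X3, X4, X5]).T

# rowvar=False treats each column as one variable
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))
```

Off-diagonal values close to 1 indicate that the features carry nearly the same information, which is the situation where regular linear regression tends to produce unstable coefficients.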

Comparing Performance: Part 2

Now, let's compare the results of Ridge Regression and Linear Regression:

Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
print(f"Linear Regression MSE: {mse_lr:.4f}")

# Ridge Regression
ridge = Ridge(alpha=1.5)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge:.4f}")

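Here, we train both the Ridge and LinearRegression models on the generated data and print their MSE scores. Because the data is generated randomly without a fixed seed, the exact numbers vary from run to run. Here is the result from one run:

Linear Regression MSE: 1.1271
Ridge Regression MSE: 1.0578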

As you can see, in this case Ridge Regression outperforms the regular linear regression.
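A single random draw can favor either model, so it can be useful to repeat the comparison several times. The sketch below is one way to do that under stated assumptions: the helper `make_correlated_data`, the seeded generators, and the choice of 20 trials are all hypothetical and not part of the lesson code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def make_correlated_data(rng, n_samples=100):
    """Rebuild the correlated features and noisy target from a seeded generator."""
    X1 = rng.random(n_samples)
    X2 = X1 + rng.normal(0, 0.05, n_samples)
    X3 = X1 + X2 + rng.normal(0, 0.05, n_samples)
    X4 = X1 + 2 * X2 + 0.5 * X3 + rng.normal(0, 0.05, n_samples)
    X5 = X2 + 3 * X3 - 0.5 * X4 + rng.normal(0, 0.05, n_samples)
    X = np.vstack([X1, X2, X3, X4, X5]).T
    y = 3 * X1 + 5 * X2 + rng.normal(0, 1.0, n_samples)
    return X, y

n_trials = 20  # arbitrary number of repetitions (assumption)
results = []
for seed in range(n_trials):
    X, y = make_correlated_data(np.random.default_rng(seed))
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    lr_pred = LinearRegression().fit(X_train, y_train).predict(X_test)
    ridge_pred = Ridge(alpha=1.5).fit(X_train, y_train).predict(X_test)
    results.append(
        (mean_squared_error(y_test, lr_pred), mean_squared_error(y_test, ridge_pred))
    )

# Count the trials in which Ridge achieved the lower test MSE
ridge_wins = sum(mse_ridge < mse_lr for mse_lr, mse_ridge in results)
print(f"Ridge had the lower test MSE in {ridge_wins} of {n_trials} trials")
```

Counting wins across many draws gives a more reliable picture than comparing two numbers from one run.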

Lesson Summary

In this lesson, we learned about Ridge Regression—a special type of linear regression that helps prevent overfitting by adding a regularization term.

We walked through the steps to:

Load and split a dataset.
Train a regular linear regression model and a Ridge Regression model in Python using Scikit-Learn.
Evaluate both models using Mean Squared Error (MSE).
Compare the performance of both models.

Next, we’ll move to the practice section where you'll get hands-on experience implementing Ridge Regression on your own.
