Lesson 3

Hello! Today, we're going to talk about **Ridge Regression**. Ridge Regression is a special type of linear regression that helps when we have many features (or variables) in our data, especially when some of them are closely related. Imagine you have a lot of different ingredients for a recipe. Ridge Regression keeps every ingredient (or feature) but limits how much of each one goes in, so that no single ingredient overpowers the recipe.

In this lesson, we'll learn:

- What Ridge Regression is.
- How to use Ridge Regression in Python.
- How to interpret the results.
- How Ridge Regression compares to regular linear regression.

Ready to dive in? Let's go!

Ridge Regression is like normal linear regression but with a regularization term added. Why do we need this?

Think about building a sandcastle. If you pile up too much sand without structure, it might collapse. Similarly, in regression, too many variables can make our model too complex and perform poorly on new data. This is known as overfitting.

Ridge Regression helps by adding a "penalty" to the equation that keeps the coefficients (weights assigned to each feature) smaller. This penalty term is controlled by a parameter called $\alpha$.

This penalty works by adding the sum of the squared values of the coefficients to the cost function. In mathematical terms, the Ridge Regression cost function is:

$J(\theta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^p \theta_j^2$

Here:

- $J(\theta)$ is the cost function, which is a measure of how well the model's predictions match the actual data.
- $y_i$ are the actual values.
- $\hat{y}_i$ are the predicted values.
- $\theta_j$ are the coefficients.
- $\alpha$ is the regularization parameter.

The term $\alpha \sum_{j=1}^p \theta_j^2$ is the regularization term which penalizes large coefficients to reduce model complexity and prevent overfitting. The higher the value of $\alpha$, the stronger the penalty on large coefficients.
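To make the formula concrete, here is a minimal NumPy sketch of the cost function above. The function name `ridge_cost` and the toy numbers are made up for illustration, and, like the formula, the sketch leaves out the intercept.

```python
import numpy as np

def ridge_cost(theta, X, y, alpha):
    """Sum of squared residuals plus alpha times the sum of squared coefficients."""
    residuals = y - X @ theta
    return np.sum(residuals ** 2) + alpha * np.sum(theta ** 2)

# Toy data (illustrative only): 3 observations, 2 features
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y_toy = np.array([5.0, 4.0, 9.0])
theta_toy = np.array([1.0, 2.0])  # these coefficients fit the toy data exactly

cost_no_penalty = ridge_cost(theta_toy, X_toy, y_toy, alpha=0.0)
cost_with_penalty = ridge_cost(theta_toy, X_toy, y_toy, alpha=0.5)
print(cost_no_penalty, cost_with_penalty)
```

With $\alpha = 0$ the cost is just the squared error, which is zero here because the toy coefficients fit perfectly. With $\alpha = 0.5$ the cost rises by $0.5 \times (1^2 + 2^2) = 2.5$, which comes entirely from the penalty on the coefficients.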

Let's see Ridge Regression in action using Python and the **Scikit-Learn** library. We'll use a real dataset to demonstrate this.

First, load and split our dataset. We’ll use a diabetes dataset included in **Scikit-Learn**.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load real dataset
X, y = load_diabetes(return_X_y=True)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Here:

- We import the necessary libraries.
- We load the diabetes dataset using `load_diabetes()`.
- We split this dataset into training and testing sets using `train_test_split()`, with 80% for training and 20% for testing.

Now, let's train our Ridge Regression model using the training data.

```python
# Train a ridge regression model
ridge_model = Ridge(alpha=0.35)
ridge_model.fit(X_train, y_train)

# Make predictions
y_pred_ridge = ridge_model.predict(X_test)

# Calculate Mean Squared Error
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge}")
# Ridge Regression MSE: 2878.4563201253923
```

Here:

- We create a Ridge Regression model with $\alpha$ set to 0.35. This $\alpha$ value controls the strength of the regularization. Higher values mean stronger regularization.
- We train (fit) the model using the `fit()` method with our training data (`X_train` and `y_train`).
- We evaluate the model using Mean Squared Error (MSE).
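To see what "stronger regularization" means in practice, the sketch below fits the same model with a few different $\alpha$ values and prints the overall size (Euclidean norm) of the coefficient vector. The values 0.01 and 10.0 are arbitrary choices for illustration; only 0.35 comes from the lesson.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

alphas = [0.01, 0.35, 10.0]  # illustrative values; only 0.35 is used in the lesson
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    norm = np.linalg.norm(model.coef_)  # overall size of the coefficient vector
    coef_norms.append(norm)
    print(f"alpha={alpha}: coefficient norm = {norm:.2f}")
```

The norm should shrink as $\alpha$ grows, which is the penalty doing its job. Note that smaller coefficients do not automatically mean a lower test error, so $\alpha$ still has to be chosen by checking performance on held-out data.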

Once trained, we can look at the coefficients (weights) and the intercept to understand the model better.

```python
# Print the coefficients
print(f"Coefficients: {ridge_model.coef_}, Intercept: {ridge_model.intercept_}")
# Coefficients: [  44.97986989 -146.87318828  414.52388235  269.57882622  -42.27871117
#   -73.50772192 -182.81323752  136.63207571  316.39992559  106.88080884], Intercept: 151.75943045447815
```

Here:

- We print the coefficients using `ridge_model.coef_` and the intercept using `ridge_model.intercept_`.

As with regular linear regression, each coefficient shows how much the prediction changes when that feature increases by one unit while the others stay fixed. The intercept is the predicted value when all the features are zero.
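To make this concrete, the sketch below rebuilds the predictions by hand from the coefficients and the intercept and checks that they match what `predict()` returns. It refits the same model so that it runs on its own; the names `manual_pred` and `sklearn_pred` are chosen just for this example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
ridge_model = Ridge(alpha=0.35).fit(X_train, y_train)

# Prediction = (features times coefficients, summed) + intercept
manual_pred = X_test @ ridge_model.coef_ + ridge_model.intercept_
sklearn_pred = ridge_model.predict(X_test)

print(np.allclose(manual_pred, sklearn_pred))
```

If the two arrays agree, the check prints `True`, confirming that a fitted Ridge model is still just a weighted sum of the features plus an intercept.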

Ridge Regression is often better than regular linear regression when:

- **Multicollinearity**: It handles highly correlated features by reducing the variance of coefficient estimates, leading to better generalization.
- **Overfitting**: It prevents overfitting by adding regularization, improving model performance on new data.
- **High-Dimensional Data**: It works well when the number of features is high relative to the number of observations, stabilizing coefficient estimates.

Let's compare the performance of the regular Linear Regression model and the Ridge Regression model using their Mean Squared Error values. For this purpose, we will generate highly correlated data, where Ridge Regression is expected to perform better:

```python
import pandas as pd
import numpy as np

# Note: no random seed is set, so the generated values differ on every run
n_samples = 100
X1 = np.random.rand(n_samples)
X2 = X1 + np.random.normal(0, 0.05, n_samples)  # Highly correlated with X1 (small noise)
X3 = X1 + X2 + np.random.normal(0, 0.05, n_samples)  # Highly correlated with X1 and X2
X4 = X1 + 2*X2 + 0.5*X3 + np.random.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + np.random.normal(0, 0.05, n_samples)
X = np.vstack([X1, X2, X3, X4, X5]).T

# Generate a target variable with more noise
y = 3 * X1 + 5 * X2 + np.random.normal(0, 1.0, n_samples)  # Increased noise in y

# Convert to DataFrame for easier display (optional)
df = pd.DataFrame(X, columns=['X1', 'X2', 'X3', 'X4', 'X5'])
df['y'] = y
```

Features $x_2, \dots, x_5$ are each a linear combination of the earlier features plus a small amount of noise, which means the features are highly correlated with one another (the data is multicollinear).
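One quick way to check for multicollinearity is to look at the correlation matrix of the features. The sketch below regenerates features with the same recipe and prints their pairwise correlations. It uses `np.random.default_rng` with a seed of 0, an assumption added here so the check is repeatable, so the numbers will not match an unseeded run exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, added for repeatability
n_samples = 100
X1 = rng.random(n_samples)
X2 = X1 + rng.normal(0, 0.05, n_samples)
X3 = X1 + X2 + rng.normal(0, 0.05, n_samples)
X4 = X1 + 2*X2 + 0.5*X3 + rng.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + rng.normal(0, 0.05, n_samples)
X = np.column_stack([X1, X2, X3, X4, X5])

# Pairwise correlations between the five feature columns
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))
```

Every off-diagonal entry should be close to 1, which tells us the five features carry almost the same information. This is the situation in which ordinary linear regression tends to produce unstable coefficients.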

Now, let's compare the results of Ridge Regression and Linear Regression:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
print(f"Linear Regression MSE: {mse_lr:.4f}")

# Ridge Regression
ridge = Ridge(alpha=1.5)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge:.4f}")
```

Here, we train both `Ridge` and `LinearRegression` models on the generated data and print their MSE scores. Here is the result:

```
Linear Regression MSE: 1.1271
Ridge Regression MSE: 1.0578
```

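As you can see, in this case Ridge Regression achieves a lower MSE than regular linear regression. Because the data is generated randomly without a fixed seed, the exact numbers will differ from run to run.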
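Comparing the fitted coefficients of the two models also helps explain the difference. The sketch below is a rough illustration under a few assumptions: it regenerates the correlated data with a seeded generator (seed 0, chosen here only for repeatability) and fits both models on the full data rather than on a train/test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)  # seed is an assumption, added for repeatability
n_samples = 100
X1 = rng.random(n_samples)
X2 = X1 + rng.normal(0, 0.05, n_samples)
X3 = X1 + X2 + rng.normal(0, 0.05, n_samples)
X4 = X1 + 2*X2 + 0.5*X3 + rng.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + rng.normal(0, 0.05, n_samples)
X = np.column_stack([X1, X2, X3, X4, X5])
y = 3 * X1 + 5 * X2 + rng.normal(0, 1.0, n_samples)

lr = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.5).fit(X, y)

# Overall size of each model's coefficient vector
lr_norm = np.linalg.norm(lr.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(f"Linear Regression coefficients: {np.round(lr.coef_, 2)} (norm {lr_norm:.2f})")
print(f"Ridge Regression coefficients:  {np.round(ridge.coef_, 2)} (norm {ridge_norm:.2f})")
```

The Ridge coefficient vector has the smaller norm. With features this strongly correlated, the unregularized model is free to assign large, offsetting weights to nearly identical features, while the penalty keeps the Ridge weights small and more stable.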

In this lesson, we learned about Ridge Regression, a special type of linear regression that helps prevent overfitting by adding a regularization term.

We walked through the steps to:

- Load and split a dataset.
- Train a regular linear regression model and a Ridge Regression model in Python using Scikit-Learn.
- Evaluate both models using Mean Squared Error (MSE).
- Compare the performance of both models.

Next, we’ll move to the practice section where you'll get hands-on experience implementing Ridge Regression on your own.