Hello! Today, we're going to talk about Ridge Regression. Ridge Regression is a special type of linear regression that helps when we have too many features (or variables) in our data. Imagine you have a lot of different ingredients for a recipe but don't know which ones are essential. Ridge Regression keeps every ingredient (or feature) in the recipe but limits how strongly any single one can influence the result, so the recipe is not overloaded.
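In this lesson, we'll learn what Ridge Regression is, how to train a Ridge model with Scikit-Learn, how to read its coefficients and intercept, and how it compares with regular linear regression.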
Ready to dive in? Let's go!
Ridge Regression is like normal linear regression but with a regularization term added. Why do we need this?
Think about building a sandcastle. If you pile up too much sand without structure, it might collapse. Similarly, in regression, too many variables can make our model too complex and perform poorly on new data. This is known as overfitting.
Ridge Regression helps by adding a "penalty" to the equation that keeps the coefficients (weights assigned to each feature) smaller. The strength of this penalty is controlled by a parameter called $\alpha$ (alpha).
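This penalty works by adding the sum of the squared values of the coefficients to the cost function. In mathematical terms, the Ridge Regression cost function is:
$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} \beta_j^2$$
Here:
- $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the $i$-th sample.
- $n$ is the number of samples and $p$ is the number of features.
- $\beta_j$ is the coefficient of the $j$-th feature.
- $\alpha$ is the regularization parameter.
The term $\alpha \sum_{j=1}^{p} \beta_j^2$ is the regularization term, which penalizes large coefficients to reduce model complexity and prevent overfitting. The higher the value of $\alpha$, the stronger the penalty on large coefficients.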
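To make the cost function concrete, here is a minimal NumPy sketch. It assumes a model without an intercept, for which the cost (squared errors plus alpha times the sum of squared coefficients) has a well-known closed-form minimizer. The helper names `ridge_cost` and `ridge_closed_form`, the toy data, and the seed are all made up for illustration and are not part of the lesson's code.

```python
import numpy as np

def ridge_cost(X, y, beta, alpha):
    """Sum of squared errors plus alpha times the sum of squared coefficients."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + alpha * np.sum(beta ** 2)

def ridge_closed_form(X, y, alpha):
    """Coefficients that minimize ridge_cost when the model has no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Toy data (an assumption for illustration): 50 samples, 3 features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=50)

alphas = [0.0, 1.0, 10.0, 100.0]
coef_sizes = []
for alpha in alphas:
    beta = ridge_closed_form(X_toy, y_toy, alpha)
    coef_sizes.append(np.sum(beta ** 2))
    cost = ridge_cost(X_toy, y_toy, beta, alpha)
    print(f"alpha={alpha:>6}: sum of squared coefficients={coef_sizes[-1]:.4f}, cost={cost:.4f}")
```

With alpha set to 0 the penalty disappears and the result is the ordinary least-squares fit. As alpha grows, the sum of squared coefficients shrinks, which is the "stronger penalty" described above.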
Let's see Ridge Regression in action using Python and the Scikit-Learn library. We'll use a real dataset to demonstrate this.
First, let's load and split our dataset. We'll use the diabetes dataset included in Scikit-Learn.
```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load real dataset
X, y = load_diabetes(return_X_y=True)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
Here:
- We load the dataset using load_diabetes().
- We split it into training and test sets using train_test_split(), with 80% for training and 20% for testing.
Now, let's train our Ridge Regression model using the training data.
```python
# Train a ridge regression model
ridge_model = Ridge(alpha=0.35)
ridge_model.fit(X_train, y_train)

# Make predictions
y_pred_ridge = ridge_model.predict(X_test)

# Calculate Mean Squared Error
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge}")
# Ridge Regression MSE: 2878.4563201253923
```
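Here:
- We create a Ridge model with alpha=0.35.
- We train it using the fit() method with our training data (X_train and y_train).
- We predict on the test set and measure the error with mean_squared_error().
Once trained, we can look at the coefficients (weights) and the intercept to understand the model better.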
```python
# Print the coefficients
print(f"Coefficients: {ridge_model.coef_}, Intercept: {ridge_model.intercept_}")
# Coefficients: [  44.97986989 -146.87318828  414.52388235  269.57882622  -42.27871117
#   -73.50772192 -182.81323752  136.63207571  316.39992559  106.88080884], Intercept: 151.75943045447815
```
Here, we print the coefficients using ridge_model.coef_ and the intercept using ridge_model.intercept_.
As with regular linear regression, the coefficients show how much each feature contributes to the final prediction. The intercept is the predicted value when all the features are zero.
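One way to see what the coefficients and the intercept mean is to rebuild a prediction by hand. The sketch below refits the same model as above so that it runs on its own; `manual_pred`, `zero_row`, and `zero_pred` are names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Same data, split, and model settings as in the lesson
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
ridge_model = Ridge(alpha=0.35).fit(X_train, y_train)

# A prediction is the weighted sum of the features plus the intercept
manual_pred = X_test @ ridge_model.coef_ + ridge_model.intercept_
print(np.allclose(manual_pred, ridge_model.predict(X_test)))

# With every feature set to zero, only the intercept remains
zero_row = np.zeros((1, X.shape[1]))
zero_pred = ridge_model.predict(zero_row)[0]
print(zero_pred, ridge_model.intercept_)
```

The hand-computed predictions should match what predict() returns, and the prediction for an all-zero row should equal the intercept.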
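Ridge Regression is often better than regular linear regression when the data has many features, when the features are highly correlated with one another (multicollinearity), or when the model overfits the training data.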
Let's compare the performance of the regular Linear Regression model and the Ridge Regression model using their Mean Squared Error values. For this purpose, we will generate highly correlated data, where Ridge Regression is expected to do better:
```python
import pandas as pd
import numpy as np

# Step 1: Generate five highly correlated features
n_samples = 100
X1 = np.random.rand(n_samples)
X2 = X1 + np.random.normal(0, 0.05, n_samples)  # Higher correlation with smaller noise
X3 = X1 + X2 + np.random.normal(0, 0.05, n_samples)  # Even higher correlation with smaller noise
X4 = X1 + 2*X2 + 0.5*X3 + np.random.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + np.random.normal(0, 0.05, n_samples)
X = np.vstack([X1, X2, X3, X4, X5]).T

# Step 2: Generate a target variable with more noise
y = 3 * X1 + 5 * X2 + np.random.normal(0, 1.0, n_samples)  # Increased noise in y

# Convert to DataFrame for easier display (optional)
df = pd.DataFrame(X, columns=['X1', 'X2', 'X3', 'X4', 'X5'])
df['y'] = y
```
Each feature after the first is a linear combination of the earlier features plus a little noise, which means the data is highly multicollinear.
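A quick way to confirm the multicollinearity is to look at the pairwise correlations between the features. The sketch below regenerates features with the same recipe but uses a seeded generator (the seed 0 and the names `F1` to `F5`, `X_demo`, `corr`, and `min_corr` are assumptions for illustration), so its numbers will not match the unseeded data above exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, not part of the lesson's code
n_samples = 100

# Same recipe as the lesson's features, under different names
F1 = rng.random(n_samples)
F2 = F1 + rng.normal(0, 0.05, n_samples)
F3 = F1 + F2 + rng.normal(0, 0.05, n_samples)
F4 = F1 + 2 * F2 + 0.5 * F3 + rng.normal(0, 0.05, n_samples)
F5 = F2 + 3 * F3 - 0.5 * F4 + rng.normal(0, 0.05, n_samples)
X_demo = np.vstack([F1, F2, F3, F4, F5]).T

# Pairwise Pearson correlations between the five columns
corr = np.corrcoef(X_demo, rowvar=False)
print(np.round(corr, 3))

# Smallest correlation between two different features
off_diagonal = corr[~np.eye(5, dtype=bool)]
min_corr = off_diagonal.min()
print(f"Smallest pairwise correlation: {min_corr:.3f}")
```

Correlations close to 1 between every pair of columns mean the features carry almost the same information, which is the situation where plain linear regression tends to produce unstable coefficients.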
Now, let's compare the results of Ridge Regression and Linear Regression:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
print(f"Linear Regression MSE: {mse_lr:.4f}")

# Ridge Regression
ridge = Ridge(alpha=1.5)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge:.4f}")
```
Here, we train both Ridge and LinearRegression models on the generated data and print their MSE scores. Here is the result:
```
Linear Regression MSE: 1.1271
Ridge Regression MSE: 1.0578
```
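As you can see, in this case Ridge Regression outperforms regular linear regression: its MSE is lower (1.0578 versus 1.1271). Because the data is generated randomly without a fixed seed, the exact numbers will differ from run to run.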
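A single random draw can be lucky or unlucky, so it can help to repeat the comparison on several freshly generated datasets. The sketch below is one possible way to do that, not part of the lesson's code: the helpers `make_data` and `compare_once`, the choice of 20 repetitions, and the seeds are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def make_data(rng, n_samples=100):
    """Recreate the lesson's correlated features and noisy target from a seeded generator."""
    X1 = rng.random(n_samples)
    X2 = X1 + rng.normal(0, 0.05, n_samples)
    X3 = X1 + X2 + rng.normal(0, 0.05, n_samples)
    X4 = X1 + 2 * X2 + 0.5 * X3 + rng.normal(0, 0.05, n_samples)
    X5 = X2 + 3 * X3 - 0.5 * X4 + rng.normal(0, 0.05, n_samples)
    features = np.vstack([X1, X2, X3, X4, X5]).T
    target = 3 * X1 + 5 * X2 + rng.normal(0, 1.0, n_samples)
    return features, target

def compare_once(seed):
    """Return (linear MSE, ridge MSE) on the test split of one generated dataset."""
    X_rep, y_rep = make_data(np.random.default_rng(seed))
    X_tr, X_te, y_tr, y_te = train_test_split(X_rep, y_rep, test_size=0.2, random_state=42)
    mse_linear = mean_squared_error(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))
    mse_ridge_rep = mean_squared_error(y_te, Ridge(alpha=1.5).fit(X_tr, y_tr).predict(X_te))
    return mse_linear, mse_ridge_rep

# Repeat the experiment on 20 independently generated datasets
results = np.array([compare_once(seed) for seed in range(20)])
ridge_wins = int(np.sum(results[:, 1] < results[:, 0]))

print(f"Mean Linear Regression MSE: {results[:, 0].mean():.4f}")
print(f"Mean Ridge Regression MSE:  {results[:, 1].mean():.4f}")
print(f"Ridge had the lower MSE in {ridge_wins} of {len(results)} runs")
```

Looking at the averages and the win count gives a steadier picture than one pair of numbers, since any individual run can go either way.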
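In this lesson, we learned about Ridge Regression, a special type of linear regression that helps prevent overfitting by adding a regularization term.
We walked through the steps to load and split a dataset, train a Ridge Regression model, inspect its coefficients and intercept, and compare its performance with regular linear regression.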
Next, we’ll move to the practice section where you'll get hands-on experience implementing Ridge Regression on your own.