Hello! Today, we're diving into Polynomial Regression, an advanced form of regression analysis for modeling complex relationships between variables. We'll learn how to use Python and Scikit-Learn to perform polynomial regression. By the end, you'll know how to create polynomial features, train a model, and make predictions.
Polynomial regression is useful for capturing non-linear relationships. For instance, predicting exam scores (the target) based on study hours (the feature) might not follow a simple linear pattern. Polynomial regression can help in such cases.
Why do we need polynomial features? To fit a curve instead of a straight line, we create new features that include polynomial terms (like x², x³). This helps in modeling more complex relationships.
Scikit-Learn offers `PolynomialFeatures` to transform our input data. Here's how it works:
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])
print("Original X:\n", X)
# Output:
# Original X:
# [[2]
#  [3]
#  [4]]

# Transforming to include polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("Transformed X (with polynomial terms):\n", X_poly)
# Output:
# Transformed X (with polynomial terms):
# [[ 1.  2.  4.]
#  [ 1.  3.  9.]
#  [ 1.  4. 16.]]
```
The new `X_poly` includes an intercept term (the first column of ones), the original term, and its square.
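If you're curious how the columns change with different settings, here's a brief sketch; the `degree=3` and `include_bias=False` calls are extra illustrations, not part of this lesson's pipeline:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])

# degree=3 adds a cubic column: [1, x, x^2, x^3]
poly3 = PolynomialFeatures(degree=3)
print(poly3.fit_transform(X))
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]
#  [ 1.  4. 16. 64.]]

# include_bias=False drops the leading column of ones, which is handy
# when the downstream model fits its own intercept anyway
poly_no_bias = PolynomialFeatures(degree=2, include_bias=False)
print(poly_no_bias.fit_transform(X))
# [[ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]
```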
We'll create data to work with. We'll generate random values between -1 and 1 as features, and our target variable will follow the quadratic equation y = 3x² + 2x, with some noise added to simulate realistic data.
```python
import numpy as np

# Generate a sample dataset
np.random.seed(42)  # For reproducible results
X = np.random.rand(100, 1) * 2 - 1  # Random values between -1 and 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1  # Quadratic function with noise

# Display the first 5 values of X and y
print("First 5 rows of feature X:\n", X[:5])
# [[-0.250919]
#  [ 0.90142 ]
#  [ 0.463988]
#  [ 0.1973  ]
#  [-0.688   ]]
print("First 5 rows of target y:\n", y[:5])
# [[-0.304  ]
#  [ 4.21067]
#  [ 1.583  ]
#  [ 0.31268]
#  [ 0.02198]]
```
Now we have data in which the target variable has a non-linear relationship with the feature.
As always, we'll split our data into training and test sets to train and evaluate our model. We will use `X_train` to train the model and `X_test` to evaluate its performance.
```python
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the train/test sets
print("X_train shape:", X_train.shape, "y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape, "y_test shape:", y_test.shape)
# Output:
# X_train shape: (80, 1) y_train shape: (80, 1)
# X_test shape: (20, 1) y_test shape: (20, 1)
```
First, we'll train a simple linear regression model without polynomial features, like we did in the first lesson.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train a simple linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Calculate the mean squared error
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression MSE: {mse_linear}")
# Output:
# Linear Regression MSE: 0.7138921735032644
```
Now we have the MSE score for a plain linear regression model. On its own it doesn't tell us much, but it gives us a baseline for comparing this model against others. Let's train a smarter polynomial regression model and see whether it performs better.
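Before moving on, it's worth peeking at what the straight line actually learned; a quick sketch, assuming the `linear_model` trained above:

```python
# Inspect the fitted line: y ≈ coef * x + intercept
print("Coefficient:", linear_model.coef_)
print("Intercept:", linear_model.intercept_)
```

Since the true relationship is y = 3x² + 2x, the line can pick up the 2x trend (a slope near 2), but the 3x² curvature has nowhere to go except into the intercept and the error term, which is why the MSE stays high.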
Next, we'll transform the input data to include polynomial terms and train a polynomial regression model.
```python
from sklearn.preprocessing import PolynomialFeatures

# Transform the features into polynomial features
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Train a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate the mean squared error
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")
# Output:
# Polynomial Regression MSE: 0.006358406072820809
```
By applying `PolynomialFeatures(degree=2)` to our data (`fit_transform` on `X_train`, then `transform` on `X_test`), we create new features that model a quadratic relationship. Note that we fit the transformer on the training data only and reuse it on the test data, so both sets get exactly the same transformation.
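As an aside, Scikit-Learn's pipeline utilities can bundle the transformation and the model together, so you don't have to manage `X_train_poly` and `X_test_poly` by hand. A minimal sketch of this equivalent alternative:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# The pipeline fits PolynomialFeatures on the training data,
# then reuses that same transformation whenever it predicts
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)  # transform + predict in one call
```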
Having trained both models, we can now compare their performance using the mean squared error (MSE).
```python
# Linear Regression MSE: 0.7138921735032644
# Polynomial Regression MSE: 0.006358406072820809
```
The polynomial regression model has a much lower MSE, indicating it fits the data much better.
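If you'd like to see the difference rather than just read the numbers, you could plot both fits against the data; a quick sketch using matplotlib (not part of this lesson, and it assumes the variables defined above):

```python
import matplotlib.pyplot as plt
import numpy as np

# A dense, ordered grid of x values so the curves draw smoothly
x_line = np.linspace(-1, 1, 200).reshape(-1, 1)

plt.scatter(X, y, s=10, alpha=0.5, label="Data")
plt.plot(x_line, linear_model.predict(x_line), label="Linear fit")
plt.plot(x_line, poly_model.predict(poly_features.transform(x_line)), label="Polynomial fit")
plt.legend()
plt.show()
```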
Great job! We covered polynomial regression, from creating polynomial features to training a model and making predictions. Here’s a quick recap:
- Polynomial Features: We used `PolynomialFeatures` to transform our features.
- Sample Data: We created a sample dataset using a quadratic formula with noise.
- Train/Test Split: We split the data into training and test sets.
- Model Training: We trained both a simple linear regression model and a polynomial regression model.
- Evaluation: We compared their performance using MSE.
Next, you'll move to practice, where you'll apply what you've learned. You'll generate your own polynomial features, train models, and make predictions.
Happy coding!