Hello! Today, we're diving into Polynomial Regression, an advanced form of regression analysis for modeling complex relationships between variables. We'll learn how to use Python and Scikit-Learn to perform polynomial regression. By the end, you'll know how to create polynomial features, train a model, and make predictions.
Polynomial regression is useful for capturing non-linear relationships. For instance, predicting exam scores (the target) based on study hours (the feature) might not follow a simple linear pattern. Polynomial regression can help in such cases.
Why do we need polynomial features? To fit a curve instead of a straight line, we create new features that include polynomial terms (like x², x³). This helps in modeling more complex relationships.
Scikit-Learn offers `PolynomialFeatures` to transform our input data. Here's how it works:
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])
print("Original X:\n", X)
# Output:
# Original X:
# [[2]
#  [3]
#  [4]]

# Transforming to include polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("Transformed X (with polynomial terms):\n", X_poly)
# Output:
# Transformed X (with polynomial terms):
# [[ 1.  2.  4.]
#  [ 1.  3.  9.]
#  [ 1.  4. 16.]]
```
The new `X_poly` includes an intercept term (the first column of ones), the original term, and its square.
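If you're curious how the columns change with different settings, here's a brief sketch; the `degree=3` and `include_bias=False` calls are extra illustrations, not part of this lesson's pipeline:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])

# degree=3 adds a cubic column: [1, x, x^2, x^3]
poly3 = PolynomialFeatures(degree=3)
print(poly3.fit_transform(X))
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]
#  [ 1.  4. 16. 64.]]

# include_bias=False drops the leading column of ones, which is handy
# when the downstream model fits its own intercept anyway
poly_no_bias = PolynomialFeatures(degree=2, include_bias=False)
print(poly_no_bias.fit_transform(X))
# [[ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]
```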
We'll create data to work with. We'll generate random values between -1 and 1 as features, and our target variable will follow the quadratic equation y = 3x² + 2x, with some noise added to simulate realistic data.
```python
import numpy as np

# Generate a sample dataset
np.random.seed(42)  # For reproducible results
X = np.random.rand(100, 1) * 2 - 1  # Random values between -1 and 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1  # Quadratic function with noise

# Display the first 5 values of X and y
print("First 5 rows of feature X:\n", X[:5])
# [[-0.250919]
#  [ 0.90142 ]
#  [ 0.463988]
#  [ 0.1973  ]
#  [-0.688   ]]
print("First 5 rows of target y:\n", y[:5])
# [[-0.304  ]
#  [ 4.21067]
#  [ 1.583  ]
#  [ 0.31268]
#  [ 0.02198]]
```
Now we have data in which the target variable has a non-linear relationship with the feature.
As always, we'll split our data into training and test sets to train and evaluate our model. We will use `X_train` to train the model and `X_test` to evaluate its performance.
```python
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the train/test sets
print("X_train shape:", X_train.shape, "y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape, "y_test shape:", y_test.shape)
# Output:
# X_train shape: (80, 1) y_train shape: (80, 1)
# X_test shape: (20, 1) y_test shape: (20, 1)
```
First, we'll train a simple linear regression model without polynomial features, like we did in the first lesson.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train a simple linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Calculate the mean squared error
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression MSE: {mse_linear}")
# Output:
# Linear Regression MSE: 0.7138921735032644
```
Now we have the MSE score for a plain linear regression model. On its own it doesn't tell us much, but it gives us a baseline for comparing this model against others. Let's train a smarter polynomial regression model and see whether it performs better.
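Before moving on, it's worth peeking at what the straight line actually learned; a quick sketch, assuming the `linear_model` trained above:

```python
# Inspect the fitted line: y ≈ coef * x + intercept
print("Coefficient:", linear_model.coef_)
print("Intercept:", linear_model.intercept_)
```

Since the true relationship is y = 3x² + 2x, the line can pick up the 2x trend (a slope near 2), but the 3x² curvature has nowhere to go except into the intercept and the error term, which is why the MSE stays high.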
Next, we'll transform the input data to include polynomial terms and train a polynomial regression model.
```python
from sklearn.preprocessing import PolynomialFeatures

# Transform the features into polynomial features
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Train a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate the mean squared error
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")
# Output:
# Polynomial Regression MSE: 0.006358406072820809
```
By applying `PolynomialFeatures(degree=2)` to our data (`fit_transform` on `X_train`, then `transform` on `X_test`), we create new features that model a quadratic relationship. Note that we fit the transformer on the training data only and reuse it on the test data, so both sets get exactly the same transformation.
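As an aside, Scikit-Learn's pipeline utilities can bundle the transformation and the model together, so you don't have to manage `X_train_poly` and `X_test_poly` by hand. A minimal sketch of this equivalent alternative:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# The pipeline fits PolynomialFeatures on the training data,
# then reuses that same transformation whenever it predicts
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)  # transform + predict in one call
```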
Having trained both models, we can now compare their performance using the mean squared error (MSE).
```python
# Linear Regression MSE: 0.7138921735032644
# Polynomial Regression MSE: 0.006358406072820809
```
The polynomial regression model has a much lower MSE, indicating it fits the data much better.
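If you'd like to see the difference rather than just read the numbers, you could plot both fits against the data; a quick sketch using matplotlib (not part of this lesson, and it assumes the variables defined above):

```python
import matplotlib.pyplot as plt
import numpy as np

# A dense, ordered grid of x values so the curves draw smoothly
x_line = np.linspace(-1, 1, 200).reshape(-1, 1)

plt.scatter(X, y, s=10, alpha=0.5, label="Data")
plt.plot(x_line, linear_model.predict(x_line), label="Linear fit")
plt.plot(x_line, poly_model.predict(poly_features.transform(x_line)), label="Polynomial fit")
plt.legend()
plt.show()
```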
Great job! We covered polynomial regression, from creating polynomial features to training a model and making predictions. Here’s a quick recap:
- Polynomial Features: We used `PolynomialFeatures` to transform our features.
- Sample Data: We created a sample dataset using a quadratic formula with noise.
- Train/Test Split: We split the data into training and test sets.
- Model Training: We trained both a simple linear regression model and a polynomial regression model.
- Evaluation: We compared their performance using MSE.
Next, you'll move to practice, where you'll apply what you've learned. You'll generate your own polynomial features, train models, and make predictions.
Happy coding!