Lesson 2

Hello! Today, we're diving into **Polynomial Regression**, an advanced form of regression analysis for modeling complex relationships between variables. We'll learn how to use Python and `Scikit-Learn` to perform polynomial regression. By the end, you'll know how to create polynomial features, train a model, and make predictions.

*Polynomial regression* is useful for capturing non-linear relationships. For instance, predicting exam scores (the target) based on study hours (the feature) might not follow a simple linear pattern. Polynomial regression can help in such cases.

Why do we need polynomial features? To fit a curve instead of a straight line, we create new features that include polynomial terms (like $x^2$, $x^3$). This helps in modeling more complex relationships.
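To make the idea concrete, here is a minimal sketch that builds the squared term by hand with NumPy. The array `x` and the name `x_manual` are illustrative assumptions, not part of the lesson's dataset.

```python
import numpy as np

# Hypothetical toy feature column (for example, three study-hour values)
x = np.array([[2.0], [3.0], [4.0]])

# Build the polynomial terms by hand: keep x and append a column with x squared
x_manual = np.hstack([x, x ** 2])

# Each row now holds [x, x^2]
print(x_manual)
```

A linear model fitted on these two columns can learn a curve in `x`, because it is free to weight the `x^2` column separately from the `x` column.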

`Scikit-Learn` offers `PolynomialFeatures` to transform our input data. Here's how it works:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])
print("Original X:\n", X)
# Output:
# Original X:
#  [[2]
#  [3]
#  [4]]

# Transforming to include polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("Transformed X (with polynomial terms):\n", X_poly)
# Output:
# Transformed X (with polynomial terms):
#  [[ 1.  2.  4.]
#  [ 1.  3.  9.]
#  [ 1.  4. 16.]]
```

The new `X_poly` includes the original term, its square, and an intercept term (the first column).

We'll create data to work with. We'll generate random values between -1 and 1 as features, and our target variable will follow a quadratic equation $y = 3x^2 + 2x + \text{noise}$, simulating realistic data with some noise.

```python
import numpy as np

# Generate a sample dataset
np.random.seed(42)  # For reproducible results
X = np.random.rand(100, 1) * 2 - 1  # Random values between -1 and 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1  # Quadratic function with noise

# Display the first 5 values of X and y
print("First 5 rows of feature X:\n", X[:5])
# Approximate values: -0.250919, 0.90142, 0.463988, 0.1973, -0.688
print("First 5 rows of target y:\n", y[:5])
# Approximate values: -0.304, 4.21067, 1.583, 0.31268, 0.02198
```

Now we have data in which the target variable has a non-linear relationship with the feature.

As always, we'll split our data into training and test sets to train and evaluate our model. We will use `X_train` to train the model and `X_test` to evaluate its performance.

```python
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the train/test sets
print("X_train shape:", X_train.shape, "y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape, "y_test shape:", y_test.shape)
# Output:
# X_train shape: (80, 1) y_train shape: (80, 1)
# X_test shape: (20, 1) y_test shape: (20, 1)
```

First, we'll train a simple linear regression model without polynomial features, like we did in the first lesson.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train a simple linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Calculate the mean squared error
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression MSE: {mse_linear}")
# Output:
# Linear Regression MSE: 0.7138921735032644
```

Now we have the `MSE` score for a regular linear regression model. On its own, this number tells us little, but it gives us a baseline for comparing other models. Let's train a smarter **polynomial** regression model and check if it works better.

Next, we'll transform the input data to include polynomial terms and train a polynomial regression model.

```python
from sklearn.preprocessing import PolynomialFeatures

# Transforming the features into polynomial features
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Training a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate the mean squared error
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")
# Output:
# Polynomial Regression MSE: 0.006358406072820809
```

By applying `PolynomialFeatures(degree=2)` to our data (`fit_transform()` on `X_train` and `transform()` on `X_test`), we create a new feature that models a quadratic relationship.
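One way to check that the model really picked up the quadratic relationship is to look at the fitted coefficients and compare them with the generating equation $y = 3x^2 + 2x + \text{noise}$. The following is a sketch that recreates the lesson's data so it runs on its own; the names `bias_coef`, `linear_coef`, and `quadratic_coef` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Recreate the lesson's synthetic data: y = 3x^2 + 2x + noise
np.random.seed(42)
X = np.random.rand(100, 1) * 2 - 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same transformation and model as in the lesson
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# y is 2D, so coef_ has shape (1, 3); its columns line up with [1, x, x^2]
bias_coef, linear_coef, quadratic_coef = poly_model.coef_[0]
print("x coefficient:", linear_coef)         # expected to be close to 2
print("x^2 coefficient:", quadratic_coef)    # expected to be close to 3
print("intercept:", poly_model.intercept_[0])  # expected to be close to 0
```

The estimates will not match 2 and 3 exactly because of the added noise, but they should land close to them.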

Having trained both models, we can now compare their performance using the mean squared error (MSE).

```python
# Linear Regression MSE: 0.7138921735032644
# Polynomial Regression MSE: 0.006358406072820809
```

The polynomial regression model's MSE (about 0.006) is more than 100 times lower than the linear model's (about 0.714), indicating it fits the data much better.
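As an optional convenience, the transform and the model can be chained so that the test data is transformed automatically at prediction time. The sketch below uses scikit-learn's `make_pipeline`, which the lesson itself does not use; it recreates the same synthetic data so it runs on its own, and `pipeline_model` and `mse_pipeline` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Recreate the lesson's synthetic data: y = 3x^2 + 2x + noise
np.random.seed(42)
X = np.random.rand(100, 1) * 2 - 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Chain the polynomial transform and the linear model into one estimator
pipeline_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline_model.fit(X_train, y_train)

# predict() applies the fitted transform to X_test before the regression step
mse_pipeline = mean_squared_error(y_test, pipeline_model.predict(X_test))
print(f"Pipeline Polynomial Regression MSE: {mse_pipeline}")
```

This should behave like the two-step version, while removing the risk of forgetting to transform `X_test`.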

Great job! We covered polynomial regression, from creating polynomial features to training a model and making predictions. Here’s a quick recap:

- **Polynomial Features**: We used `PolynomialFeatures` to transform our features.
- **Sample Data**: We created a sample dataset using a quadratic formula with noise.
- **Train/Test Split**: We split the data into training and test sets.
- **Model Training**: We trained both a simple linear regression model and a polynomial regression model.
- **Evaluation**: We compared their performance using MSE.

Next, you'll move to practice, where you'll apply what you've learned. You'll generate your own polynomial features, train models, and make predictions.

Happy coding!