Welcome back! Today, we are delving into the practical application of predictive modeling. This lesson will focus on applying Linear Regression using the California Housing Dataset and Python, to make predictions with real-world data. This time we will be utilizing the powerful sklearn
library to simplify our process, instead of implementing linear regression from scratch, this will allow us to efficiently calculate coefficients, plot data and regression lines. So, without further ado, let's dive in!
Our journey begins by loading the California Housing Dataset. Here's how we do it with Python:
Python1from sklearn.datasets import fetch_california_housing 2import matplotlib.pyplot as plt 3 4# Fetching the dataset 5housing = fetch_california_housing() 6# Selecting the Median Income feature, index 0 represents Median Income 7X = housing.data[:, 0] # Extracting the first column 8X = X.reshape(-1, 1) # Reshape for a single feature (sklearn expects 2D array) 9Y = housing.target
In this preparation phase, we import necessary libraries and the dataset using fetch_california_housing
. The feature we're focusing on is "Median Income", which is the first column (index 0) of the dataset. Initially, we select the Median Income feature directly from the dataset. In the next step, we reshape it to ensure compatibility with sklearn
. This process of selecting and reshaping our independent variable, Median Income, prepares our data for the model fitting process.
With our data prepared, we can now fit our linear regression model:
Python1from sklearn.linear_model import LinearRegression 2 3# Creating and training the model 4model = LinearRegression() 5model.fit(X, Y)
Here, we initialize the LinearRegression
model and fit it to our data using model.fit(X, Y)
. This function trains the model by finding the best coefficients that predict our target values from the given features. It does this through an optimization process, minimizing the error between actual and predicted values. Essentially, model.fit
enables us to automate the complex steps of learning from data, allowing sklearn
to handle the underlying mathematics. This makes fitting the model both accessible and efficient, readying it for predictions without manual intervention.
Let's visualize how well our model fits the data:
Python1# Predictions for the dataset 2Y_pred = model.predict(X) 3 4# Plotting actual data points 5plt.scatter(X, Y, color='blue') 6 7# Plotting the regression line 8plt.plot(X, Y_pred, color='red') 9 10plt.title("Linear Regression on California Housing Dataset") 11plt.xlabel('Median Income') 12plt.ylabel('Median House Value') 13plt.grid(True) 14plt.show()
This code helps in visualizing the actual entries from our dataset as blue dots, while the regression line derived from our model is illustrated in red. The model.predict(X)
function plays a pivotal role here; it applies the linear regression formula with the learned coefficients during the fitting to the X values (median income). This step translates our trained model's understanding into predictions for Y values (median house value), allowing us to see how the model applies its learned linear relationship to input data.
Visualizing this linear regression line alongside actual data points provides clear insight into the correlation between median income and house values, showcasing how effectively our model captures this relationship.
Now, we demonstrate the power of our model:
Python1# Note: sklearn requires input to be a 2D array, thus we convert our single value to 2D using double brackets 2x_new = [[8]] # Median income normalized, representing $80000.00 3y_new_pred = model.predict(x_new) 4print(f"For a median income of ${x_new[0][0] * 10000:.2f}, the projected median house value is ${y_new_pred[0] * 100000:.2f}") 5# Output: For a median income of $80000.00, the projected median house value is $379436.37
By applying our model to predict the median house value for a specific median income, we observe the practical utility of Linear Regression. This showcases how median income levels can affect housing price predictions, providing us with valuable insights into housing market dynamics.
It's important to note that forming a model is just a piece of the puzzle. Evaluating its performance is crucial. We shall delve into evaluation techniques in our upcoming lessons to ensure our model's predictions are not only accurate but also reliable for real-world application.
Kudos! We've navigated through the steps of loading data, fitting a Linear Regression model with sklearn
, visualizing this model against our data, and making predictive insights on the California Housing Dataset. This lesson marks an important milestone in our journey through predictive modeling, arming us with the knowledge to implement, interpret, and assess linear regression models in practice. Stay tuned for more engaging sessions ahead!