In today's lesson, we are embarking on an exploration of the powerful statistical model known as Linear Regression. While venturing into the realms of machine learning, we will dissect the foundations of linear regression, implement it using the library sklearn
, and apply it to the Iris dataset. We will then visualize these fascinating results using matplotlib
. Keep your explorer spirit high, for by the end of this journey, you'll be well versed in understanding, implementing, and interpreting Linear Regression using Python and sklearn.
Linear Regression is a pillar of many machine learning algorithms, and hence, understanding it acts as a stepping stone towards grasping more complex statistical approaches. Get ready to turn the wheels of your mind and dive right in!
Linear Regression is a prime tool in a statistician's toolbox, intended to decipher the relationship between two or more variables. To break it down into simpler terms, imagine you are an explorer observing a sunrise. You realize that, the higher the sun rises in the sky, the brighter it gets. This scenario is a simple example of linear regression, where the height of the sun (an independent variable) and the brightness (a dependent variable) share a linear relationship.
Though it is a powerful tool, it does not conform to every scenario in real life, posing a limitation. Unlike brightness being a linear aspect of a sunrise, improvements in an athlete's performance do not strictly depend on training hours. Factors like nutrition, rest, and mindset also contribute substantially. Despite its limitations in some scenarios, Linear Regression, given its simplicity and efficiency, is widely used in areas like economics, computer science, and business.
Meet sklearn
, a highly efficient Python library that provides robust tools for machine learning and modeling, including Linear Regression. The sklearn.linear_model.LinearRegression
class comes with a multitude of methods like fit()
for training the model on data, predict()
for making predictions, and many more. Let's apply it to the Iris dataset.
In this example, we will be taking the sepal length as an independent variable (X
) and using it to predict the sepal width (the dependent variable, y
). Here's how we can do it:
Python1from sklearn.datasets import load_iris 2from sklearn.linear_model import LinearRegression 3 4# Loading the Iris dataset 5iris_data = load_iris() 6X = iris_data.data[:, :1] # Sepal length 7y = iris_data.data[:, 1:2] # Sepal width 8 9# Creating an instance of Linear Regression model 10lr_model = LinearRegression() 11 12# Fitting the model to our data 13lr_model.fit(X, y)
The fit()
function trains our model on the data points X
and y
. From a bird's-eye view, the function tries to draw a line that represents all the points as accurately as possible. Voila! We have now trained a Linear Regression model.
With our model trained, we can explore some of the attributes of the model, like the coefficients and the intercept, which are the building blocks of the equation . Here m
is the slope (coefficients), c
is the y-intercept, x
is the independent variable (sepal length), and y
is the dependent variable (predicted sepal width).
Let's find out the coefficients and intercept of our model:
Python1# Printing coefficients and intercept 2print('Coefficient (Slope): ', lr_model.coef_) 3print('Intercept (Y-intercept): ', lr_model.intercept_)
The output of the above code will be:
1Coefficient (Slope): [[-0.22336106]] 2Intercept (Y-intercept): [3.41894684]
Furthermore, we can utilize our trained model to predict the sepal width for new sepal length observations:
Python1# Sample sepal lengths 2new_sepal_length_values = [[4.5], [5.5], [6.5]] 3 4# Printing the predicted sepal widths 5predicted_sepal_width_values = lr_model.predict(new_sepal_length_values) 6print('Predicted Sepal Width values: ', predicted_sepal_width_values)
The output of the above code will be:
1Predicted Sepal Width values: [[2.4171761 ] 2 [2.189183 ] 3 [1.9611899 ]]
Now, the machine you've built can predict sepal width given its length. Exciting, isn't it?
Visual aids play an essential role in understanding and interpreting data. Python's Matplotlib
library is a potent tool for brewing striking visuals, helping us derive meaningful insights from our data. With Matplotlib, we will plot our data and the regression line on a 2D graph, cultivating a richer understanding of our model.
Python1import matplotlib.pyplot as plt 2 3# Plotting actual data points 4plt.scatter(X, y, color='red') 5 6# Plotting the regression line 7plt.plot(X, lr_model.predict(X), color='blue') 8 9# Setting labels and title 10plt.xlabel('Sepal length') 11plt.ylabel('Sepal width') 12plt.title('Sepal length vs Sepal width (Linear Regression)') 13 14# Displaying our plot 15plt.show()
After running the code above, we see a plot with a regression line going through the data points, providing a clearer understanding of the linear relationship between sepal length and sepal width.
Notice how the formula for the linear regression line can be used to calculate the predicted sepal width:
Python1sepal_width = 4.5 #x value 2m = lr_model.coef_ 3b = lr_model.intercept_ 4predicted_speal_length = m * sepal_width + b #y value 5print(predicted_sepal_length) # 2.4171761
This formula predicts a sepal length of 2.4171761
, just like the .predict
method.
A round of applause for you! Today, you've built your first Linear Regression model. You've touched upon and explored the nuts and bolts of Linear Regression with the support of sklearn. You applied the model to the Iris dataset, and the resulting graphs brought a moment of clarity and understanding to your learning path.
Remember, the real magic happens when concepts meet action. The upcoming exercises are designed to give you hands-on experience with what you've just learned. Put on your data detective hat and get on with your journey to decode Linear Regression puzzles. All the best for your exploration!