Welcome! Today, we're learning to evaluate your machine learning model's performance. Evaluating your model is crucial because it tells you how well it will make predictions on new data it hasn't seen before. In simpler terms, it tells you if your model is good at its job.
We will focus on one metric – Mean Squared Error (MSE). This metric is like a report card for your model, showing its prediction accuracy. By the end of this lesson, you'll know how to calculate it and understand what it means.
Let's review what we've done so far. We have been working with synthetic data representing house areas and their prices. We used this data to train a simple linear regression model to predict house prices based on their area. Here's a reminder snippet:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data
np.random.seed(42)
num_samples = 100
area = np.random.uniform(500, 3500, num_samples)  # House area in square feet
base_price = 50000
price_per_sqft = 200
noise = np.random.normal(0, 25000, num_samples)  # Adding some noise
price = base_price + (area * price_per_sqft) + noise

# Create DataFrame
df = pd.DataFrame({'Area': area, 'Price': price})

# Extract features and target variable
X = df['Area'].values.reshape(-1, 1)
y = df['Price'].values

# Initialize and train the model
model = LinearRegression()
model.fit(X, y)
```
With our model trained, we can evaluate its performance.
Mean Squared Error (MSE) measures how far off our model's predictions are from the actual values. It’s like checking how precise your aim is in darts. The lower the MSE, the better.
Steps to calculate MSE:
- Make predictions using your model.
- Calculate the difference between actual and predicted prices for each house.
- Square these differences.
- Find the average of these squared differences.
Mathematically, it looks like this:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where:
- $n$ is the number of data points,
- $y_i$ is the actual value for the $i$-th data point,
- $\hat{y}_i$ is the predicted value for the $i$-th data point.
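To make the formula concrete, here is a minimal sketch that follows the four steps above by hand with NumPy, assuming the `model`, `X`, and `y` from the earlier snippet are already defined:

```python
# Step 1: make predictions with the trained model
y_pred = model.predict(X)

# Step 2: difference between actual and predicted price for each house
errors = y - y_pred

# Step 3: square the differences
squared_errors = errors ** 2

# Step 4: average the squared differences
mse_manual = np.mean(squared_errors)

print(f"Manual MSE: {mse_manual:.2f}")
```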
Let's visualize it with a plot:
Here, the green vertical lines show the distance between the actual price and the model's prediction for each house. If we square all these distances and find their average, we get the MSE metric!
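The original plot isn't reproduced here, but a minimal sketch of how you could draw it yourself, assuming the `df`, `X`, `y`, and `model` from the earlier snippet, might look like this:

```python
# Predictions for every house in the training data
y_pred = model.predict(X)

# Scatter of actual prices
plt.scatter(df['Area'], df['Price'], alpha=0.5, label='Actual price')

# Regression line (sort by area so the line is drawn left to right)
order = np.argsort(df['Area'].values)
plt.plot(df['Area'].values[order], y_pred[order], color='red', label='Predicted price')

# Green vertical lines: distance between each actual and predicted price
plt.vlines(df['Area'], y_pred, df['Price'], colors='green', alpha=0.4)

plt.xlabel('Area (square feet)')
plt.ylabel('Price ($)')
plt.legend()
plt.show()
```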
Here's the code:
```python
from sklearn.metrics import mean_squared_error

# Make predictions
y_train_predict = model.predict(X)

# Calculate MSE
mse = mean_squared_error(y, y_train_predict)

print(f"Mean Squared Error: {mse:.2f}")
```
This will output something like:
```
Mean Squared Error: 504115352.48
```
In real life, MSE helps you understand whether your model's predictions are close to the actual prices. For example, if you are predicting toy prices, an MSE of 1000 means predictions are off by about 31.62 on average (since $\sqrt{1000} \approx 31.62$).
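As a quick check of that arithmetic, here is a small sketch that converts an MSE back to the "typical error" scale by taking its square root (this quantity is often called RMSE):

```python
import numpy as np

mse_toy = 1000
typical_error = np.sqrt(mse_toy)  # square root puts the error back into price units
print(f"Typical prediction error: {typical_error:.2f}")  # prints ~31.62
```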
So, what does our MSE score tell us? Lower values are better. If your MSE is high, your model's predictions are not accurate. For example, when predicting toy prices, an MSE of 10 might be great, but an MSE of 1000 means the model is often very wrong.
Understanding these metrics helps you improve your model. If your MSE is high, you might need to consider other features, preprocess your data more carefully, or even choose a different model to improve your predictions!
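For instance, one way to act on a high MSE is to compare a couple of candidate models on the same data and keep the one with the lower error. Here is a minimal sketch, assuming the `X` and `y` from the earlier snippet and using a decision tree purely as an illustrative alternative:

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Two candidate models, fitted on the same data
candidates = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=3, random_state=42),
}

for name, candidate in candidates.items():
    candidate.fit(X, y)
    mse_candidate = mean_squared_error(y, candidate.predict(X))
    # Note: this compares training error only; a fair comparison would use held-out data
    print(f"{name}: MSE = {mse_candidate:.2f}")
```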
Today, you learned to evaluate your model's performance using Mean Squared Error (MSE). MSE measures how close your model’s predictions are to actual values.
Understanding these metrics helps you assess and improve your model. Next, we'll practice using MSE to evaluate models on CodeSignal!