Deep Evaluation of Model Performance

Lesson 3

Introduction

Welcome back! In this lesson, we are taking a deep dive into one of the most important aspects of machine learning: model evaluation. Evaluating your model is like getting the final results of an exam—it tells you how well all your hard work has paid off and where improvements can be made. We will learn how to assess our model's performance using TensorFlow's tools, understand the training dynamics through the history object, and visualize loss data with Matplotlib. By the end of this lesson, you'll be equipped with the skills to confidently evaluate any TensorFlow model, ensuring it performs well on unseen data. Let’s get started!

Deep Dive into Model Evaluation

Evaluation is a crucial step in a machine learning pipeline, as it tells us how well our model performs on unseen data. Performance during training doesn't guarantee real-world success, just like excelling at practice questions doesn't ensure acing the actual test.

Our model was trained using the adam optimizer and categorical_crossentropy loss function. The addition of the accuracy metric during compilation helps quickly gauge performance.

Beware of overfitting, where a model performs well on training data but poorly on test data. Overfitting occurs when a model learns the noise in the training data, negatively impacting its performance on new data. Don't worry; we'll soon learn how to detect it.

Recap: Loading Data and Training the Model

Before we dive into evaluating our model, let's quickly recap the steps we took to load our data and train the model. Here’s the code we used to preprocess our data:

Python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder

def load_preprocessed_data():
    # Load the Iris dataset
    iris = load_iris()
    X, y = iris.data, iris.target

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

    # Scale the features
    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # One-hot encode the targets
    encoder = OneHotEncoder(sparse_output=False).fit(y_train.reshape(-1, 1))
    y_train_encoded = encoder.transform(y_train.reshape(-1, 1))
    y_test_encoded = encoder.transform(y_test.reshape(-1, 1))

    return X_train_scaled, X_test_scaled, y_train_encoded, y_test_encoded
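
To make the scaling and encoding steps more concrete, here is a minimal pure-Python sketch of what they do conceptually. The helper names (fit_scaler, transform, one_hot) and the toy numbers are assumptions made for illustration; they are not part of scikit-learn. The point to notice is that the mean and standard deviation are computed from the training rows only and then reused for the test rows.

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Return per-column (mean, std) computed from the given rows."""
    columns = list(zip(*rows))
    # Population standard deviation (divide by n), as StandardScaler does.
    # This sketch assumes no column is constant (std would be zero).
    return [(fmean(col), pstdev(col)) for col in columns]

def transform(rows, stats):
    """Standardize rows using previously fitted statistics."""
    return [[(value - mean) / std for value, (mean, std) in zip(row, stats)]
            for row in rows]

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot vectors."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

# Toy data (made up for illustration): two features per sample.
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]

stats = fit_scaler(train_rows)               # fitted on training data only
train_scaled = transform(train_rows, stats)
test_scaled = transform(test_rows, stats)    # reuses the training statistics
encoded = one_hot([0, 2, 1], num_classes=3)

print(test_scaled)
print(encoded)
```

Fitting the scaler on the training split alone keeps information about the test data from leaking into preprocessing.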

Later on we implemented the following code to train our model on the preprocessed data:

Python
import tensorflow as tf
from data_preprocessing import load_preprocessed_data

# Load preprocessed data
X_train, X_test, y_train, y_test = load_preprocessed_data()

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=150, batch_size=5, validation_data=(X_test, y_test))

In summary:

We started by loading the preprocessed data using our custom function load_preprocessed_data() from a separate file named data_preprocessing.py to obtain our training and testing datasets: X_train, X_test, y_train, and y_test.
We defined a sequential model using TensorFlow's Keras API, with input shapes matching our dataset and various dense layers featuring ReLU and Softmax activations.
The model was compiled with the Adam optimizer and categorical crossentropy loss function, and we included accuracy as a metric.
We then trained the model over 150 epochs with a batch size of 5, validating its accuracy and loss on the test data as training progressed.
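
As a quick sanity check on the architecture summarized above, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per unit. The helper dense_params is a hypothetical name used only in this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes taken from the model definition: 4 inputs -> 10 -> 10 -> 3.
layer_sizes = [4, 10, 10, 3]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [50, 110, 33] 193
```

Calling model.summary() on the model should report the same total, which is a handy way to confirm that the layers are wired as intended.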

Now that we are refreshed on the training steps, let's proceed to evaluate how well our model performs.

Evaluating Model with TensorFlow's evaluate Function

We evaluate a model in TensorFlow using the evaluate method of the model. It's super handy and easy to use.

Here's how we can do so:

Python
# Evaluate on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)

In the above example, model.evaluate(X_test, y_test) computes the loss based on our test data (X_test, y_test), using the loss function specified in model.compile(). In our case, this is the categorical_crossentropy loss. The function returns the loss and any metric defined in model.compile(). We recorded the returned values into test_loss and test_accuracy.
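
To see what these two numbers mean, the sketch below computes categorical crossentropy and accuracy by hand in plain Python. It is an illustration of the formulas under simplifying assumptions, not Keras's actual implementation: the function names, the clipping constant eps, and the sample predictions are all made up for this example.

```python
import math

def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=values.__getitem__)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        # Clip probabilities to avoid log(0); eps is an assumption of this sketch.
        losses.append(-sum(t * math.log(max(p, eps))
                           for t, p in zip(true_row, pred_row)))
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up one-hot labels and softmax outputs for three samples.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.4, 0.3]]

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"loss={loss:.4f}, accuracy={acc:.4f}")
```

The third sample is misclassified (the highest probability is on class 1 rather than class 2), so the accuracy is two out of three, and its low probability for the true class contributes most of the loss.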

This method gives us a quick overview of how our model performs on unseen data. Remember: a lower loss and a higher accuracy indicate better performance.

Python
print(f'Test Accuracy: {test_accuracy}, Test Loss: {test_loss}')

The output of the above code will look similar to the following (exact values can vary between training runs):

Plain text
Test Accuracy: 0.9111111164093018, Test Loss: 0.151137113571167

This shows how our model performed on the test dataset. Despite being a simple example, observing the test accuracy and loss helps us gauge the model's effectiveness in dealing with new, unseen data.
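
The accuracy value can be translated back into a count of correct predictions. The short sketch below assumes the test set holds 45 samples, which is 30% of the 150 Iris samples as implied by test_size=0.3.

```python
test_accuracy = 0.9111111164093018  # value reported above
n_test = 45                         # assumption: 30% of the 150 Iris samples

n_correct = round(test_accuracy * n_test)
n_wrong = n_test - n_correct

print(n_correct, n_wrong)  # 41 4
```

Under that assumption, the model classified 41 of the 45 test samples correctly. With a test set this small, each misclassified sample shifts the accuracy by more than two percentage points, so small differences between runs should not be over-interpreted.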

Understanding Training History

Do you remember the fit method we used to train our model? This method returns a history object, which has a history attribute. This attribute is a dictionary holding the running loss and metrics values. Simply put, it holds the performance of our model at each epoch during training.

Let's extract the training and validation loss from the history object:

Python
# Extract loss data from the history object
training_loss = history.history['loss']
validation_loss = history.history['val_loss']

Here, training_loss holds the loss of the model at each epoch during training, and validation_loss holds the loss of the model at each epoch on the validation set.

These values are vital for us to visualize the performance of our model over time, which significantly helps in identifying where the model might be underfitting or overfitting.
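
Before plotting, the same lists can be inspected directly. The sketch below uses a small stand-in dictionary with made-up numbers in place of history.history, so that it runs without training a model; the variable names history_dict, best_epoch, and final_gap are illustrative.

```python
# Stand-in for history.history with made-up values; real runs will differ.
history_dict = {
    "loss":     [1.10, 0.80, 0.55, 0.40, 0.33, 0.30],
    "val_loss": [1.05, 0.82, 0.60, 0.48, 0.47, 0.50],
}

training_loss = history_dict["loss"]
validation_loss = history_dict["val_loss"]

# Epochs are reported 1-based during training, while list indices are 0-based.
best_index = min(range(len(validation_loss)), key=validation_loss.__getitem__)
best_epoch = best_index + 1

# Gap between validation and training loss at the final epoch.
final_gap = validation_loss[-1] - training_loss[-1]

print(f"Best epoch: {best_epoch}, val_loss: {validation_loss[best_index]:.2f}")
print(f"Final gap (val - train): {final_gap:.2f}")
```

A validation loss that bottoms out and then rises while the training loss keeps falling, as in these toy numbers, is the typical signature of overfitting.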

Visualizing Loss Data with Matplotlib

Matplotlib is a powerful data visualization library in Python. It allows us to plot our data in various ways to better understand it.

In the following code, we plot the training and validation loss across epochs:

Python
import matplotlib.pyplot as plt

# Plot training and validation loss for each epoch
epochs = range(1, len(training_loss) + 1)
plt.figure(figsize=(8, 5))
plt.plot(epochs, training_loss, label='Training Loss')
plt.plot(epochs, validation_loss, label='Validation Loss')
plt.title('Model Loss During Training')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()

The code above produces a line chart of the training and validation loss against the epoch number.

The plot from our model's training shows a rapid initial decrease in both training and validation loss, which indicates that the model is learning patterns that also hold on data it was not trained on. The curves flatten around the 20th epoch, suggesting that the model is approaching convergence. For most of training, the two curves stay close together and move in parallel, a sign of good generalization. Later in training, however, the validation loss drifts slightly away from the training loss. This small divergence suggests the onset of overfitting, where the model starts to fit noise and details specific to the training data, which can limit its performance on new, unseen data. Because the loss is essentially stable from around the 50th epoch onward, implementing early stopping in future training runs could save computation and prevent overtraining.
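
The early stopping idea mentioned above can be sketched in plain Python. The function below is a simplified illustration of patience-based stopping, not Keras's implementation; the name early_stopping_epoch, its parameters, and the validation losses are assumptions made for this example.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: improvement stalls after epoch 4.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.42, 0.40, 0.43, 0.44]
stop_epoch = early_stopping_epoch(val_losses, patience=3)

print(stop_epoch)  # 7
```

In Keras, the same behavior is available through the tf.keras.callbacks.EarlyStopping callback, which can be passed to model.fit via the callbacks argument with settings such as monitor='val_loss' and a patience value of your choosing.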

Lesson Summary and Practice

And there you have it! You now know how to evaluate a TensorFlow model, which includes using the model's evaluate function, understanding the history object to extract valuable metrics, and finally, visualizing the performance using Matplotlib.

Up next, we have a few exercises for you. These exercises let you put what you've learned into practice, such as evaluating models and analyzing their performance. By the end of these exercises, you should feel confident and ready to evaluate any TensorFlow model you'll be working with in the future.

Starting these exercises as soon as you can will also help you get the best results from this lesson, so let's jump in and start coding!
